Securing Big Data Starts With Zero Trust

Big data brings big challenges around information security. You can now collect a massive volume and variety of information, and mine unprecedented insights to drive innovation and improve business outcomes. But to take full advantage of your data, you need to be able to share it with a much broader spectrum of stakeholders while meeting privacy and governance responsibilities. And you’re likely to find that security solutions that worked well in traditional environments get very complicated, fast.

Many organizations end up with brittle solutions that can’t keep pace with the huge amounts of data they’re collecting or the proliferating ways people want to use it. Or worse, they inadvertently allow improper access.

It’s possible to achieve strong data security and privacy at big data scale, but it requires rethinking how you handle access control. Fortunately, there is a highly successful model from the world of networking: Zero Trust.

By extending Zero Trust principles to the data layer, you can achieve on-demand data access at scale, without compromising security or privacy. You’ll improve the overall security of your system, and make it much easier for stakeholders to capitalize on data.

The Big Data Security Challenge

To understand why securing big data is so hard, consider a healthcare scenario. By pooling all of its healthcare data in one place (clinical records, test results, genetic sequencing, imaging, and so on), a major health system could spur huge advances in disease research, business operations, and more. But it also has to contend with a much more complex privacy regime for protected health information (PHI).

Within a single system, the organization now must accommodate everyone from physicians who are allowed full access to PHI, to business analysts who should only see de-identified information, to government statisticians who can view aggregate data for public policy purposes, along with many stakeholders in between. Achieving that kind of flexible access control is extremely difficult using current approaches.

Standard database protections weren’t designed for this. In SQL Server, for example, there’s no simple way to show different views of the same data to users with different levels of access. So database administrators typically create a de-identified copy of the record for a given use case and grant access to that separate data mart. The result is proliferating copies of the data and a nightmare keeping them all up to date and consistent. Healthcare is one example; the same problem exists in many industries.

Enter Zero Trust Data

PHEMI is advocating a novel approach to this problem, modeled after Zero Trust Networking. In a Zero Trust Network, no network segment is “trusted”; every connection attempt is interrogated for proper authorization, every time. Zero Trust Data extends these principles to the data itself.

Every data asset is encoded with metadata as it’s collected, describing its sensitivity within the organization’s privacy and governance framework. The metadata describes the data alone; if something is tagged PHI, for example, the system always treats it that way, regardless of who wants to use it.
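As a rough illustration of tagging at collection time, the sketch below attaches a sensitivity label to every field of an incoming record. The label names, the field names, and the `tag_record` helper are all hypothetical and are not taken from PHEMI's product; the point is only that the label depends on the data, not on who later asks for it.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    DEIDENTIFIED = "de-identified"
    PHI = "phi"


# Hypothetical governance rules: field name -> sensitivity label.
FIELD_SENSITIVITY = {
    "patient_name": Sensitivity.PHI,
    "date_of_birth": Sensitivity.PHI,
    "diagnosis_code": Sensitivity.DEIDENTIFIED,
    "region": Sensitivity.PUBLIC,
}


@dataclass(frozen=True)
class TaggedValue:
    """A value paired with metadata describing the data alone."""
    value: object
    sensitivity: Sensitivity


def tag_record(raw: dict) -> dict:
    """Label every field as it is collected.

    Fields without a rule fall back to the most restrictive label (PHI),
    an assumption made here so that nothing is exposed by accident.
    """
    return {
        name: TaggedValue(value, FIELD_SENSITIVITY.get(name, Sensitivity.PHI))
        for name, value in raw.items()
    }


tagged = tag_record({"patient_name": "Jane Doe", "region": "BC", "notes": "n/a"})
```

Because the labels travel with the record, any later access decision can consult them without knowing where the data came from.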

Metadata is paired with a flexible, attribute-based access control mechanism, which goes beyond conventional role-based access control. It interrogates user attributes (who, what, where, when, how) and grants access based on policy.
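A minimal sketch of an attribute-based decision might look like the following. The attributes, the policy rules, and the `max_sensitivity` function are assumptions made for illustration; a real policy would come from the organization's own privacy and governance framework.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Request:
    """Hypothetical attributes evaluated on every access attempt."""
    role: str      # who is asking
    purpose: str   # what the data will be used for
    network: str   # where the request comes from
    at: time       # when the request is made


def max_sensitivity(request: Request) -> str:
    """Return the most sensitive label this request may read, or "none".

    Role alone is not enough: purpose, location, and time also count.
    """
    on_site = request.network == "hospital"
    business_hours = time(7, 0) <= request.at <= time(19, 0)

    if request.role == "physician" and request.purpose == "treatment" and on_site:
        return "phi"
    if request.role == "analyst" and request.purpose == "research" and business_hours:
        return "de-identified"
    if request.role == "statistician" and request.purpose == "public-policy":
        return "aggregate"
    return "none"  # nothing matched, so nothing is granted


on_site_doctor = Request("physician", "treatment", "hospital", time(10, 0))
remote_doctor = Request("physician", "treatment", "home", time(10, 0))
```

Here the same physician receives a different answer on site and off site, which is the distinction a purely role-based check cannot express.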

Instead of requiring organizations to continually create new copies of a record, the system repurposes data on the fly to serve only specific information a user is authorized to see. Effectively, it retains one immutable copy of the data, presenting it through different lenses depending on the user’s attributes. In the healthcare example above, a clinician or patient would be able to see the full, raw data. A population analyst would see only de-identified records.
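The idea of one stored copy seen through different lenses can be sketched as below. The record, the field names, and the two lens functions are hypothetical, and dropping a name and coarsening a birth date is only a toy stand-in for real de-identification, which requires far more care.

```python
from types import MappingProxyType

# The single stored copy; MappingProxyType exposes it as read-only.
STORED = MappingProxyType({
    "patient_name": "Jane Doe",
    "date_of_birth": "1980-04-12",
    "diagnosis_code": "E11.9",
    "region": "BC",
})


def full_lens(record):
    """Clinician or patient view: every field, unchanged."""
    return dict(record)


def deidentified_lens(record):
    """Analyst view: drop the name and coarsen the birth date to a year."""
    view = {k: v for k, v in record.items() if k != "patient_name"}
    view["birth_year"] = view.pop("date_of_birth")[:4]
    return view


clinician_view = full_lens(STORED)
analyst_view = deidentified_lens(STORED)
```

Each view is computed at read time from the same record, so there is no second copy to keep in sync.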

All of this happens at the data layer, in real time, at scale. So access can be extended to users with different levels of authorization, without compromising privacy or governance responsibilities.

Security at Scale

There are multiple advantages to this approach.

First, the system can enforce complex authorization logic without creating a performance bottleneck. This isn’t a bolt-on solution that all requests have to be filtered through; it’s ingrained at the core level of the data system. Because it is built on proven big data technologies such as Hadoop, it adds minimal overhead, so it can enforce granular privacy and governance policy even as it scales to petabytes of data.

Because security policy is built into the data layer itself, this approach also increases the overall security of the system. By default, no one can see anything without appropriate authorization; any unauthorized query returns nulls.
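A default-deny rule of this kind could be sketched as follows. The `GRANTS` table and the `query` function are hypothetical names used for illustration, not part of any real product API; the sketch simply assumes that a field is returned only when it has been explicitly granted and comes back as a null otherwise.

```python
# Hypothetical explicit grants: role -> fields that role may read.
GRANTS = {
    "physician": frozenset({"patient_name", "diagnosis_code", "region"}),
    "analyst": frozenset({"diagnosis_code", "region"}),
}

ROWS = [
    {"patient_name": "Jane Doe", "diagnosis_code": "E11.9", "region": "BC"},
    {"patient_name": "John Roe", "diagnosis_code": "I10", "region": "ON"},
]


def query(rows, role):
    """Return every row, with each field that was not granted set to None.

    A role with no entry in GRANTS gets an empty grant set, so every
    value it sees is None.
    """
    allowed = GRANTS.get(role, frozenset())
    return [
        {name: (value if name in allowed else None) for name, value in row.items()}
        for row in rows
    ]


analyst_rows = query(ROWS, "analyst")
stranger_rows = query(ROWS, "contractor")
```

The design choice is that access must be added deliberately: forgetting to configure a role fails closed rather than open.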

Zero Trust Data uses standard industry best practices for things like authentication and secure communications. It just extends them to the data, so you can tightly control what users see without imposing added complexity on the IT system.

Capitalize on Your Data

In virtually every industry, organizations face tremendous pressure to collect more data from more places. But to make the most of that data, you have to be able to share it. Even when it is all in one place, data is of little value if only a few people are authorized to interact with it. By promoting smarter access with a Zero Trust Data approach, you can keep tighter control over sensitive information while helping your organization unlock its full value.

Roy Wilds, Director of Product Management, PHEMI Systems

Roy Wilds is the Director of Product Management at PHEMI Systems, a Vancouver-based startup that delivers privacy, security, governance, and enterprise-grade management for big data. Its flagship product, PHEMI Central, applies the Zero Trust Data approach, which embeds and enforces consent, data sharing agreements, and privacy policies at the data level, removing a critical roadblock standing in the way of enterprises that aim to become data-driven. Roy has significant expertise in big data technologies, data mining, machine learning theory, Python, R, SQL, and the Hadoop ecosphere. He holds an Honors Bachelor of Science from Simon Fraser University in the Department of Physics, and a Master of Science and a Doctor of Philosophy in the Department of Mathematics and Statistics from McGill University.