OAGi Data Security Federation Project Prospectus
Introduction
Let's start our discussion about data security with a quick look at money. Because we value money, we are motivated to actively protect it. And the way we protect and keep track of money is based on custodianship. We protect it by keeping it somewhere safe, in a wallet, in a bank account, or in a jar on the kitchen counter. By keeping money in a secure container, we can protect it from unwanted access. The entire global financial system is based on secure custodianship of money and transferring that custodianship from person to person, organization to organization, while preserving transactional integrity and lineage of custodianship.
Let’s look at a specific example. Someone has loose change in their pants pocket. If that person unknowingly drops a coin on the sidewalk, they no longer have custody of that money. However, the coin doesn’t care. It just sits there until someone notices and picks it up. It could be picked up by the person who dropped it, or by someone else. In either case, picking up the coin establishes a new custody relationship. It makes no difference to the coin who dropped it, who picks it up, who uses it, or what it is used for. The coin is still the same coin, has the same value, and can be spent by its current custodian.
Data has a lot of the same characteristics as money. In fact, data is often referred to as digital currency. Most importantly, the way we protect and keep track of data is also based on who has custody of the data. We protect it by keeping it somewhere safe, on a PC, on a thumb drive, or in an AWS S3 bucket. By storing data securely, we can protect it from unwanted access or manipulation.
Like money, data cannot protect itself because it is an inanimate entity. It has no inherent loyalty, and is equally at home on your computer or mine. One key difference between data and money is that data can be easily copied or moved. When data is copied, the original data remains unchanged, and the copy is identical to the original. When data is moved, it disappears from the original location and appears at the destination without leaving any trace of having been moved. So, unless specific logging is enabled, there is usually no way to tell if specific data has been copied or moved once the operation is completed. The ease with which data can be copied or moved magnifies the risk of that data being compromised. As with money, once you have custody of some data, you are free to do anything with it.
Modern currency and financial systems are effective at protecting money. These systems have matured over many years and provide secure storage and integrity in money exchange. Many fundamental concepts of financial systems can be applied to develop a robust data security system.
Protecting valuable data will require systems that proactively manage data across its complete lifecycle, as it is copied or moved from custodian to custodian. To enable ongoing protection, receiving custodians need to know what data protection is required, and they must agree to protect the data the same way the original data owner would want it protected.
A system that provides secure data storage and integrity in data exchange is possible. It will require significant effort to develop and broad industry support to effectively implement.
Current Data Protection Practices and Associated Issues
There are currently two common approaches for protecting data: (1) protected network enclaves and (2) role-based access control. A third, less common approach is attribute-based access control, which is well suited to managing individual resources but is not widely implemented. Let’s discuss the two most common approaches, protected network enclaves and role-based access control.
The protected network enclaves concept typically starts with an enterprise identity authentication which, if successful, grants a user access to an enterprise network environment. From there, a user will be granted access to individual subnetworks or systems, as needed. Within the main network enclave, there can be a series of nested enclaves, with each nested enclave restricted to a more exclusive membership. The nesting can go several layers deep, with a significant proliferation of enclaves occurring at each nesting level. Data resources are protected by hosting them in an enclave with the appropriate membership and exclusions.
The role-based access control (RBAC) concept is typically implemented within an application or system. There, RBAC is used to control which roles have access to individual application functions. Because application functions change little over time, it is relatively easy to assign and maintain role-based restrictions on them. However, RBAC controls that are applied to functions are one level removed from being able to directly control access to data.
For example, consider a person in an HR role accessing sensitive data through an HR module in the corporate HR application. Typically, RBAC is used to restrict access to the HR module; no access restrictions are applied to the HR data itself.
So, why aren’t RBAC controls applied directly to the data? Unlike application functions, data resources are constantly changing. This makes it difficult to assign and maintain valid role-based restrictions on each data record as it changes over time.
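To make the distinction concrete, the following sketch (in Python, using hypothetical names such as require_role and view_salary_records) shows conventional RBAC guarding a function rather than the data it returns. Once the role check passes, the records themselves carry no protection at all.

```python
# A minimal sketch of conventional RBAC, assuming a hypothetical
# require_role decorator and User type. The role check guards the
# *function*; the records it returns carry no protection of their own.
from dataclasses import dataclass
from functools import wraps

@dataclass
class User:
    name: str
    roles: set

def require_role(role):
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user.roles:
                raise PermissionError(f"{user.name} lacks role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator

@require_role("hr_specialist")
def view_salary_records(user):
    # Once this check passes, the data itself is unrestricted:
    # the caller may copy or forward it with no further control.
    return [{"employee": "E-1001", "salary": 98000}]

alice = User("alice", {"hr_specialist"})
print(view_salary_records(alice))   # permitted by role
```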
The problem with these and other related approaches is that they do not directly protect data based on a data access policy appropriate for the specific data element(s). Rather, they reduce access to the data by restricting who has access to the enclave that the data resides in. Data is indirectly protected by limiting the number of users who have access to that enclave. The appropriateness of the protection is based on the quality of the match between the enclave membership or assigned roles, and the individuals who should access the resource. While this provides some measure of protection, it by no means provides the appropriate access control for each specific data resource. The protection realized often depends on the users who have access to the enclave being well-behaved and only accessing data for which they have a justified need.
Using these approaches, data protection is improved by having more and smaller enclaves. However, data becomes more difficult to find and access as it becomes more compartmentalized. Another problem is simply knowing that data exists in protected enclaves. Once data is located, figuring out whom to ask for permission becomes the next challenge, and delays are common when requesting that accounts be created and authorizations granted for accessing systems and enclaves. Even after access is granted, combining data that resides in different protected enclaves is a further challenge.
With the advent of central data repositories such as enterprise data warehouses, data lakes, and other variants on this theme, data is imported without the information necessary to understand how it should be protected. In most cases, this data protection information does not exist at all. As a result, specific data protection is often lax or nonexistent. Once data is readily available in these large independent repositories, users download and re-host it in other independent repositories. Before long, this proliferation leaves everyone swimming in multiple copies of data of unknown origin. Assured data security is simply not possible in this situation.
In order to achieve effective data security in the modern environment, systems must be able to efficiently combine and share information with employees, customers and suppliers, while also preventing unauthorized users from gaining access to that same information. Achieving an appropriate level of security for each and every data resource in today’s IT environment, with today’s applications and practices, is practically impossible. So, what must be done to change this situation?
The Prospect for Federated Data Security
In today’s IT environment, data is no longer constrained to a few application platforms as it has been in the past. Sharing data usually means sending it outside of the protective application enclave that it was created in. As it moves across the enterprise, it changes format, is combined with other data, is modified after leaving the originating system, and may remain active in the enterprise for a long time. Data owners can no longer rely on application platform security to protect their data. What is needed is a data protection scheme that transcends systems, and even transcends the enterprise. What is needed is a federated data security capability.
Systems must be able to efficiently and securely share data with employees, customers and suppliers, while preventing unauthorized users from gaining access to that data. Data must be shared in a way that ensures it is protected regardless of where it resides, who has custody of it, what format it is in, or how many times it has been copied. Data security depends on organizations and users understanding clear lines of responsibility and accountability for protecting individual data elements. A common understanding of the requirements for protecting multiple copies of the same data is essential. Access control mechanisms (Access Control Policy Enforcement Points, or AC/PEPs) must be aware of the data security attributes and policies necessary to make consistent access decisions. And because data is copied and moved freely across multiple systems, access control decisions for multiple copies of the same data at different locations must be made correctly and consistently.
Data security regimes, both internal and external to the enterprise, must be able to federate to securely share data. A federation between two parties is actually a pair of unilateral sharing relationships. Each sharing relationship comprises two parties: one acting as the publishing party and the other as the subscribing party. Before sharing can begin, the publishing party must assert its requirements for protection of the data it will be sharing. The subscribing party must accept the requirements and demonstrate that the shared data will be adequately protected.
When a publishing party shares a specific data element, it must be accompanied by the security attributes and policies that characterize the specific protection needs of the data. When the subscribing party receives a specific data element, it must maintain the relationship between the data and the associated security attributes and policies, and provide adequate access control to protect the data as agreed.
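As a minimal illustration, a shared element might be represented as a record that keeps the payload bound to its security attributes and policy references. The structure and field names below are illustrative assumptions, not a defined standard.

```python
# A minimal sketch, assuming hypothetical names, of a shared data
# element that travels together with its security attributes and the
# policies the subscriber agreed to enforce.
from dataclasses import dataclass

@dataclass(frozen=True)
class SecuredElement:
    element_id: str
    payload: dict                 # the data itself
    security_attributes: dict     # e.g. sensitivity, origin
    policy_ids: tuple             # policies the subscriber must apply

element = SecuredElement(
    element_id="part-7734",
    payload={"part_no": "7734", "unit_cost": 12.50},
    security_attributes={"sensitivity": "proprietary", "owner": "publisher-A"},
    policy_ids=("cost-data-policy",),
)
# A subscribing system must preserve this binding: storing the payload
# without its attributes and policy references breaks the agreement.
```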
This federation utilizes users, roles and other identity characteristics necessary to correctly interpret access policies. Standardized mechanisms for communicating user identities, roles, identity characteristics, data security attributes and protection policies across systems, organizations and different data types are needed to make data security federation feasible at scale.
Automated data security capabilities are essential for minimizing the effort required by users to define security for data elements. If data security mechanisms impose too much effort on users, necessary security actions will be ignored, and data security will not be realized. It must be easy for data creators and stewards to define data security policy and metadata when the data is created.
What follows are the foundational pillars required to enable the kind of data security practices that will ensure comprehensive, efficient, fine-grained and distributed protection of data. This protection must be effective no matter who has custody of data, what kind of data it is, where the data resides, when that data is accessed, or how many times it is copied over its lifetime. All these foundational pillars must be present to realize a comprehensive data security regime that can securely share data with others.
Foundational Pillars
The robust protection of data must be based on secure custodianship of data and on clearly transferring that data and the associated custodianship from person to person, organization to organization, while preserving transactional integrity and lineage of custodianship. By keeping data in secure enclaves, we can protect it from unwanted access. By transferring that data with well-understood requirements for its protection, those receiving our shared data will understand their obligations to protect it. When data recipients clearly understand the protection requirements, they can take the appropriate actions to protect the shared data.
The following foundational pillars provide the necessary elements for implementing a secure system for the federated sharing and protection of data.
(1) Data Security Federation Agreements
A Data Security Federation Agreement (DSFA) is a legal document that specifies the terms and conditions for unilaterally sharing data between two responsible, relying parties, one of which is the publisher and the other the subscriber. This agreement is legally binding and specifies the subscriber’s obligations to protect the publisher’s data. The subscribing party must be able to be held liable for damages if it breaches the terms of a DSFA.
A DSFA must address:
The specific purposes for which the data will be shared;
The type of data that will be shared;
The duration of the sharing;
The scope of the sharing;
The security measures that will be implemented to protect the shared data;
The rights of the federating publisher with respect to their data;
The obligations of the federating subscriber with respect to the shared data.
DSFAs must be in effect prior to any sharing of data.
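To illustrate, the terms enumerated above could also be captured as a structured, machine-readable record that systems can check against. The sketch below is a hedged example; the field names and values are illustrative assumptions, and an actual DSFA remains a legal document.

```python
# A hedged sketch of how the DSFA terms listed above might be captured
# as a structured record; all field names are illustrative assumptions.
from dataclasses import dataclass
from datetime import date

@dataclass
class DataSecurityFederationAgreement:
    publisher: str
    subscriber: str
    purposes: list                # specific purposes for sharing
    data_types: list              # types of data to be shared
    effective: date               # duration of the sharing...
    expires: date                 # ...expressed as an effective window
    scope: str                    # scope of the sharing
    security_measures: list       # measures protecting the shared data
    publisher_rights: list        # rights of the federating publisher
    subscriber_obligations: list  # obligations of the federating subscriber

dsfa = DataSecurityFederationAgreement(
    publisher="Acme Manufacturing",
    subscriber="Beta Logistics",
    purposes=["shipment planning"],
    data_types=["order data"],
    effective=date(2025, 1, 1),
    expires=date(2026, 1, 1),
    scope="EU distribution centers only",
    security_measures=["encryption at rest", "role-based access"],
    publisher_rights=["audit", "revocation"],
    subscriber_obligations=["enforce publisher policies", "log all access"],
)
```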
(2) Data Security Federation Governance
Data Security Federation Governance is the policy and operational authority within an organization that is responsible for entering into and administering Data Security Federation relationships with other parties.
When participating as a data federation Publisher, the Publisher’s Governance Authority is the Subscriber’s main point-of-contact within the Publisher’s organization for Data Security Federation interactions. Overall, it is responsible for authoring and tailoring the terms of a DSFA with subscribing parties. In addition, it is responsible for ensuring that the Subscriber adheres to the terms of the DSFA by conducting audits and formal tests of the Subscriber's data protection measures. It also works with the subscribing organization to oversee corrections, mitigations or data repatriation associated with a violation of the DSFA.
When participating as a data federation Subscriber, the Subscriber’s Governance Authority is the Publisher’s main point-of-contact within the Subscriber’s organization for Data Security Federation interactions. Overall, the Subscriber’s Governance Authority is responsible for accepting the terms of a DSFA. Specifically, it is responsible for ensuring that the subscribing organization's systems provide the appropriate data security measures to protect the shared data. Then it is responsible for ensuring that the Subscriber adheres to the terms of the DSFA on an ongoing basis. It is responsible for recording any discrepancies in the subscribing party's adherence to the DSFA. It also verifies the implementation of any corrections or mitigations associated with a violation of the DSFA.
(3) Data Security Role Sharing
Data Security Role Sharing is the practice in which a publishing party provides role definitions to the subscribing party for implementation within the Subscriber's systems. The subscriber then populates these role definitions with the appropriate user identities that conform to the defined user-selection criteria. This allows the Subscriber's systems that host the Publisher's data to enforce the related data access policies based on the roles assigned to users and the security metadata.
Data Security Federation is different from Identity Federation. With Identity Federation, users' identities and associated credentials are shared with another organization’s authentication infrastructure. Those users are thus able to access the other organization's IT environment and request access to that organization's data. The data remains in place, and access control decisions are made by the other organization’s systems.
With Data Security Federation, a publisher shares data with another organization. The Publisher exports a data package that includes the data set, data security-related policies, roles and security metadata. The data Subscriber imports the data package into its systems and uses the policies, roles and metadata to make access control decisions for the shared data. As a result, all access control is conducted in the user's native IT environment.
In summary, Identity Federation allows users to access systems within another organization, thereby granting them access to that organization’s data. Data Security Federation, by contrast, enables data published by one organization to be accessed by users within their own systems.
In preparation for data sharing, the Publisher defines the access policies that the Subscriber must enforce for the shared data. These data access policies use roles to identify the types of users that the policy references. The Publisher must define these roles with criteria detailed enough for the Subscriber to populate each role only with users who meet the screening criteria. Because the quality of access control decisions depends on the Subscriber ensuring that assigned users meet the role definition and its associated selection criteria, the quality of the role definitions is critical for obtaining appropriate access control decisions. The Publisher includes these data security role definitions when sharing the data and access policies with the Subscriber.
In preparation for data sharing, the Subscriber implements the Publisher-defined roles in their system(s) that will house the shared data. The Subscriber then populates these roles with the Subscriber's user identities that meet the Publisher’s criteria for those roles. Then the Subscriber loads the data set, security metadata, and policies into their internal system. The Subscriber’s system then uses the security metadata and policies to make access control decisions.
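A minimal sketch of this workflow follows, using hypothetical names such as RoleDefinition and meets_criteria: the Publisher ships a role definition with selection criteria, and the Subscriber screens its own identities against those criteria before assigning them to the role.

```python
# A minimal sketch, with hypothetical names, of a publisher-defined role
# and the subscriber populating it with local identities that meet the
# publisher's selection criteria.
from dataclasses import dataclass

@dataclass
class RoleDefinition:
    name: str
    selection_criteria: dict   # screening criteria users must satisfy

@dataclass
class SubscriberUser:
    user_id: str
    attributes: dict

def meets_criteria(user, role):
    return all(user.attributes.get(k) == v
               for k, v in role.selection_criteria.items())

# Publisher ships the role definition alongside the data and policies.
buyer_role = RoleDefinition(
    name="authorized_buyer",
    selection_criteria={"department": "procurement", "cleared": True},
)

# Subscriber screens its own identities against the criteria.
local_users = [
    SubscriberUser("u-01", {"department": "procurement", "cleared": True}),
    SubscriberUser("u-02", {"department": "sales", "cleared": True}),
]
role_members = [u.user_id for u in local_users if meets_criteria(u, buyer_role)]
print(role_members)   # ['u-01']: only the user meeting the criteria
```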
(4) Common Security Meta Model
A Common Security Meta Model is a canonical model that defines standard structures for data, roles, data security attributes and data security policies in support of Data Security Federation.
A Common Security Meta Model allows two federating parties to communicate data in a standardized way, without the need to define the structure for communication in advance. The federating parties would use the Common Security Meta Model structures to communicate data, roles, data security attributes and data security policies.
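As a hedged illustration, a federation payload organized per such a meta model might group data, roles, security attributes and policies into one self-describing structure. The layout below is an assumption for illustration only, not a published schema.

```python
# An illustrative sketch of a federation payload organized per a
# hypothetical common security meta model; the structure and field
# names are assumptions, not a published standard.
import json

payload = {
    "data": [{"element_id": "part-7734", "payload": {"unit_cost": 12.50}}],
    "roles": [{"name": "authorized_buyer",
               "selection_criteria": {"department": "procurement"}}],
    "security_attributes": [{"element_id": "part-7734",
                             "sensitivity": "proprietary"}],
    "policies": [{"policy_id": "cost-data-policy",
                  "rule": "role == 'authorized_buyer'"}],
}
# Because both parties share the meta model, the subscriber can parse
# this payload without negotiating a structure in advance.
print(json.dumps(payload, indent=2))
```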
(5) Access Policy Management
Access Policy Management is the creation, use and maintenance of data access policies. Data Access Policies define data access rules using identity roles, identity criteria, and data security metadata. Data Access Policies are used by Access Control Policy Enforcement Points to control users' access to requested data.
When exporting data sets to external parties, the relevant data policies are included with the transmitted data elements.
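As a simple illustration, a data access policy can be thought of as a rule evaluated over a requester's roles and the data's security metadata. The policy below is a hypothetical sketch, not a prescribed policy language.

```python
# A hedged sketch of a data access policy expressed as a rule over
# identity roles and data security metadata; names are illustrative.
def cost_data_policy(user_roles, element_attributes):
    """Permit access to proprietary cost data only to authorized buyers."""
    if element_attributes.get("sensitivity") == "proprietary":
        return "authorized_buyer" in user_roles
    return True   # non-proprietary data is not restricted by this policy

print(cost_data_policy({"authorized_buyer"}, {"sensitivity": "proprietary"}))  # True
print(cost_data_policy({"sales"}, {"sensitivity": "proprietary"}))             # False
```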
(6) Distributed Access Control
Distributed Access Control is provided by the Access Control Policy Enforcement Points that are co-located with each data repository. As there will usually be multiple data repositories, this results in a distributed access control topology. Each repository is responsible for protecting the data within it in accordance with the applicable Data Security Access Policies.
Access Control Policy Enforcement Points provide least privilege access to the hosted data resources. All requests are denied by default. Access is granted only when assigned access policies have been satisfied.
Data collection repositories such as data warehouses, data marts, and data lakes typically do not have well-defined access control mechanisms. These data repositories will have to implement Access Control Policy Enforcement Points to protect their hosted data using the defined data access rules, identity roles, identity criteria, and data security metadata.
Commercially available applications will need to translate the provided data access rules, identity roles, identity criteria, and data security metadata to work with their own internal access controls. To be authorized to host the provided data, these applications must produce access control results consistent with the intended Data Access Policies.
Each Access Control Policy Enforcement Point shall log the results of all access requests, whether granted or denied.
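The sketch below illustrates these requirements under assumed names: a policy enforcement point that denies by default, grants access only when every assigned policy is satisfied, and logs each decision.

```python
# A minimal sketch, under assumed names, of a policy enforcement point
# that denies by default and logs every decision, as required above.
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("pep")

def proprietary_policy(user_roles, element_attributes):
    if element_attributes.get("sensitivity") == "proprietary":
        return "authorized_buyer" in user_roles
    return True

class PolicyEnforcementPoint:
    """Gatekeeper co-located with one data repository."""
    def __init__(self, policies):
        self.policies = policies   # callables: (roles, attrs) -> bool

    def check_access(self, user_id, user_roles, element_id, element_attrs):
        # Deny by default: grant only if at least one policy is assigned
        # and every assigned policy is satisfied.
        granted = bool(self.policies) and all(
            p(user_roles, element_attrs) for p in self.policies
        )
        log.info("user=%s element=%s decision=%s",
                 user_id, element_id, "GRANT" if granted else "DENY")
        return granted

pep = PolicyEnforcementPoint([proprietary_policy])
pep.check_access("u-01", {"authorized_buyer"}, "part-7734",
                 {"sensitivity": "proprietary"})   # logged GRANT
pep.check_access("u-02", {"sales"}, "part-7734",
                 {"sensitivity": "proprietary"})   # logged DENY
```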
(7) Access Control Validation
Access Control Validation is the process of testing to confirm that the Subscriber’s Access Control Policy Enforcement Points, on the data platforms hosting the Publisher’s shared data, make access control decisions that comply with the Publisher’s access policies. Access Control Validation employs a comprehensive test suite for conducting a thorough test of the Subscriber’s Access Control Policy Enforcement Points.
Access Control Policy Enforcement Points must be validated before they are authorized to perform federated data security access control.
Each Subscriber data repository must have its own Access Control Policy Enforcement Point that acts as a gatekeeper between users and the Publisher’s shared data. It is important that each Access Control Policy Enforcement Point comes to the same access control decision for any given data request, set of data and supporting security policies and attributes.
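A hedged sketch of such a validation run follows: a suite of test cases, each pairing a requester profile and data attributes with the Publisher's expected decision, is replayed against a candidate enforcement point. The case structure and names are illustrative assumptions.

```python
# A hedged sketch of an access control validation check: replay a
# suite of test cases against a candidate enforcement point and
# confirm each decision matches the publisher's expected outcome.
test_cases = [
    # (user_roles, element_attributes, expected_decision)
    ({"authorized_buyer"}, {"sensitivity": "proprietary"}, True),
    ({"sales"},            {"sensitivity": "proprietary"}, False),
    ({"sales"},            {"sensitivity": "public"},      True),
]

def validate_pep(check_access, cases):
    failures = []
    for roles, attrs, expected in cases:
        actual = check_access(roles, attrs)
        if actual != expected:
            failures.append((roles, attrs, expected, actual))
    return failures   # an empty list means the PEP passed the suite

# A candidate enforcement decision function under test (stand-in).
def candidate_decision(user_roles, element_attrs):
    if element_attrs.get("sensitivity") == "proprietary":
        return "authorized_buyer" in user_roles
    return True

assert validate_pep(candidate_decision, test_cases) == []
```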
The Data Governance teams of the federating parties will be responsible for certifying that an access control mechanism successfully passed the access control validation test suite prior to sharing any data.
(8) Compliance Auditing
Compliance Auditing is an evaluation of the access control decisions made against user requests for shared data, performed by an agency that is not part of the Publisher’s or Subscriber’s organizations. The purpose of the compliance audit is to confirm that the Subscriber’s systems are enforcing the Publisher’s access policies for the shared data. The access control logs record all individual access control decisions made against shared data, enabling auditors to verify that appropriate decisions were made. Compliance auditing will issue compliance reports on audit findings, document corrective actions, and monitor corrective actions to completion.
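As an illustrative sketch, an audit pass might re-evaluate each logged decision against the governing policy and flag any mismatch for the audit report. The log record fields below are assumptions for demonstration.

```python
# An illustrative sketch of a compliance audit pass: re-evaluate each
# logged access decision against the governing policy and flag any
# mismatch for the audit report. Log record fields are assumptions.
access_log = [
    {"user_roles": {"authorized_buyer"},
     "attrs": {"sensitivity": "proprietary"}, "decision": "GRANT"},
    {"user_roles": {"sales"},
     "attrs": {"sensitivity": "proprietary"}, "decision": "GRANT"},  # should have been denied
]

def policy(user_roles, attrs):
    if attrs.get("sensitivity") == "proprietary":
        return "authorized_buyer" in user_roles
    return True

findings = [
    record for record in access_log
    if (record["decision"] == "GRANT") != policy(record["user_roles"], record["attrs"])
]
print(f"{len(findings)} non-compliant decision(s) found")   # 1
```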
Concept Validation
These foundational capabilities must be validated with a reference implementation that leads to a fully functioning proof-of-concept system. Once proven, these capabilities can be formalized with the publication of appropriate standards. Then the proof-of-concept system can be described in an open-source reference design document. With this done, data security federation can be broadly deployed across production systems operated by federating parties.