“Data is the new oil” is a phrase we hear too often. Organizations across the world are transacting and storing explosive amounts of data, including sensitive private data belonging to their customers. The General Data Protection Regulation, or GDPR, which went into effect almost exactly one year ago today, on May 25th, 2018, was designed to protect the identities of users. It mandates that user data (personally identifiable information, also known as PII) cannot be used by organizations without a lawful basis, such as the user’s explicit consent. PII is not just names, phone numbers, or bank accounts; it is anything that can identify a person, including IP or MAC addresses.
The regulation aims to give users control over how their data gets used, to build consumers’ confidence that their data is safe, and to demand accountability from organizations for how personal data is processed and protected. In summary, it is about users being able to trust enterprises with their data. GDPR is the most comprehensive regulation to protect consumers: it holds companies to a higher standard of security for personal data and imposes stricter security controls and audit measures. While the intent of GDPR is quite positive, many businesses are struggling to hit the compliance mark. Why? Let’s look at one of the key culprits: mass data fragmentation.
Mass Data Fragmentation: What is it and why does it make GDPR compliance so challenging?
With the increasing threat of cyber-attacks and the growing number of ways in which data can be stolen, companies large and small are finding it difficult to keep their data safe and to meet GDPR requirements.
More than 80% of all data within an enterprise sits in backups, archives, object stores, filers, and test/dev environments. This data, sometimes called secondary data, lives in siloed infrastructure that is spread across multiple point products and locations, including on-premises and public cloud infrastructure. In a survey of over 900 senior IT decision makers conducted by Vanson Bourne, 87% of respondents acknowledged that secondary data is fragmented and becomes impossible to manage long term, 63% have 4 to 15 copies of the same data, and 85% store data across 2 to 5 public cloud platforms. These statistics raise major questions for organizations globally: If they have all of these data copies, how can they possibly know what PII is in those copies? And, if those copies have been replicated to a host of public clouds, who is keeping track of what PII is where?
As you might imagine, given this enormous data sprawl and lack of visibility, it is nearly impossible for IT owners to locate and take corrective action on all the personal data they might have within their environment, quite frankly because they don’t know what data they’ve got and where it’s located. At the end of a full year of GDPR, there is ample evidence that organizations have struggled to ensure compliance because of a lack of resources and the sheer complexity of handling data.
What’s the Path Forward to Address Mass Data Fragmentation and Compliance Challenges?
The trick is to look for solutions that are purpose-built to handle mass data fragmentation from the ground up. Organizations will not find their answers in point products that only address one type of workload, like backups or test/dev. Organizations need to embrace a platform that brings everything together.
Cohesity provides the only web-scale, hyperconverged solution that consolidates multiple workloads, including backup, files and objects, test/dev, and analytics, all on one platform. As a software-defined solution, Cohesity spans on-premises, public cloud, and edge deployments, making it simple to unify and manage all data. It is like herding cats (but in a good way), bringing data together into a data platform built with a strong suite of backup and recovery capabilities powered by a distributed file system (SpanFS), a centralized management platform (Helios), and a Google-like search that makes it easy to sift through petabytes of data from a single user interface. And, in case you didn’t catch that, it’s worth repeating: users can manage all of this from one simple-to-use interface. This is modern data management at its best.
But this is just the first step. Users also need greater visibility, analytics, and, of course, protection to be able to ensure GDPR compliance. This is where the need for applications comes into play: applications that can run on that same platform.
Wouldn’t it be great if you could pump data to an application suite that sifts through massive volumes of secondary data and produces compliance reports based on areas where you need to limit exposure?
Absolutely! Cohesity uniquely allows applications to run directly on backup and other unstructured secondary data. This capability enables enterprises to download applications from Cohesity’s Application MarketPlace and run those apps to gain insights into stored data. Applications like Cohesity Insight and Pattern Finder can help IT owners identify personal information, including credit card numbers, national identification numbers, etc. These apps allow enterprises to take preventive actions to meet their compliance requirements for GDPR, HIPAA, and PCI. All this happens without the data ever leaving the platform, which shrinks the blast radius for identity thieves.
The Cohesity applications architecture allows a wide spectrum of applications, ranging from a single function to an enterprise suite, to run from within the Cohesity platform. These apps can extend the capability of the data platform, providing tools to inspect the data, process it in situ, and take action before it hits the network. With improved visibility come security and policy. Data can be searched easily for compliance violations, and policies can be configured to prevent such violations.
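To make the idea of pattern-based PII discovery a little more concrete, here is a minimal sketch in Python. It is not how Pattern Finder or any Cohesity app is implemented; the `PATTERNS` table, `luhn_valid`, and `find_pii` are hypothetical names invented for illustration. It assumes that candidate credit card numbers are filtered with the standard Luhn checksum, and that IP and MAC addresses are matched by shape only.

```python
import re

# Hypothetical patterns for illustration only. Real detectors need
# locale-specific rules and far more careful validation.
PATTERNS = {
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    # Dotted quad; matched by shape, octet ranges are not validated.
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # Six hex pairs separated by ':' or '-'.
    "mac": re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for each candidate PII hit."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            # Drop digit runs that merely look like card numbers.
            if label == "credit_card" and not luhn_valid(value):
                continue
            hits.append((label, value))
    return hits
```

Calling `find_pii` on a line of text returns labeled candidates that a compliance report could then aggregate. The checksum step matters because long digit runs such as order numbers are common in backups, and without it a scan would drown in false positives.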
Let us now double click on the application story with some concrete examples of apps that can run within Cohesity clusters to demonstrate how apps can help you achieve GDPR compliance.
Cohesity Insight is an app that can be downloaded to a Cohesity cluster straight from the Helios multi-cluster cloud management platform. It provides enhanced Google-like search capabilities, including:
- Interactive search on unstructured data
- Search across Cohesity views and shares
- Search within common document types used by the enterprise: MS Office, text, PDF, and zipped folders that contain any of these files.
With Cohesity Insight, you can automate common searches for personal data being held within Cohesity secondary storage. This vastly simplifies the effort and cuts down the time to match personal data and to prevent it from getting into the hands of bad actors.
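As a rough illustration of what automating common searches might look like, the sketch below walks a directory tree and runs a fixed set of saved searches over plain-text files, including text files inside zip archives. Everything here is an assumption made for the example: `SAVED_SEARCHES`, `scan_text`, and `scan_tree` are hypothetical names, the two patterns are simplistic, and it says nothing about how Cohesity Insight works internally. Office and PDF files are deliberately skipped because parsing them requires third-party libraries.

```python
import re
import zipfile
from pathlib import Path

# Hypothetical saved searches; a real deployment would define its own.
SAVED_SEARCHES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str) -> list[str]:
    """Return the sorted names of saved searches that match the text."""
    return sorted(name for name, rx in SAVED_SEARCHES.items() if rx.search(text))


def scan_tree(root) -> dict[str, list[str]]:
    """Map each .txt file (or .txt zip member) under root to its matches."""
    report = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix == ".txt":
            hits = scan_text(path.read_text(errors="ignore"))
            if hits:
                report[str(path)] = hits
        elif path.suffix == ".zip" and zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as archive:
                for member in archive.namelist():
                    if not member.endswith(".txt"):
                        continue
                    raw = archive.read(member)
                    hits = scan_text(raw.decode("utf-8", errors="ignore"))
                    if hits:
                        # "archive!member" marks a hit inside a zip file.
                        report[f"{path}!{member}"] = hits
    return report
```

Running `scan_tree` on a schedule and comparing reports over time is one simple way to notice personal data turning up in new places.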
Cohesity Spotlight is another relevant application that looks through file audit logs and helps take action based on anomalous behavior. This app is available through the Cohesity app marketplace and can be downloaded to run directly from a Cohesity cluster. When Cohesity Spotlight is running on Cohesity, it:
- Analyzes audit logs by operation type, user, date range, and file path, including metadata such as when a file was created, accessed, modified, closed, or deleted.
- Flags anomalous access, entities, and actions that don’t match usual access or operational trends.
- Generates reports of such anomalies that can be downloaded or exported.
- Presents audit dashboards that can be pivoted using filters.
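For readers who want a feel for what flagging anomalous activity in audit logs can mean in practice, here is a small Python sketch. It is not Cohesity Spotlight’s algorithm, which this post does not describe. It assumes a simple z-score rule: a user’s daily count for an operation is flagged when it sits more than `threshold` standard deviations above that user’s own average. The `AuditEvent` record and `flag_anomalies` function are hypothetical names made up for this example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean, pstdev


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit record; real audit logs carry more fields."""
    user: str
    operation: str  # e.g. "read", "modify", "delete"
    day: date
    path: str


def flag_anomalies(events, threshold=2.0):
    """Return sorted (user, operation, day, count) tuples for unusual days.

    A day is unusual when its count is more than `threshold` population
    standard deviations above that user's mean for the same operation.
    Days with no events are not part of the baseline.
    """
    daily = Counter((e.user, e.operation, e.day) for e in events)
    history = defaultdict(list)
    for (user, op, day), count in daily.items():
        history[(user, op)].append((day, count))

    flagged = []
    for (user, op), series in history.items():
        counts = [count for _, count in series]
        if len(counts) < 3:
            continue  # too little history to call anything unusual
        mu, sigma = mean(counts), pstdev(counts)
        if sigma == 0:
            continue  # perfectly steady behavior, nothing stands out
        for day, count in series:
            if (count - mu) / sigma > threshold:
                flagged.append((user, op, day, count))
    return sorted(flagged)
```

With this rule, a user who normally deletes a couple of files a day and then deletes fifty in one day would be flagged for that day. A z-score is only one possible choice; it needs a reasonable amount of history per user and can be skewed by the very outliers it is trying to catch.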
Cohesity’s mission is to simplify data management by eliminating mass data fragmentation. With a platform approach, Cohesity has not just consolidated multiple workloads on a single solution; it also uniquely empowers enterprises to make their data much more productive in addressing other critical business requirements.