GDPR: Approaches for Protecting Personally Identifiable Information (PII) and Sensitive Personal Information (SPI)

December 21, 2021 Jeremy Wittkop

Many companies are subject to the European Union’s General Data Protection Regulation (GDPR). In my view, vendors and service providers often speak too generally about the GDPR and oversimplify solutions to issues that organizations raise about complying with its mandates.

Rather than try to address all the fine details of the regulation in this post, I’ll specifically address practical issues that most companies will need to focus on to comply with the GDPR’s requirements: protecting Personally Identifiable Information (PII) and Sensitive Personal Information (SPI).

These two categories of personal information are very different from each other and require separate approaches to accurately identify and protect them as they flow through an organization’s data environment. Here’s a closer look at the challenges of protecting these two information types:

Personally Identifiable Information (PII)

PII is the first category of information that the GDPR covers (the regulation itself uses the broader term “personal data”). It includes information that’s generally accepted as personally identifiable, such as names, national identifiers like Social Security numbers in the United States, and European identifiers such as Italy’s Codice fiscale and driver’s license numbers in the United Kingdom. It’s important to note that the GDPR’s definition is broader than many traditional definitions of PII and includes data elements like email and IP addresses.

Many of the identifiers covered by the GDPR share a useful trait: they follow defined formats, which makes them relatively easy to program into a content analytics system as regular expressions. Because of this, data loss prevention (DLP) solutions and technologies are ideal for identifying and protecting PII. DLP technologies can be enterprise-class products or capabilities integrated into other products like firewalls, cloud access security brokers (CASBs) or web gateways.
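To illustrate why format-based identifiers suit pattern matching, here is a minimal Python sketch of a regex-based detector. The patterns are simplified assumptions for illustration only: the Codice fiscale pattern ignores the check character and omocodia substitutions, the SSN pattern matches only the hyphenated form, and none of them validate checksums the way a production DLP rule would.

```python
import re

# Simplified, illustrative patterns (assumptions, not production-grade DLP rules).
PII_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Italian Codice fiscale shape: 6 letters, 2 digits, letter, 2 digits, letter, 3 digits, letter.
    "it_codice_fiscale": re.compile(r"\b[A-Z]{6}\d{2}[A-Z]\d{2}[A-Z]\d{3}[A-Z]\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # IPv4 address with each octet limited to 0-255.
    "ipv4": re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return every match per PII type found in text; types with no match are omitted."""
    results = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            results[name] = matches
    return results


if __name__ == "__main__":
    sample = "Contact mario.rossi@example.it from 192.168.1.20, CF RSSMRA85T10A562S."
    print(find_pii(sample))
```

Free-form data such as names has no fixed format, which is why DLP products often pair patterns like these with dictionaries or exact data matching against known records.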

There are two key areas within the GDPR that, in my view, point to DLP as the optimal solution for PII protection. First, the provisions on security of processing (Article 32) require appropriate technical and organizational measures to protect personal data. In my interpretation, that means an organization must have reasonable controls to monitor the flow of personal information throughout its environment: at the endpoint, in transit via web and email channels, and wherever it’s stored. That visibility should also cover how information is stored in cloud applications and transferred between cloud environments.
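The monitoring scope described above can be pictured as one inspection policy applied to events from every channel. The following Python sketch assumes a hypothetical `DataEvent` record emitted by each sensor (endpoint agent, web or email gateway, storage scanner, cloud connector) and uses a single simplified email pattern as a stand-in for a full PII detector; real DLP products collect these events through their own agents, proxies and APIs.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    ENDPOINT = "endpoint"
    WEB = "web"
    EMAIL = "email"
    STORAGE = "storage"
    CLOUD = "cloud"


@dataclass
class DataEvent:
    """Hypothetical event emitted by a sensor on one channel."""
    channel: Channel
    location: str  # e.g. a file path, URL, mailbox or cloud bucket
    content: str


# Stand-in detector: a simplified email pattern (assumption for illustration).
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def inspect(events: list[DataEvent]) -> list[tuple[str, str, int]]:
    """Apply the same PII check to every channel; return (channel, location, hit count)."""
    alerts = []
    for event in events:
        hits = EMAIL.findall(event.content)
        if hits:
            alerts.append((event.channel.value, event.location, len(hits)))
    return alerts


if __name__ == "__main__":
    events = [
        DataEvent(Channel.EMAIL, "outbound:partner@example.com", "Customer: ana@example.org"),
        DataEvent(Channel.STORAGE, "/shares/finance/q3.csv", "revenue,region\n100,EU"),
        DataEvent(Channel.CLOUD, "bucket/exports/users.json", '{"email": "li@example.net"}'),
    ]
    for alert in inspect(events):
        print(alert)
```

Keeping one policy and feeding it from every channel is what makes the monitoring consistent; a gap in any single channel becomes a gap in visibility.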

Second, I cannot imagine a scenario in which an organization could honor the GDPR’s “Right to Erasure” (Article 17, also known as the “Right to be Forgotten”) without the ability to find the relevant PII throughout all of its systems, including cloud applications, and then remove it. Therefore, a DLP capability does not by itself make an organization compliant, but it is a required element of achieving compliance.
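Honoring an erasure request depends on that discovery step: first locate every record that references the data subject, then remove it. The Python sketch below models each system as a hypothetical in-memory store (a list of records keyed by store name); in practice each store would be a database, file share or cloud application with its own query and delete mechanisms.

```python
Record = dict[str, str]


def find_subject_records(stores: dict[str, list[Record]], identifiers: set[str]) -> dict[str, list[int]]:
    """Return, per store, the indexes of records with a value containing any of the subject's identifiers."""
    found = {}
    for store_name, records in stores.items():
        indexes = [
            i
            for i, record in enumerate(records)
            if any(ident in value for value in record.values() for ident in identifiers)
        ]
        if indexes:
            found[store_name] = indexes
    return found


def erase_subject(stores: dict[str, list[Record]], identifiers: set[str]) -> int:
    """Delete every matching record from every store; return how many were removed."""
    removed = 0
    for store_name, indexes in find_subject_records(stores, identifiers).items():
        # Delete from the highest index down so the remaining indexes stay valid.
        for i in sorted(indexes, reverse=True):
            del stores[store_name][i]
            removed += 1
    return removed


if __name__ == "__main__":
    stores = {
        "crm": [
            {"name": "Ana Silva", "email": "ana@example.org"},
            {"name": "Li Wei", "email": "li@example.net"},
        ],
        "support_tickets": [{"id": "T-1", "body": "Refund requested by ana@example.org"}],
        "marketing": [{"email": "other@example.com"}],
    }
    print(erase_subject(stores, {"ana@example.org"}))
    print(stores)
```

Matching on substrings catches identifiers buried in free-text fields such as ticket bodies, at the cost of occasional over-matching, so a real erasure workflow would usually review matches before deleting.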

Building a DLP program for the purpose of complying with relevant GDPR articles requires planning, coordination between business units, and a good deal of care and feeding. However, protecting PII has been a best practice for more than a decade, and many people have experience building such programs.

Sensitive Personal Information (SPI)

Protecting SPI is a far greater operational challenge for organizations. SPI refers to information that doesn’t necessarily identify an individual on its own but relates to an individual and reveals something about that person that is private or that could harm them if made public. The GDPR calls these “special categories of personal data”; they include biometric data, genetic data, trade union membership and data concerning a person’s sex life or sexual orientation.

The challenge with using traditional data security tools like DLP to protect SPI is that many of those data elements appear in common usage without being related to an individual. It’s also very difficult to program a content analytics engine to find information that is in scope for the GDPR without also finding large volumes of information that isn’t.
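A short Python sketch makes the problem concrete. Assuming a naive keyword list for two SPI categories (the keywords are illustrative assumptions, not a real rule set), the scanner flags a sentence about a specific person and a sentence about an organization’s general policy identically, because keywords alone say nothing about whether the text relates to an individual.

```python
import re

# Illustrative keyword patterns for two SPI categories (assumption, not a real rule set).
SPI_KEYWORDS = {
    "trade_union": re.compile(r"\btrade union\b", re.IGNORECASE),
    "genetic": re.compile(r"\bgenetic\b", re.IGNORECASE),
}


def keyword_hits(text: str) -> list[str]:
    """Return the SPI categories whose keyword appears anywhere in text."""
    return [category for category, pattern in SPI_KEYWORDS.items() if pattern.search(text)]


in_scope = "Maria Rossi joined the trade union last year and requested genetic counselling."
out_of_scope = "Our trade union policy and genetic research program are described on the website."

print(keyword_hits(in_scope))      # ['trade_union', 'genetic']
print(keyword_hits(out_of_scope))  # ['trade_union', 'genetic']
```

Adding a name or identifier check narrows the results, but it still cannot reliably judge context, which is the gap that user-driven classification addresses.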

The most elegant solution to protect SPI in my experience is to add a data classification program to the overall security program and integrate it with DLP programs. Data classification allows a user to tag data by selecting a classification from a list. Many people are familiar with classification schemas used by governments and military organizations, which classify information by levels of secrecy. For example, classifications may include public, sensitive, secret and top secret. The most effective data classification tools are very flexible, allowing for multiple levels of classification and offering customizable fields.
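Hierarchical schemas like the government example work because the levels are ordered, so a policy can compare a document’s level against a reader’s clearance. The Python sketch below encodes the four example levels as an ordered enum; the `may_access` rule is a hypothetical policy for illustration, not how any particular classification product enforces access.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Example levels from the text, ordered from least to most restricted."""
    PUBLIC = 0
    SENSITIVE = 1
    SECRET = 2
    TOP_SECRET = 3


def may_access(clearance: Classification, document_level: Classification) -> bool:
    """Hypothetical rule: a reader may open documents at or below their clearance."""
    return clearance >= document_level


print(may_access(Classification.SECRET, Classification.SENSITIVE))     # True
print(may_access(Classification.SENSITIVE, Classification.TOP_SECRET))  # False
```

Flexible tools layer customizable fields on top of a single level like this, for example separate yes/no attributes for personal data, so one document can carry several independent tags.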

For unstructured SPI data, an organization could develop a classification schema with simple “yes” or “no” drop-down menus that let users confirm whether a document contains PII and whether it contains SPI. The data classification solution would then apply metadata tags to those documents, which security tools like DLP could use to apply rules to the information. This is a far more efficient and effective way to protect SPI than relying on content inspection to separate every reference to an SPI category that concerns an individual from the same terms in common usage. Microsoft Information Protection, for example, is a popular sensitivity labeling solution for Office 365.
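The flow described above (users answer yes/no questions, the classification tool writes metadata tags, and DLP rules act on those tags) can be sketched in Python. The tag names, the `Document` type, the internal domain and the block/encrypt/allow actions are hypothetical assumptions for illustration; they do not reflect the label schema or API of Microsoft Information Protection or any other product.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def classify(doc: Document, contains_pii: bool, contains_spi: bool) -> None:
    """Record the user's yes/no answers as metadata tags (hypothetical tag names)."""
    doc.tags["contains_pii"] = "yes" if contains_pii else "no"
    doc.tags["contains_spi"] = "yes" if contains_spi else "no"


def dlp_decision(doc: Document, recipient_domain: str, internal_domain: str = "example.com") -> str:
    """Hypothetical DLP rule that reads tags instead of inspecting content."""
    external = recipient_domain != internal_domain
    if doc.tags.get("contains_spi") == "yes" and external:
        return "block"
    if doc.tags.get("contains_pii") == "yes" and external:
        return "encrypt"
    return "allow"


if __name__ == "__main__":
    report = Document("union_membership.xlsx")
    classify(report, contains_pii=True, contains_spi=True)
    print(dlp_decision(report, "partner.org"))  # block
    print(dlp_decision(report, "example.com"))  # allow
```

The DLP rule reads the user’s answers from metadata rather than re-inspecting content, which sidesteps the common-usage problem for SPI. Documents nobody has classified fall through to `allow` here; a real policy would need an explicit decision for that case.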

Data classification can help reinforce behavioral change

Breaches of personal data can happen in various ways. Those that garner the most attention are large-scale breaches, which are often caused by incorrect technical configurations or a lack of due care on an industrial scale. But far more frequently, information is compromised on a small scale due to a user being careless or lacking awareness about the sensitivity of data they’re handling. In these cases, data classification can help reduce the risk of a breach significantly.

A large part of the spirit of the GDPR is to prompt people to think about the information they’re handling and to do so with care. Complying with the spirit of the regulation will require a culture change in some organizations, which can be aided considerably by building a data classification program. That way, users can easily identify when they’re handling sensitive information and hopefully do so with more care as they go about their daily routine. Many data classification solutions can also communicate with end users through tips or pop-up messaging that can help to reinforce behavioral change.

Conclusion

Many organizations today are suffering from “GDPR fatigue.” Many technology and service providers use fear to sell products and services without addressing specific solutions to the challenges posed by the GDPR. As a result, many organizations have simply stopped listening.

I don’t look at the GDPR as a reason for fear. I see it as a positive way for organizations to enhance their security programs to protect personal information. GDPR compliance is relatively straightforward. However, the basis of compliance is understanding how to identify and protect PII and SPI.

Therefore, programs to enable PII and SPI identification and protection are the foundational elements of GDPR compliance from a tools and capabilities perspective. DLP and data classification form a powerful combination for protecting both PII and SPI. The challenge becomes using those capabilities appropriately to fulfill controller and processor obligations and protect data subject rights.
