Google Cloud

New ways to manage sensitive data with the Data Loss Prevention API

October 19, 2017
Scott Ellis

Senior Product Manager

If your organization has sensitive and regulated data, you know how much of a challenge it can be to keep it secure and private. The Data Loss Prevention (DLP) API, which entered beta in March, can help you quickly find and protect over 50 types of sensitive data such as credit card numbers, names and national ID numbers. And today, we’re announcing several new ways to help protect sensitive data with the DLP API, including redaction, masking and tokenization.

These new data de-identification capabilities help you work with sensitive information while reducing the risk of that data being inadvertently revealed. If, like many enterprises, you follow the principle of least privilege or need-to-know access to data (use or expose only the minimum data required for an approved business process), the DLP API can help you enforce these principles in production applications and data workflows. And because it’s an API, the service can be pointed at virtually any data source or storage system. The DLP API also offers native support and scale for scanning large datasets in Google Cloud Storage, Datastore and BigQuery.

Google Cloud DLP API enables our security solutions to scan and classify documents and images from multiple cloud data stores and email sources. This allows us to offer our customers critical security features, such as classification and redaction, which are important for managing data and mitigating risk. Google’s intelligent DLP service enables us to differentiate our offerings and grow our business by delivering high quality results to our customers.

—  Sateesh Narahari, VP of Products, Managed Methods

New de-identification tools in DLP API

De-identifying data removes identifying information from a dataset, making it more difficult to associate the remaining data with an individual and reducing the risk of exposure.

With the DLP API, you can classify and mask sensitive elements in both structured data and unstructured data.

The DLP API now supports a variety of new data transformation options:

Redaction and suppression 
Redaction and suppression remove entire values or entire records from a dataset. For example, if a support agent working in a customer support UI doesn’t need to see identifying details to troubleshoot a problem, you might redact those values. Or, if you’re analyzing large population trends, you might suppress records that contain unique demographics or rare attributes, since these distinguishing characteristics make the individuals behind them easier to re-identify.

The DLP API identifies and redacts a name, Social Security number, telephone number and email address
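To make redaction and suppression concrete, here is a minimal Python sketch of both ideas, run locally. It is not the DLP API or its client library: the regular expressions, the `redact_text` and `suppress_rare` helpers and the `min_count` threshold are all hypothetical, and the bracketed labels only imitate the style of DLP infoType names. Simple patterns like these miss many formats and can’t recognize names at all, which is the gap a managed detection service is meant to fill.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple patterns; real detectors are far more robust.
# Names are a good example of data a regex cannot reliably find.
PATTERNS = {
    "US_SOCIAL_SECURITY_NUMBER": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def redact_text(text: str) -> str:
    """Redaction: replace every match with its label in square brackets."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def suppress_rare(records, quasi_ids, min_count=2):
    """Suppression: drop whole records whose combination of quasi-identifier
    values occurs fewer than min_count times in the dataset."""
    def key(record):
        return tuple(record[q] for q in quasi_ids)
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= min_count]

print(redact_text("Call 555-253-0000 or mail jane@example.com, SSN 123-45-6789."))
# Call [PHONE_NUMBER] or mail [EMAIL_ADDRESS], SSN [US_SOCIAL_SECURITY_NUMBER].
```

`suppress_rare` removes an entire record, not just a field, whenever its combination of quasi-identifiers (for example ZIP code plus age) is too rare to hide in a crowd.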

Partial masking 
Partial masking obscures part of a sensitive attribute, such as the last 7 digits of a US telephone number. Applied to a 10-digit phone number, this leaves only the area code visible.

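A partial mask can be sketched in a few lines. The helper below is hypothetical (it is not a DLP API call): it masks the last `n` digits of a value while leaving separators such as parentheses, spaces and hyphens in place, so masking 7 digits of a 10-digit US number keeps only the area code.

```python
def mask_trailing_digits(value: str, n: int, mask_char: str = "*") -> str:
    """Replace the last n digits of value with mask_char; non-digit characters are kept."""
    chars = list(value)
    remaining = n
    # Walk from the end so the leading digits (e.g. the area code) stay visible.
    for i in range(len(chars) - 1, -1, -1):
        if remaining == 0:
            break
        if chars[i].isdigit():
            chars[i] = mask_char
            remaining -= 1
    return "".join(chars)

print(mask_trailing_digits("(206) 555-0123", 7))  # (206) ***-****
print(mask_trailing_digits("2065550123", 7))      # 206*******
```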

Tokenization or secure hashing

Tokenization, which includes secure hashing, is an algorithmic transformation that replaces a direct identifier with a pseudonym, or token. This is useful when you need to retain a record identifier or join datasets but don’t want to reveal the sensitive underlying values. Tokens are key-based and can be configured to be reversible (using the same key) or non-reversible (by not retaining the key).

The DLP API supports the following token types:

  • Format-Preserving Encryption - a token with the same length and character set as the original value.
  • Secure, key-based hashes - a token that's a 32-byte hash, encoded as a hexadecimal string and generated using a data encryption key.

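Both token types can be illustrated with Python’s standard library, under clearly stated assumptions. The format-preserving part is a toy Feistel construction over decimal digits with HMAC-SHA256 as its round function; it shows why the token keeps the input’s length and character set and why holding the key makes it reversible, but it is not NIST FF1/FF3-1, not the DLP API’s algorithm, and not suitable for production. The hashing part assumes HMAC-SHA256 as the keyed hash, which yields a 32-byte digest shown as 64 hex characters; the post doesn’t say which construction the DLP API uses. All function names and the demo key are hypothetical.

```python
import hashlib
import hmac

ROUNDS = 10  # must be even so the final halves have their original lengths

def _round_value(key: bytes, round_index: int, half: str, modulus: int) -> int:
    """Keyed pseudorandom round function: HMAC-SHA256 of the round index and one half."""
    digest = hmac.new(key, f"{round_index}:{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus

def _check_digits(value: str) -> None:
    if len(value) < 2 or any(ch not in "0123456789" for ch in value):
        raise ValueError("expected at least two ASCII decimal digits")

def fpe_encrypt_digits(key: bytes, digits: str) -> str:
    """Toy Feistel FPE: the token has the same length and character set as the input."""
    _check_digits(digits)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # current length of a
        c = (int(a) + _round_value(key, i, b, 10 ** m)) % 10 ** m
        a, b = b, str(c).zfill(m)
    return a + b

def fpe_decrypt_digits(key: bytes, token: str) -> str:
    """Reverse fpe_encrypt_digits by undoing the rounds in reverse order with the same key."""
    _check_digits(token)
    u = len(token) // 2
    v = len(token) - u
    a, b = token[:u], token[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        prev_a = (int(b) - _round_value(key, i, a, 10 ** m)) % 10 ** m
        a, b = str(prev_a).zfill(m), a
    return a + b

def keyed_hash_token(key: bytes, value: str) -> str:
    """Non-reversible token: the 32-byte HMAC-SHA256 digest, hex-encoded (64 characters)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

key = bytes(range(32))  # hypothetical demo key; real keys belong in a key management system
card_token = fpe_encrypt_digits(key, "4111111111111111")  # 16 digits in, 16 digits out
assert fpe_decrypt_digits(key, card_token) == "4111111111111111"
email_token = keyed_hash_token(key, "jane@example.com")  # same key and input, same token
```

Because the hash is deterministic for a given key, the same email address always maps to the same token, which is what makes joins across datasets possible without exposing the address itself.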

Dynamic data masking 

The DLP API can apply various de-identification and masking techniques in real time, which is sometimes referred to as “Dynamic Data Masking” (DDM). This can be useful if you don’t want to alter your underlying data, but want to mask it when viewed by certain employees or users. For example, you could mask data when it’s presented in a UI, but require special privileges or generate additional audit logs if someone needs to view the underlying personally identifiable information (PII). This way, users aren’t exposed to the identifying data by default, but only when business needs dictate.

With the DLP API, you can prevent users from seeing sensitive data in real time
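Dynamic masking is largely an access-control pattern wrapped around a masking step. The sketch below is a hypothetical illustration of that pattern, not a DLP API feature: the stored record is never modified, unprivileged viewers get a masked copy, and privileged reads are written to an audit log. The field list, the `can_view_pii` flag and the logger name are assumptions; in a real system the masking step itself could be delegated to a de-identification service.

```python
import logging
from typing import Mapping

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("pii_audit")  # hypothetical audit logger name

# Hypothetical set of fields treated as PII in this example.
SENSITIVE_FIELDS = frozenset({"name", "phone", "email"})

def view_record(record: Mapping[str, str], user: str, can_view_pii: bool) -> dict:
    """Return what a given user should see without modifying the stored record:
    unprivileged users get a masked copy; privileged reads are audited."""
    if can_view_pii:
        audit_log.info("user=%s viewed unmasked record id=%s", user, record.get("id"))
        return dict(record)
    return {k: ("****" if k in SENSITIVE_FIELDS else v) for k, v in record.items()}

stored = {"id": "42", "name": "Jane Doe", "phone": "206-555-0123", "issue": "login fails"}
print(view_record(stored, "agent7", can_view_pii=False))
# {'id': '42', 'name': '****', 'phone': '****', 'issue': 'login fails'}
```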

Bucketing, k-anonymity and l-diversity

The DLP API offers even more methods that can help you transform and better understand your data. To learn more about bucketing, k-anonymity and l-diversity techniques, check out the docs and how-to guides.
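To get a feel for what bucketing, k-anonymity and l-diversity measure, here is a small local Python sketch on made-up rows. It uses fixed-width age buckets and the simple “distinct” variant of l-diversity; these choices and all names are illustrative assumptions, not a description of how the DLP API computes its analyses.

```python
from collections import Counter, defaultdict

def bucket_age(age: int, width: int = 10) -> str:
    """Bucketing: generalize an exact age into a fixed-width range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def k_anonymity(records, quasi_ids) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(groups.values())

def l_diversity(records, quasi_ids, sensitive) -> int:
    """Distinct l-diversity: the fewest distinct sensitive values found in any group."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[q] for q in quasi_ids)].add(r[sensitive])
    return min(len(values) for values in groups.values())

patients = [  # made-up example rows
    {"age": 34, "zip": "94043", "diagnosis": "flu"},
    {"age": 37, "zip": "94043", "diagnosis": "asthma"},
    {"age": 52, "zip": "94043", "diagnosis": "flu"},
    {"age": 58, "zip": "94043", "diagnosis": "flu"},
]
bucketed = [{**p, "age": bucket_age(p["age"])} for p in patients]
print(k_anonymity(patients, ["age", "zip"]))               # 1: every exact age is unique
print(k_anonymity(bucketed, ["age", "zip"]))               # 2
print(l_diversity(bucketed, ["age", "zip"], "diagnosis"))  # 1: the 50-59 group is all "flu"
```

Bucketing raises k from 1 to 2, but l stays at 1 because everyone in the 50-59 group shares a diagnosis, so anyone who knows a person’s bucket still learns that person’s diagnosis.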


Get started with the DLP API

With these new transformation capabilities, the DLP API can help you classify and protect sensitive data no matter where it’s stored. As with any tool designed to assist with data discovery and classification, there’s no guarantee that it will be 100% effective in meeting your business needs or obligations. To get started with the DLP API today, take a look at the quickstart guides.
