Data is one of your most valuable assets; understanding and using data effectively powers your business. However, it can also be a source of privacy, security, and compliance risk. The data discovery and classification capabilities of Cloud Data Loss Prevention (DLP) have helped many Google Cloud customers identify where sensitive data resides. However, data growth is outpacing the ability to manually inspect data, and data sprawl means sensitive data is increasingly appearing in places it’s not expected. You need a solution for managing data risk that can automatically scale with the rapid growth of your data without additional management overhead.
To help, we are happy to announce that we’re making Cloud DLP automatic: automatic discovery, automatic inspection, automatic classification, automatic data profiling. Automatic DLP is now available in preview for BigQuery, and you can enable it across your entire organization to gain visibility into your data risk. With rich insights for each table and column, you can focus on the outcome, manage data risk, and ultimately help to safely accelerate your business. Automatic DLP is an example of Google Cloud’s Invisible Security vision, where the capability to understand and protect your data is engineered into the platform.
Here are the benefits of Automatic DLP:
- Continuous monitoring: Cloud DLP automatically profiles new tables as you create them across your org. Also, it periodically reprofiles tables that you modify.
- Low overhead: No jobs to manage. Enable it directly in the Cloud Console for an entire organization or select folders or projects.
- Data residency: Cloud DLP will inspect your data and generate data profiles in the same geographic region that your data lives in (as configured in BigQuery).
- Google-driven: Automatic DLP is powered by industry-leading Cloud DLP. We figure out how to inspect and profile your tables and columns, so you can focus on the outcomes.
- Rich insights: Table and column profiles give you details about the data risk and sensitivity of your data, including Cloud DLP’s predicted infoType.
A data profile is a set of metrics and insights that Cloud DLP gathers from scanning your data. Among these metrics are the predicted infoTypes found in BigQuery tables, “free text” score, uniqueness score, and the data risk level. Use these insights to make informed decisions about how you protect, share, and use your data. You can get results directly in the Cloud Console or export profile details to BigQuery for custom analysis and reporting.
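To make the idea of custom analysis concrete, here is a minimal Python sketch that sorts column profiles so the riskiest ones are reviewed first. The `ColumnProfile` record, its field names, and the LOW/MEDIUM/HIGH levels are assumptions made for illustration; they are not the schema that Cloud DLP exports.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical record shape: the field names and the LOW/MEDIUM/HIGH levels
# are assumptions for this sketch, not Cloud DLP's export schema.
@dataclass
class ColumnProfile:
    table: str
    column: str
    predicted_info_type: Optional[str]
    free_text_score: str
    uniqueness_score: str
    data_risk_level: str


# Numeric rank for each assumed level so profiles can be ordered.
LEVEL_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def highest_risk_first(profiles: List[ColumnProfile]) -> List[ColumnProfile]:
    """Return the profiles ordered from highest to lowest data risk level."""
    return sorted(
        profiles,
        key=lambda p: LEVEL_ORDER[p.data_risk_level],
        reverse=True,
    )
```

A report built this way could list the high-risk columns at the top, followed by the rest in their original order within each level.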
Managing your data risk
Here are a few example scenarios of how DLP profiles can help you understand and manage data risk:
Scenario 1: Table found with credit card numbers and a high uniqueness score
Let’s say that a column in a table with 10M rows was classified with a predicted infoType of “CREDIT_CARD_NUMBER” and a high uniqueness score. This indicates that you likely have 10M unique credit card numbers in this table. A lower uniqueness score might indicate that the table holds a smaller set of numbers, each repeated across many rows.
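As a rough intuition for what a uniqueness score captures, the sketch below computes the share of distinct values in a column. This is a simple stand-in written for illustration, and the function name `uniqueness_ratio` is hypothetical; it is not how Cloud DLP calculates its score.

```python
from typing import Hashable, Iterable


def uniqueness_ratio(values: Iterable[Hashable]) -> float:
    """Return distinct values divided by total values (0.0 for an empty column).

    Illustrative only: not Cloud DLP's uniqueness score.
    """
    items = list(values)
    if not items:
        return 0.0
    return len(set(items)) / len(items)
```

A column where every value differs yields 1.0, while a column that repeats the same few card numbers yields a value closer to 0.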
Potential Action to Take: If this type of data is acceptable for you to store and process, you can lower this data risk by applying a BigQuery Policy Tag which would restrict access to this column to only those with specific permission. Alternatively, if you do not want to store this raw information, consider tokenizing the data using Cloud DLP’s de-identification methods or solutions for PCI Tokenization.
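To illustrate the general idea of tokenization, here is a minimal sketch that replaces a card number with a keyed hash. It uses a generic HMAC-SHA256 from the Python standard library as a stand-in and is not Cloud DLP’s de-identification implementation; the function name `tokenize` and the key handling are assumptions. Unlike some tokenization schemes, this one is one-way, so the original number cannot be recovered from the token.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Return a deterministic, one-way token for a card number.

    Illustrative stand-in (HMAC-SHA256), not Cloud DLP's de-identification.
    """
    # Normalize so "4111 1111..." and "4111-1111..." map to the same token.
    digits = "".join(ch for ch in value if ch.isdigit())
    return hmac.new(key, digits.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the same input and key always produce the same token, tokenized columns can still be joined or counted without exposing the raw numbers. The key itself must be stored and protected separately.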
Scenario 2: Table found with several infoTypes and a high free text score
Let’s say that a column in a table does not have a strong predicted infoType but has hints of PHONE_NUMBER, US_SOCIAL_SECURITY_NUMBER, and DATE_OF_BIRTH along with a high “free text” score. This indicates that you may have a column of unstructured data in your table that has occasional instances of PII. This could, for example, be a note field or comment field where someone types in PII such as “customer was born on 1/1/1985” and is an indication of potential risk.
Potential Action to Take: Consider running a deep scan of this column using Cloud DLP’s on-demand inspection for BigQuery so that you can understand where instances of PII may exist in specific rows or cells. Or consider using Cloud DLP’s masking capability to replace this table with a de-identified version.
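To show what locating PII in specific rows can look like, here is a small sketch that scans free-text values with regular expressions and reports the row index, a label, and the matched text. The patterns are deliberately simplistic assumptions, and the function `find_pii` is hypothetical; Cloud DLP’s on-demand inspection uses its own detectors and is far more thorough.

```python
import re
from typing import Dict, List, Tuple

# Simplistic example patterns (assumptions for this sketch only).
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "US_SOCIAL_SECURITY_NUMBER": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    # Any slash-separated date is treated as a candidate date of birth.
    "DATE_OF_BIRTH": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def find_pii(rows: List[str]) -> List[Tuple[int, str, str]]:
    """Return (row_index, label, matched_text) for every pattern match."""
    findings: List[Tuple[int, str, str]] = []
    for index, text in enumerate(rows):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                findings.append((index, label, match.group()))
    return findings
```

Run against a notes column, a result such as row 0 containing a candidate date of birth tells you which cells to review or mask.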
Scenario 3: Table found with sensitive data and shared publicly
Let’s say that a table contains customer EMAIL_ADDRESS and PHONE_NUMBER and it was shared with a marketing partner. However, instead of being shared directly, this table was made public. This greatly increases the risk of exposing this sensitive information.
Potential Action to Take: Adjust permissions to this table to remove public access groups like allAuthenticatedUsers. Instead, add the specific users or groups that should have access to the data.
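The permission cleanup can be sketched as a filter over IAM-style role bindings. The example below works on plain dictionaries shaped like IAM policy bindings (a `role` and a list of `members`) and does not call any Google Cloud API; the function name `remove_public_members` and the sample members are hypothetical.

```python
from typing import Dict, List

# IAM identifiers that grant access to the public or to any signed-in account.
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def remove_public_members(bindings: List[Dict]) -> List[Dict]:
    """Return new bindings with public members removed.

    Bindings left with no members are dropped. The input is not modified.
    """
    cleaned: List[Dict] = []
    for binding in bindings:
        members = [m for m in binding["members"] if m not in PUBLIC_MEMBERS]
        if members:
            cleaned.append({"role": binding["role"], "members": members})
    return cleaned
```

After filtering, you would add the specific partner users or groups to the relevant binding and apply the updated policy with the tooling you normally use.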
Get started with Automatic DLP
Automatic DLP Profiling is available now in preview for BigQuery.
By: Scott Ellis (Product Manager, Google Cloud) and Jordanna Chord (Staff Software Engineer)
Source: Google Cloud Blog