Restricting access to datasets in BigQuery is a common request I see, but it’s not so straight forward to handle with Google Cloud services. The following approach is what I typically use to show specific datasets to specific users or groups.
If you know what you’re doing, and you’re here for the technical specifics, here’s how it can be done:
BigQuery Job User
BigQuery Data Viewer
For more details, I’ll walk through the full approach below.
I commonly see this done for the following use cases:
You can use the following approach to limit a set of users so that they only see specific data in BigQuery:
This approach will expose only views to a set of users, all tables are stored and maintained behind the scenes. It provides a good way to protect and isolate your true data sources, avoiding any potential corruption or accidental transformation.
The following steps will make it happen:
Create a dataset in your project that will contain all the views available to this set of users. If needed you can create multiple datasets for organization or clarity, just repeat the following steps for all the datasets you create for this set of users (called principals in Google BigQuery).
Go to the IAM management screen for your BigQuery project. We don’t need anything at the organization level, unless you’re using groups instead of individual principals. Add your users under the “VIEW BY PRINCIPALS” with the role BigQuery Job User
.
After settings the project level permissions in Google’s IAM you’ll need to set specific permissions for the newly created dataset(s) we added. Open the dataset(s) then click the SHARING link towards the top right of the screen and select Permissions from the dropdown menu. Click the ADD PRINCIPAL button in the Share permissions form that appears, then add the same users we just added at the project level (our BigQuery Job User
roles), except this time set the role to BigQuery Data Viewer
.
At this point, our new users will see only these datasets when they log in to this BigQuery project, but they won’t be able to query anything in them yet. If you try to you’ll see the following error:
Access Denied: Table PROJECT:DATASET.VIEW: User does not have permission to query table PROJECT:DATASET.VIEW,
or perhaps it does not exist in location US.
This is because the user doesn’t have access to the table queried by the view itself. To provide that access, you need to set up authorized views. Authorized views allow users to run queries without access to the underlying source data. It’s the final step needed to make this all work.
To set them up:
Once you’ve done this you’re done, you now have a set of users that can see and query a specific subset of datasets in your BigQuery data warehouse. All other data will be hidden from these users.
Why is this important? In general, permissions and access management is critical for a data warehouse for the following reasons:
Data warehouses often contain sensitive information that could include personal customer details, financial records, intellectual property, or strategic business information. It’s crucial to ensure that only authorized personnel have access to this data to prevent leaks and breaches that could lead to financial loss, legal action, or damage to reputation.
Many industries are subject to regulations regarding the handling of data, such as the General Data Protection Regulation (GDPR) in Europe, Health Insurance Portability and Accountability Act (HIPAA) in the United States, and other privacy laws globally. Proper permissions and access management ensure compliance with these laws by controlling who has access to sensitive information and auditing data access.
By controlling who can access and modify data, organizations can ensure the accuracy and reliability of the data in the warehouse. If too many people have the ability to change data, it can lead to inconsistencies, errors, and difficulties in tracking changes.
Access management is often governed by the principle of least privilege, which means users are granted only the access necessary to perform their job functions. This reduces the risk of accidental or deliberate misuse of data.
With proper access controls, it becomes easier to monitor who is accessing what data and when. This is important not only for security, but also for maintaining an audit trail that can be invaluable during investigations or audits.
Different customers and stakeholders may have differing expectations and legal rights regarding their data privacy. Managing permissions properly ensures that an organization can meet these expectations and uphold the rights of individuals.
Effective access management can mitigate risks related to accidental data deletion, unauthorized data manipulation, and leakage of proprietary information.
Access controls can also prevent unnecessary system load by restricting the number of queries and operations performed by users who do not need to perform data-intensive tasks.
Maintaining strict access controls helps in building trust with customers, partners, and stakeholders by demonstrating that the company takes data confidentiality seriously.
You should now have a solid way to isolate data from users in BigQuery, exposing only what is needed to the right team members. If you need any assistance setting this up, or working with BigQuery to build or manage a data warehouse, I’m available for hire.