Are you worried about controlling your BigQuery costs across multiple projects? In this blog, you will learn about the different guardrails BigQuery provides to limit costs and monitor consumption, and how to design a warehouse that scales seamlessly while keeping costs under control.
1. Set up user-level and project-level quotas to cap usage
If you have multiple BigQuery projects and users, you can manage costs by requesting a custom quota that specifies a limit on the amount of query data processed per day. Creating a custom quota on query data lets you control costs at the project level or at the user level.
- Project-level custom quotas limit the aggregate usage of all users in that project.
- User-level custom quotas are separately applied to all users and service accounts within a project.
It is not possible to assign a custom quota to a specific user or service account.
Best practice: Create custom cost control based on the maximum amount of data processed in a day. Start small with a few GBs. It is easy to increase the limit as needed.
Steps to set up: Create custom cost controls | BigQuery | Google Cloud
- Go to the Quotas page in the Google Cloud console (Working with quotas | Documentation | Google Cloud)
- Select the BigQuery API
- Change the Query usage per day per user and Query usage per day quotas from Unlimited to a limited number of GBs/TBs
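To make the two quota levels concrete, here is a minimal Python sketch of how a per-project cap and a per-user cap interact. The class, names, and limits are illustrative only, not the actual quota service:

```python
TB = 1024 ** 4

class QuotaTracker:
    """Toy model of BigQuery custom query-usage quotas: a project-level cap
    on aggregate daily bytes, plus a per-user cap applied to every user
    and service account individually."""

    def __init__(self, project_limit_bytes, per_user_limit_bytes):
        self.project_limit = project_limit_bytes
        self.user_limit = per_user_limit_bytes
        self.project_used = 0
        self.user_used = {}  # user -> bytes processed today

    def try_query(self, user, bytes_processed):
        """Return True if the query fits within both quotas, else False."""
        used_by_user = self.user_used.get(user, 0)
        if used_by_user + bytes_processed > self.user_limit:
            return False  # user hit "Query usage per day per user"
        if self.project_used + bytes_processed > self.project_limit:
            return False  # project hit "Query usage per day"
        self.user_used[user] = used_by_user + bytes_processed
        self.project_used += bytes_processed
        return True

tracker = QuotaTracker(project_limit_bytes=5 * TB, per_user_limit_bytes=1 * TB)
print(tracker.try_query("alice", int(0.8 * TB)))  # True
print(tracker.try_query("alice", int(0.5 * TB)))  # False: alice is over 1 TB
print(tracker.try_query("bob", int(0.9 * TB)))    # True: bob has his own cap
```

Note how the per-user limit is the same for everyone; as stated above, you cannot assign a custom quota to one specific user.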
2. Limit query costs by restricting the number of bytes billed
You can limit the number of bytes billed for a query using the maximum bytes billed setting. When you set maximum bytes billed, the number of bytes that the query will read is estimated before the query executes. If the estimate exceeds the limit, the query fails without incurring a charge.
If a query fails because of the maximum bytes billed setting, an error like the following is returned:
Error: Query exceeded limit for bytes billed: 1000000. 10485760 or higher required.
Best practice: Use the maximum bytes billed setting to limit query costs. Start small with a few GBs. It is easy to increase the limit as needed.
Steps to set up: Control costs in BigQuery Guide
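The guard can be sketched in a few lines of Python. This is a simplified model of the behavior, not the BigQuery service itself; the error string mirrors the one shown above, and the numbers are illustrative:

```python
class BytesBilledLimitError(Exception):
    """Raised when the dry-run estimate exceeds maximum bytes billed."""

def run_with_limit(estimated_bytes, maximum_bytes_billed, execute):
    """Execute the query only if its pre-execution estimate fits under the
    limit; otherwise fail before any bytes are billed."""
    if estimated_bytes > maximum_bytes_billed:
        raise BytesBilledLimitError(
            f"Query exceeded limit for bytes billed: {maximum_bytes_billed}. "
            f"{estimated_bytes} or higher required."
        )
    return execute()

try:
    run_with_limit(10_485_760, 1_000_000, lambda: "rows")
except BytesBilledLimitError as e:
    print(e)  # Query exceeded limit for bytes billed: 1000000. 10485760 or higher required.
```

The key property is that the check happens before execution, so a rejected query costs nothing.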
3. Create Budgets and alerts on GCP to catch any cost spikes
Avoid surprises on your bill by creating Cloud Billing budgets to monitor all of your Google Cloud charges in one place. A budget enables you to track your actual Google Cloud spend against your planned spend. After you’ve set a budget amount, you set budget alert threshold rules that are used to trigger email notifications. Budget alert emails help you stay informed about how your spend is tracking against your budget. You can also use budgets to automate cost control responses.
Best practice: Set up a budget to track your spend. Set threshold rules to trigger email alert notifications: when your costs (actual or forecasted) exceed a percentage of your budget, based on the rules you set, you will receive alert emails.
Steps to set up: To create a new budget:
- Create and name the budget
- Set the budget scope
- Set the budget amount
- Set the budget threshold rules and actions
- Click Finish to save the new budget
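The threshold-rule logic can be sketched as follows. This is a simplified model of Cloud Billing budget alerts, not the Billing API; the rule format and amounts are illustrative:

```python
def triggered_alerts(budget, rules, actual_spend, forecasted_spend):
    """Return the threshold rules that have been crossed.
    Each rule is a (percent, basis) pair, where basis is 'ACTUAL' or
    'FORECAST', mirroring the two spend bases a budget rule can watch."""
    fired = []
    for percent, basis in rules:
        spend = actual_spend if basis == "ACTUAL" else forecasted_spend
        if spend >= budget * percent / 100:
            fired.append((percent, basis))
    return fired

# A $1,000 budget with alerts at 50% and 90% of actual spend,
# plus one at 100% of forecasted spend.
rules = [(50, "ACTUAL"), (90, "ACTUAL"), (100, "FORECAST")]
print(triggered_alerts(1000, rules, actual_spend=550, forecasted_spend=980))
# [(50, 'ACTUAL')]
```

Each fired rule would correspond to one email notification (or an automated response, such as a Pub/Sub message, if you wire one up).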
4. Write good queries
Here are a few best practices:
- Avoid SELECT *
NOTE: Applying a LIMIT clause to a query does not affect the amount of data that is read.
- Don't run queries to explore or preview table data; use table preview options instead
- Do a dry run to estimate costs before running the query
- Only select the data you need
Best practice: Always make new BigQuery users aware of these best practices. The following two documents are a must-read.
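Because BigQuery storage is columnar, query cost tracks the columns you reference, not the rows you return. A toy Python model (with hypothetical column sizes) shows why SELECT * is expensive and why LIMIT does not help:

```python
# Illustrative per-column sizes for a hypothetical events table.
column_bytes = {
    "user_id": 8_000_000,
    "event_name": 40_000_000,
    "payload": 900_000_000,
    "ts": 8_000_000,
}

def bytes_scanned(selected, limit=None):
    """BigQuery reads every referenced column in full; LIMIT changes the
    rows returned, not the bytes read, so it is ignored here on purpose."""
    cols = column_bytes.keys() if selected == ["*"] else selected
    return sum(column_bytes[c] for c in cols)

print(bytes_scanned(["*"]))                      # 956000000: full scan
print(bytes_scanned(["*"], limit=10))            # 956000000: LIMIT doesn't help
print(bytes_scanned(["user_id", "event_name"]))  # 48000000: ~20x cheaper
```

Dropping the wide `payload` column alone cuts the scan by almost 95%, which is why selecting only the columns you need matters so much.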
5. Partition tables so that users must filter with a WHERE clause
A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.
When a partitioned table is created with the require partition filter option, every query against it must specify a WHERE clause that filters on the partitioning column, which prevents accidental full-table scans.
If a query uses a qualifying filter on the value of the partitioning column, BigQuery can scan the partitions that match the filter and skip the remaining partitions. This process is called partition pruning.
Partition pruning is the mechanism BigQuery uses to eliminate unnecessary partitions from the input scan. The pruned partitions are not included when calculating the bytes scanned by the query. In general, partition pruning helps reduce query cost.
Best practice: Implement partitioning where possible. This not only improves performance but also leads to more efficient queries. We highly recommend reading the following two documents.
Steps to set up: Creating partitioned tables | BigQuery | Google Cloud
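Partition pruning can be modeled in a few lines of Python. The partition sizes are illustrative, and the predicate stands in for the WHERE filter on the partitioning column:

```python
from datetime import date

# Illustrative bytes stored per daily partition of a hypothetical table.
partitions = {
    date(2022, 1, 1): 50_000_000,
    date(2022, 1, 2): 60_000_000,
    date(2022, 1, 3): 55_000_000,
}

def bytes_scanned(keep=None):
    """keep is a predicate on the partitioning column (the WHERE filter).
    Matching partitions are read; the rest are pruned and never counted
    toward the bytes billed for the query."""
    return sum(b for d, b in partitions.items() if keep is None or keep(d))

print(bytes_scanned())                                  # 165000000: full scan
print(bytes_scanned(lambda d: d == date(2022, 1, 2)))   # 60000000: 2 of 3 partitions pruned
```

With thousands of daily partitions in a real table, a one-day filter routinely turns a multi-terabyte scan into a few gigabytes.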
6. Reservations – Start small and incrementally add slots based on usage
BigQuery offers two pricing models for analytics:
- On-demand pricing: You pay for the data scanned by your queries. You have a fixed, per-project query processing capacity, and your cost is based on the number of bytes processed.
- Flat-rate pricing: You purchase dedicated query processing capacity.
By default, you are billed according to the on-demand pricing model. Using BigQuery Reservations, you can switch to flat-rate pricing by purchasing commitments. Commitments are purchased in units of BigQuery slots. The cost of all bytes processed is included in the flat-rate price.
Flat-rate pricing offers predictable and consistent costs. You know up-front what you are spending.
More information – Introduction to Reservations | BigQuery | Google Cloud
Best practice: Use the BigQuery slot estimator to understand your on-demand slot consumption. Once your slot usage goes above 100 slots and is relatively steady, start thinking about Flex slots or monthly/annual reservations.
Steps to set up: Get started with reservations | BigQuery | Google Cloud
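One simple sizing heuristic, sketched below, is to cover steady (median) slot usage with a commitment and let burst demand fall back to on-demand or Flex slots. This is an assumption for illustration, not an official formula; commitments are purchased in increments of 100 slots:

```python
def suggest_commitment(slot_samples, increment=100):
    """Suggest a starting flat-rate commitment from sampled slot usage:
    cover the median of observed consumption, rounded down to the
    purchasable increment. An illustrative heuristic, not Google guidance."""
    if not slot_samples:
        return 0
    ordered = sorted(slot_samples)
    median = ordered[len(ordered) // 2]
    return int(median // increment) * increment

# Hourly slot-usage samples from the slot estimator (illustrative numbers).
print(suggest_commitment([120, 150, 340, 180, 160]))  # 100
print(suggest_commitment([480, 500, 520]))            # 500
```

Starting below peak usage and growing the commitment incrementally matches the "start small" advice above: unused committed slots are wasted money, while occasional spikes can ride on Flex slots.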
7. Monitoring the BigQuery metrics
BigQuery provides its native admin panel with overview metrics for monitoring. BigQuery is also well integrated with existing GCP services like Cloud Logging to provide detailed logs of individual events and Cloud Monitoring dashboards for analytics, reporting and alerting on BigQuery usage and events.
More information – BigQuery Admin Reference Guide
- The key to successful monitoring is proactive alerting: for example, set up alerts for when the reservation slot utilization rate crosses a predetermined threshold.
- Also, it’s important to enable the individual users and teams in the organization to monitor their workloads using a self-service analytics framework or dashboard. This allows the users to monitor trends for forecasting resource needs and troubleshoot overall performance.
- Understand and leverage INFORMATION_SCHEMA for real-time reports and alerts. Review more examples of job stats and a technical deep dive on INFORMATION_SCHEMA in this blog.
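As an illustration, a per-user daily cost report might combine an INFORMATION_SCHEMA jobs query with a little post-processing. The JOBS_BY_PROJECT view and its columns are real; the mock rows and the $5-per-TiB on-demand list price are assumptions for the example:

```python
# SQL to pull yesterday's bytes billed per user from job metadata.
DAILY_COST_SQL = """
SELECT user_email, SUM(total_bytes_billed) AS bytes_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY user_email
"""

TIB = 1024 ** 4

def cost_per_user(rows, price_per_tib=5.0):
    """rows: (user_email, bytes_billed) pairs, e.g. results of the SQL above.
    price_per_tib assumes the on-demand list price; adjust for your contract."""
    return {user: round(b / TIB * price_per_tib, 2) for user, b in rows}

# Mock rows standing in for actual query results.
mock_rows = [("alice@example.com", 2 * TIB), ("bob@example.com", TIB // 2)]
print(cost_per_user(mock_rows))  # {'alice@example.com': 10.0, 'bob@example.com': 2.5}
```

Feeding a report like this into a dashboard or alert gives each team near-real-time visibility into its own spend.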
Steps to set up:
- To get started quickly with monitoring on BigQuery, you can leverage the publicly available Data Studio dashboard and related GitHub resources.
- Looker also provides BigQuery Performance Monitoring Block for monitoring BigQuery usage.
- Blog on how to implement a fully serverless solution for near–real-time cost monitoring using readily available log data – Taking a practical approach to BigQuery cost monitoring | Google Cloud Blog
Here is a list of training sessions that will be useful for new BigQuery users.
- BigQuery 101 – Exploring and Preparing your Data with BigQuery | Coursera
- Getting from data to insights – From Data to Insights with Google Cloud Specialization
Hands on labs:
- BigQuery 101 – Insights from Data with BigQuery | Google Cloud Skills Boost
- Build and Optimize Data Warehouses with BigQuery – Build and Optimize Data Warehouses with BigQuery | Google Cloud Skills Boost
- BigQuery for Machine Learning – BigQuery for Machine Learning | Google Cloud Skills Boost
Data engineering and smart analytics learning path:
By: Sonakshi Pandey (Customer Engineer, Analytics)
Source: Google Cloud Blog