Shine On With User-Friendly SQL Capabilities In BigQuery

June is the month which holds the summer solstice, and (at least in the northern hemisphere) we enjoy the longest days of sunshine out of the entire year. Just as the sun is making its longest trips across the sky, the BigQuery team is delighted to announce our next set of user-friendly SQL features.

These new capabilities come in three themes: Powerful Analytics Features, Flexible Columns, and Secure with SQL. 

Powerful Analytics Features (Continued)

Last month, we announced Powerful Analytics Features for advanced filtering of analytic functions with QUALIFY, and pivoting columns with aggregation using PIVOT & UNPIVOT. This month, we continue to bring valuable analytics capabilities by enabling BigQuery users to encode business logic in user-defined functions using table functions

Table function (Preview)

BigQuery users often persist and share business logic using user-defined functions (UDFs), extending BigQuery’s SQL functionality. UDFs accept scalar arguments as input and calculate a scalar value as output. But many customers require the ability to support functions that can output a table. 

We are excited to share that BigQuery now supports table functions, also known as table-valued functions (TVFs). These functions output an entire table instead of a single scalar value, making them extremely versatile tools for an analyst’s SQL toolkit. 

We use the bigquery-public-data.usa_names.usa_1910_current to walk you through potential use cases here.

Language: SQL

-- Create a table function called “names_by_year” stored in
-- “mydataset”. It references the public dataset 
-- “bigquery-public-data.usa_names.usa_1910_current” 
CREATE OR REPLACE TABLE
  FUNCTION mydataset.names_by_year(y INT64) AS
SELECT
  year,name, SUM(number) AS total
FROM
  `bigquery-public-data.usa_names.usa_1910_current`
WHERE
  year = y
GROUP BY
  year, name;
-- table function can be called by an analyst 
-- to retrieve the year, name, and total occurrences 
-- for a given year. 
SELECT *
FROM mydataset.names_by_year(2000);

Flexible Columns

For your ever-changing business needs, BigQuery offers greater granularity and flexibility for managing your column data types, descriptions, and aliases. 

Parameterized Data Types (Preview)

Business data often requires additional precision on data types to enforce business rules. In response to this, BigQuery’s standard data types support the specification of additional parameters on these data types to allow for more targeted enforcement of business data rules. 

For instance, you may want to use BigQuery to store ISO country codes as 2-digit characters. For this use case, there is a business requirement to ensure that records in this column have a maximum of two characters. To support business cases like this, BigQuery now supports parameterized data types, which allows you to set a max length parameter on this string column, as well as the following data types: 

Below, we illustrate a customer creating a table with a field for credit_card_number stored as STRING. With parameterized data types, the customer can limit the field to 16 characters of length to make sure credit card number inputs do not contain extra characters.

Language: SQL

-- Using parameterized data types 
-- limit the length of the column. 
CREATE TABLE
  mydataset.customers(credit_card_number STRING(16));
-- Inserting >16 characters will fail for this column.
INSERT INTO
  mydataset.customers(credit_card_number)
VALUES
  ('12345678901234567890');
-- Inserting a valid credit card number will succeed
INSERT INTO
  mydataset.customers(credit_card_number)
VALUES
  ('1234567890123456');

Flexible Data Type Casting and Formatting (GA)

Temporal data are ubiquitous in data analytics workflows, and wrangling these temporal data to the required types and formats is a vital use case for BigQuery customers. 

For example, an analyst may need to convert a temporal data type (i.e. DATE) to a STRING data type while applying a specific formatting choice (i.e. YYYY/MM/DD). In this use case, a data type conversion and a formatting change are both required. Existing BigQuery capabilities such as CAST, along with Date functions and Datetime functions, are used by customers today for this use case. In addition to these, the BigQuery team is announcing new capabilities aligned with the ANSI standard to support flexible data type casting and formatting. 

First, we are adding the ability to include a FORMAT within a CAST function and STRING to NUMERIC/BIGNUMERIC conversion.

Language: SQL

-- New syntax supported: CAST and FORMAT date type 
-- to string type
SELECT CAST(DATE '2018-01-30' AS STRING FORMAT 'YYYY/MM/DD');
--returns '2018/01/30'
-- Compare above to the existing syntax for FORMAT_DATE()
-- which accomplishes similar formatting
SELECT FORMAT_DATE("%Y/%m/%d", DATE '2018-01-30');
--returns '2018/01/30'
-- New syntax supported: CAST and FORMAT string type 
-- to date type
SELECT CAST('20180130' AS DATE FORMAT 'YYYYMMDD');
-- returns DATE '2018-01-30'
-- Compare above to the existing syntax for PARSE_DATE()
-- which accomplishes similar formatting
SELECT PARSE_DATE("%Y%m%d", "20180130");
--returns ‘2018-01-30’ as a date
-- New capability: number type to string type
select CAST(1234.5 as STRING FORMAT '9,999.99');
-- returns ' 1,234.50'
-- New capability: conversion from string type 
-- to numeric type
SELECT PARSE_NUMERIC('- 12,34567,89.0');
-- returns NUMERIC'-123456789'
-- also works for BIGNUMERIC

Second, for the numeric data type INT64, we will support different new INT64 alias types. Now SQL analysts can use BYTEINT, TINYINT, SMALLINT, BIGINT, INT, and INTEGER as aliases for BigQuery’s INT64 numeric data type.

Language: SQL

-- Aliasing for INT64 with SMALLINT
SELECT CAST("123" as SMALLINT)
-- returns 123 as INT64 data type

Third, we are adding a new function to convert STRING/BYTES data types to a GEOGRAPHY data type using ST_GEOGFROM.  Unlike existing functions such as ST_GEOGFROMWKB or ST_GEOGFROMTEXT which require the user to specify the input format, ST_GEOGFROM can automatically detect the input format.

Language: SQL

-- ST_GEOGFROM conversion of three formats
-- first input “Point(2 4)” is WKT format
-- second input is WKB in hexadecimal text format
-- third input format is GeoJSON
SELECT 
  ST_GEOGFROM("Point(2 4)"), 
  ST_GEOGFROM('010100000000000000000000400000000000001040'), 
  ST_GEOGFROM(
'{ "type": "Polygon", "coordinates": 
[ [ [2, 0], [2, 2], [1, 2], [0, 2], [0, 0], [2, 0] ] ] }'
  ) 
-- returns GEOGRAPHY values “Point(2 4)”, 
-- “Point(2 4)”, and 
-- “POLYGON((2 0, 2 2, 1 2, 0 2, 0 0, 2 0))”

Change a column’s description (GA)

While BigQuery has historically provided many tools available in the UI, CLI and APIs, we know that many administrators prefer interfacing with SQL. Last month, we released ALTER COLUMN DROP NOT NULL constraints, which allows the administrator to remove the NOT NULL constraint from a column BigQuery. This month, we provide an additional ALTER COLUMN DDL feature: 

The example below combines the two ALTER COLUMN features together for one use case: 

Language: SQL

-- assume now that the credit_card_number field 
-- has a NOT NULL constraint 
CREATE TABLE
  mydataset.customers(credit_card_number STRING(16) NOT NULL);
-- the customer can add a description that the column 
-- is required
ALTER TABLE
  mydataset.customers ALTER COLUMN credit_card_number
SET OPTIONS (description=’this COLUMN IS required’);
-- if needs of the business no longer require this field,
-- the customer can allow null entries in this column
ALTER TABLE
  mydataset.customers 
ALTER COLUMN credit_card_number DROP NOT NULL;
-- change the description to match the NOT NULL 
-- constraint status
ALTER TABLE
  mydataset.customers ALTER COLUMN credit_card_number
SET
  OPTIONS (description=’this COLUMN IS NO LONGER required’);

Secure through SQL

Rather than using the UI, CLI, or API to provision access to tables, datasets, and views, many administrators prefer the familiar SQL interface. Today, we’re excited to support provisioning and reviewing access privileges on tables, datasets, and views using SQL in BigQuery.  

GRANT and REVOKE access to tables, datasets and views (GA)

Here, we are adding two SQL features to BigQuery Standard SQL: GRANT and REVOKE. These will allow administrators to add and remove access privileges for users on tables, datasets, and views. 

Review object privileges with INFORMATION_SCHEMA (Preview)

INFORMATION_SCHEMA.OBJECT_PRIVILEGES shows explicit object privileges set on tables, datasets, and views (these are the same set of permissions returned with the BQ CLI). This familiar SQL interface enables administrators to ensure the right users have access to the right objects. 

In the example below, we see that these three SQL features can be used together to provision access to a dataset, and then ensure that the privileges on a given dataset are correct.

Language: SQL

-- create a sample dataset
CREATE SCHEMA mydataset;
-- modify the privileges on ‘mydataset’
-- note: SCHEMA here is equivalent to a BigQuery dataset
GRANT `roles/bigquery.admin` 
ON SCHEMA mydataset TO "user:[email protected]";
REVOKE `roles/bigquery.admin` 
ON SCHEMA mydataset FROM "user:[email protected]";
-- Check privileges in the Object Privileges table.
-- Note that it is region-qualified
SELECT
  *
FROM
  `region-us`.INFORMATION_SCHEMA.OBJECT_PRIVILEGES
WHERE
  object_name = "mydataset"

The summer solstice is celebrated around the world with many cultures and traditions. The BigQuery team hopes to celebrate with you as we continue to help your data analytics workflows shine. To learn more about BigQuery, visit our website, and get started immediately with the free BigQuery Sandbox.

By: Dan Peng (Product Manager, Google Cloud) and Jagan R. Athreya (Product Manager, Google Cloud)
Source: Google Cloud Blog

Total
0
Shares
Leave a Reply

Your email address will not be published. Required fields are marked *

Previous Article
Accessibility | 5G | Smart Device

Google Cloud and Ericsson Partner To Deliver 5G And Edge Cloud Solutions for Telecommunications Companies And Enterprises

Next Article
Programming | Python | Query

Running Older SQL Server (< 2017) Databases On Cloud SQL For SQL Server

Related Posts