Data teams have traditionally worked mostly with structured data. Yet unstructured data, which includes images, documents, and videos, is projected to account for up to 80 percent of all data by 2025, and organizations currently use only a small percentage of it to derive useful insights.
One of the main ways to extract value from unstructured data is to apply ML to it: extracting objects from images, translating text from one language to another, recognizing characters in images, analyzing sentiment, and much more. Such tasks are currently achievable using services that host ML models for these operations. However, businesses across industries face three major challenges:
- Data management: data scientists and analysts have to move stored data to wherever they build ML pipelines, such as a notebook or another AI platform
- Infrastructure management: there are no security and governance guarantees of the kind large enterprises require
- Limited data science resources: the work requires developing custom solutions in Python or using frameworks such as Spark or Beam/Dataflow
BigQuery is an industry-leading, fully managed cloud data warehouse that helps users manage and analyze all of their structured and semi-structured data. Taking advantage of its storage and compute scale, BigQuery also enables users to do in-database machine learning. Now, BigQuery is expanding these capabilities to unstructured data with an integrated solution that eliminates data silos and democratizes compute without sacrificing the enterprise-grade security and governance guarantees provided by the underlying data warehouse. Data practitioners can now use familiar SQL constructs to analyze images, text, and other unstructured data at scale, and enrich the insights by combining structured and unstructured data in one system. In this article, you will learn:
- About object tables which enable access to unstructured data
- How to run SQL to get insights from images
- How to expand unstructured data analytics leveraging Cloud AI services
Introducing Object Tables to enable access to unstructured data
At Next ‘22, we announced the preview of object tables, a new table type in BigQuery that provides metadata for objects stored in Google Cloud Storage. Object tables are powered by BigLake and serve as the fundamental infrastructure for bringing structured and unstructured data under a single management framework. You can now build machine learning models to drive business insights from data in all shapes and forms, without the need to move your data.
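As a rough illustration of what defining an object table involves, the Python sketch below only assembles the DDL text; it does not connect to BigQuery. The `CREATE EXTERNAL TABLE ... WITH CONNECTION ... OPTIONS (object_metadata = 'SIMPLE', ...)` form follows the BigQuery documentation for object tables, while the helper function and the table, connection, and bucket names are hypothetical placeholders.

```python
def object_table_ddl(table: str, connection: str, uri_pattern: str) -> str:
    """Render DDL for a BigQuery object table over Cloud Storage objects.

    All three arguments are caller-supplied placeholders; nothing here is
    validated against a real project.
    """
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        "  object_metadata = 'SIMPLE',\n"
        f"  uris = ['{uri_pattern}']\n"
        ");"
    )


# Hypothetical names, for illustration only.
ddl = object_table_ddl(
    table="my_project.my_dataset.listing_images",
    connection="my_project.us.my_connection",
    uri_pattern="gs://my-bucket/listing-images/*",
)
print(ddl)
```

Once such a table exists, each row describes one stored object (for example its `uri`, `content_type`, and `size`), so ordinary `SELECT` statements can filter and join on that metadata.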
Run SQL to get insights from images
With easy access to unstructured data, you can now write SQL over images and predict results from machine learning models using BigQuery ML. You can import either state-of-the-art TensorFlow vision models (e.g. ResNet 50 trained on ImageNet) or your own models to detect objects, annotate photos, extract text from images, and much more. You can then unify the results of image analysis with structured data (website traffic, sales orders, etc.) to train machine learning models that generate insights for better business outcomes. Let’s look at how Adswerve and Twiddy incorporated rental listing images into their analytics to generate the search results that resonated most with their users.
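To make the idea of running SQL over images more concrete, here is a minimal Python sketch that renders two statements: one importing a TensorFlow model and one running inference over an object table. It assumes the `model_type = 'TENSORFLOW'` import option and the `ML.PREDICT`, `ML.DECODE_IMAGE`, and `ML.RESIZE_IMAGE` functions as described in current BigQuery ML documentation; the preview syntax may have differed. The helper functions, all resource names, the 224x224 size, and the `input` column name are hypothetical and depend on the model you import.

```python
def import_model_sql(model: str, model_path: str) -> str:
    """Render a CREATE MODEL statement that imports a TensorFlow SavedModel."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'TENSORFLOW', model_path = '{model_path}');"
    )


def image_inference_sql(model: str, object_table: str,
                        input_column: str = "input",
                        height: int = 224, width: int = 224) -> str:
    """Render an ML.PREDICT query that decodes and resizes each image.

    `input_column` must match the imported model's input name; "input"
    is only an assumed default.
    """
    return (
        "SELECT *\n"
        "FROM ML.PREDICT(\n"
        f"  MODEL `{model}`,\n"
        "  (\n"
        "    SELECT uri,\n"
        f"      ML.RESIZE_IMAGE(ML.DECODE_IMAGE(data), {height}, {width}, FALSE)"
        f" AS {input_column}\n"
        f"    FROM `{object_table}`\n"
        "  )\n"
        ");"
    )


# Hypothetical names, for illustration only.
import_sql = import_model_sql("my_dataset.resnet50", "gs://my-bucket/resnet50/*")
predict_sql = image_inference_sql("my_dataset.resnet50", "my_dataset.listing_images")
print(import_sql)
print(predict_sql)
```

Keeping the decode and resize steps inside the query means the images never leave the warehouse for preprocessing.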
User Story from Adswerve & Twiddy
Adswerve is a leading Google Marketing, Analytics and Cloud partner on a mission to humanize data. Twiddy & Co., a client of Adswerve, is a vacation rental company in North Carolina dedicated to creating exceptional customer experiences by helping customers find their dream vacation homes.
“As a local family vacation rental business specializing in delivering southern hospitality for nearly 45 years, we’ve always strived for our vacation home images to convey the unique local experience that our homes offer. BigQuery ML made it really easy for our business analysts to figure out just the right image creatively by analyzing thousands of potential options and combining them with existing click-through data. This, otherwise, would have taken a lot longer or simply we wouldn’t have done it at all.” —Shelley Tolbert, Director of Marketing, Twiddy & Company
In working to further improve the customer search experience on their website, Twiddy faced three main challenges:
- Predictions of what customers might like relied only on structured data (e.g. location, size)
- The editorial team used a manual photo selection process
- Building machine learning pipelines required data science resources, and preprocessing images (for example, resizing them) was labor intensive
They wanted to build machine learning models using both website search data and rental listing images to predict the click-through rate of the rental properties. Here is how they accomplished this using BigQuery ML with the new capabilities of object tables.
Step 1: Access image data by creating an object table
Step 2: Create image embeddings by importing a TensorFlow image model
Step 3: Train a wide-and-deep model supported by BigQuery ML on both image and website data, and predict the click-through rate of rental properties
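The training and prediction in Step 3 could be sketched roughly as follows. This is not Adswerve's or Twiddy's actual code. It assumes BigQuery ML's `DNN_LINEAR_COMBINED_REGRESSOR` model type (its wide-and-deep regressor) and a hypothetical features table that already joins one row of website data per listing with that listing's image embedding and a hypothetical `click_through_rate` label column. Whether an embedding array can be passed directly as a feature depends on the model type and should be checked against the documentation.

```python
def train_ctr_model_sql(model: str, features_table: str, label: str) -> str:
    """Render a CREATE MODEL statement for a wide-and-deep regressor."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'DNN_LINEAR_COMBINED_REGRESSOR',\n"
        f"  input_label_cols = ['{label}']\n"
        ") AS\n"
        f"SELECT * FROM `{features_table}`;"
    )


def predict_ctr_sql(model: str, candidates_table: str) -> str:
    """Render an ML.PREDICT query scoring rows of a candidates table."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`, TABLE `{candidates_table}`);"
    )


# Hypothetical names, for illustration only.
train_sql = train_ctr_model_sql(
    model="my_dataset.ctr_model",
    features_table="my_dataset.listing_features",
    label="click_through_rate",
)
score_sql = predict_ctr_sql("my_dataset.ctr_model", "my_dataset.candidate_listings")
print(train_sql)
print(score_sql)
```

BigQuery ML names the prediction column `predicted_<label>`, so in this sketch the scores would appear as `predicted_click_through_rate`.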
The results indicated that users are more likely to click on images that show water or other scenic features. With these insights, Twiddy’s editorial team now takes a more data-driven approach to image selection and editing. All of this can be done using SQL, which matches their analysts’ existing skills, without having to recruit more specialized data scientists. Watch this demo from Adswerve to learn more.
Expanding unstructured data analytics leveraging Cloud AI services
Beyond using your own or public machine learning models to analyze unstructured data, we are bringing Cloud AI services, including Translation AI, Vision AI, Natural Language AI, and many others, right inside BigQuery. You can translate text, detect objects in photos, perform sentiment analysis on user feedback, and much more, all in SQL. You can then incorporate the results into your machine learning models for further analysis.
The YouVersion Bible App has been installed on more than half a billion unique devices. It offers Bible text in more than 1,800 languages and supports search in 103 languages. At the start of the geo-political issues in Ukraine, the search volume in Ukrainian nearly doubled. The team wanted to understand what people were searching for and make sure the search results were providing content that would bring people hope and peace. However, without an auto-translate feature, the team had to manually copy and paste each search term into Google Translate dozens of times per day for weeks, which was very time-consuming.
With translation capabilities using BigQuery ML, YouVersion will be able to easily learn what users are searching for in the app moving forward. The team will be able to quickly fine-tune search results and generate content that is relevant to their users. This aligns with YouVersion’s desire to serve its global community well by helping the team remove language barriers between them and the people they serve. Watch this demo from YouVersion to learn more.
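A translation workflow like the one described above might be expressed as in the sketch below, which renders the SQL as text rather than executing it. It assumes the `ML.TRANSLATE` table function and a remote model over the Cloud Translation service as described in current BigQuery ML documentation; the preview syntax may have differed, and this is not YouVersion's actual code. The helper functions and the model, connection, table, and column names are hypothetical.

```python
def translate_model_sql(model: str, connection: str) -> str:
    """Render a CREATE MODEL statement for a remote translation model."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"REMOTE WITH CONNECTION `{connection}`\n"
        "OPTIONS (remote_service_type = 'CLOUD_AI_TRANSLATE_V3');"
    )


def translate_terms_sql(model: str, source_table: str, text_column: str,
                        target_language: str = "en") -> str:
    """Render an ML.TRANSLATE query over one text column.

    ML.TRANSLATE reads its input from a column named `text_content`,
    so the source column is aliased to that name.
    """
    return (
        "SELECT *\n"
        "FROM ML.TRANSLATE(\n"
        f"  MODEL `{model}`,\n"
        f"  (SELECT {text_column} AS text_content FROM `{source_table}`),\n"
        "  STRUCT('translate_text' AS translate_mode,\n"
        f"         '{target_language}' AS target_language_code)\n"
        ");"
    )


# Hypothetical names, for illustration only.
model_sql = translate_model_sql("my_dataset.translator", "my_project.us.my_connection")
query_sql = translate_terms_sql("my_dataset.translator", "my_dataset.search_logs", "search_term")
print(model_sql)
print(query_sql)
```

Scheduling a query of this shape would replace the manual copy-and-paste step with a table of translated search terms that analysts can query directly.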
We will continue to expand these capabilities to other unstructured data types, including documents, audio, and video, in the near future. Submit this form to try these new capabilities that unlock the power of your unstructured data in BigQuery using BigQuery ML. You can find other BigQuery ML capabilities announced at Google Cloud Next.
By: Candice Chen (Product Manager, BigQuery ML) and Amir Hormati (Senior Software Engineering Manager, BigQuery)
Source: Google Cloud Blog