Unlocking Retail Location Data With CARTO And BigQuery

Retail companies use geospatial data to solve all manner of challenges. Data that is tied to location is vital to understanding customers and solving business problems. Say you’re a Director of Leasing who needs to choose the location for a new store. You’ll need to know your potential customers’ demographics, the products being sold by competitors, and foot-traffic patterns, but all of that data would be essentially useless if it weren’t tied to a spatial location.

Adding the spatial dimension to your data unlocks new potential, but also adds a degree of complexity. Geospatial data requires map-based visualizations, unique functions and procedures, and far more storage space than your average data. This is why Location Intelligence platforms like CARTO and petabyte-scale data warehouses like BigQuery are essential for any business that wants to use geospatial data to solve its problems.

CARTO for Retail is a set of platform components and analytical functions that have been developed to assist Retail companies with their geospatial data. The CARTO for Retail functions are deployed directly in BigQuery, and combine spatial analytics with BigQuery Machine Learning tools to run predictions and analysis in the same location as your data. The CARTO for Retail Reference Guide goes into extensive detail about this solution, which we’ll dive into below.

CARTO Platform Components

The CARTO platform provides a set of capabilities that, when combined with the processing power of BigQuery, form the CARTO for Retail Solution. Below is an illustration of the components in play:


The CARTO Builder is a web-based drag-and-drop analysis tool that allows you to quickly see your geospatial data on a map. You can do discovery analyses with the built-in spatial functions, which push the processing down to BigQuery without any configuration required on your end. If you want to get your hands on the commands that are sent to BigQuery, you can open the SQL interface and edit the code directly. This makes the CARTO Builder an excellent tool for rapid prototyping of geospatial applications.
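For instance, a quick discovery query typed into the Builder’s SQL interface might count how many existing stores fall within 5 km of a candidate site using BigQuery’s native GIS functions (the table name and coordinates below are illustrative placeholders):

```sql
-- Count stores within 5 km of a hypothetical candidate site in Austin, TX.
-- `project.dataset.stores_table` is a placeholder; `geom` is a GEOGRAPHY column.
SELECT
  COUNT(*) AS nearby_stores
FROM
  `project.dataset.stores_table`
WHERE
  ST_DWITHIN(geom, ST_GEOGPOINT(-97.7431, 30.2672), 5000);  -- distance in meters
```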

Once you’re ready to add advanced application features like fine-grained access controls or custom filtering, you can extend your prototype’s code using the deck.gl framework (which CARTO uses under the hood for rendering) and CARTO for React, which also provides some helpful templates to get you started.


While most companies generate some of their own geospatial data, very few have the means (think hundreds of thousands of drones in the sky on a daily basis) to generate the full picture. How about adding some location data to your location data? CARTO’s Data Observatory provides curated 3rd party data (some free, most for purchase) including socio-demographics, Points of Interest, foot and road traffic, behavioral data, and credit card transactions. All the data is already hosted in BigQuery, so it’s easy to merge with your existing data. BigQuery itself also has a number of publicly available geospatial datasets, including OpenStreetMap.
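As an example, the OpenStreetMap public dataset can be queried directly from BigQuery; the sketch below counts supermarkets worldwide (the `planet_features` table stores OSM tags in an `all_tags` array of key/value structs):

```sql
-- Count OpenStreetMap features tagged as supermarkets.
SELECT
  COUNT(*) AS supermarket_count
FROM
  `bigquery-public-data.geo_openstreetmap.planet_features` AS f
WHERE EXISTS (
  SELECT 1
  FROM UNNEST(f.all_tags) AS tag
  WHERE tag.key = 'shop' AND tag.value = 'supermarket'
);
```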


The CARTO Analytics Toolbox for BigQuery includes a series of retail-specific user-defined functions and stored procedures. These can be accessed through the CARTO platform or directly in the BigQuery console. Leveraging the massive computing power of BigQuery, you can run the following analyses:

  • Clustering
    Analysis to identify the optimal store locations by geographically clustering customers, competitors and existing stores.
  • Commercial Hotspots
    Models to focus on the most effective areas for expansion, based on the surrounding retail fabric.
  • Whitespace Analysis
    Routines to identify the best potential locations, where expected revenue is higher than that of top-performing stores and where key business criteria are met.
  • Twin Areas Analysis
    ML-driven analytics to focus network expansion strategies on the most similar locations to the best performing stores.
  • Store Revenue Prediction
    A trained Machine Learning model to predict the annual revenue of a planned store location.
  • Store Cannibalization
    A model to estimate the overlap of areas and spatial features that a new store would have with the existing store network.
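
As a small taste of the Toolbox, the clustering analysis can be sketched with the `ST_CLUSTERKMEANS` function, which groups an array of points into k clusters (the customer table here is a hypothetical placeholder, and the exact project path may vary with your Analytics Toolbox edition):

```sql
-- Group customer locations into 5 spatial clusters with k-means.
-- `project.dataset.customers` is a placeholder table with a GEOGRAPHY `geom` column.
SELECT
  `carto-un`.carto.ST_CLUSTERKMEANS(ARRAY_AGG(geom), 5) AS clusters
FROM
  `project.dataset.customers`;
```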


Now let’s see the CARTO for Retail components in action. Our goal for this example is to identify similar areas (known as twin areas) in Texas to match a particular high-performing store in Austin. We first create a connection to BigQuery using a service account.

Next, we need to create our data model using the carto.BUILD_REVENUE_MODEL_DATA procedure. It takes in stores, revenue data, competitors, trade areas (which can be any polygon, such as a radius, drive time, or custom-created polygon), and the desired enrichment variables, then creates the evaluation grid used to find twin areas. Below is an example of this procedure:

CALL `carto-un`.carto.BUILD_REVENUE_MODEL_DATA(
   -- Stores: revenue, store, geom
   '''SELECT id as store, revenue, geom FROM project.dataset.stores_table''',
   -- Stores information variables - optional
   NULL,
   -- Competitors: competitor, geom
   '''SELECT id as competitor, geom FROM project.dataset.competitors_table''',
   -- Area of interest: geom
   '''SELECT state_geom as geom FROM `bigquery-public-data.geo_us_boundaries.states` WHERE state_name IN ('Texas', 'Arizona')''',
   -- Grid params: grid type and level
   'h3', 7,
   -- Decay params: kring size and decay function
   1, '',
   -- Data Observatory enrichment
   [('total_pop_3200daaa', 'sum'), ('households_3cda72a3', 'sum'), ('median_age_fb9fb9a', 'sum'), ('pop_25_years_ov_3eb5c867', 'sum'), ('median_income_20186485', 'avg'), ('income_200000_o_6cda1f8a', 'sum'), ('median_rent_37636cdd', 'sum'), ('families_with_y_228e5b1c', 'sum'), ('employed_pop_7fe50b6c', 'sum')],
   -- Custom data enrichment - optional
   NULL, NULL,
   -- Output destination prefix (placeholder)
   'project.dataset.retail_model'
);

Next, we need to build out the revenue model using carto.BUILD_REVENUE_MODEL. This uses BigQuery ML to perform the model predictions and supports LINEAR_REG and BOOSTED_TREE_REGRESSOR. Check the model documentation for more information.

This will output the model, SHAP values, and model statistics to understand the model performance. Below is an example query to run this:

CALL `carto-un`.carto.BUILD_REVENUE_MODEL(
   -- Model data
   'project.dataset.retail_model',
   -- Options
   '{"MAX_ITERATIONS": 20}',
   -- Output destination prefix (placeholder)
   'project.dataset.revenue_model'
);
Finally, we can predict our twin areas. We pick a target index, which we can identify from our map; in this example, the target cell contains the top-performing store we want to find similar areas to.
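The target index can also be found with a simple query against the model data table built in the earlier steps (the table and column names here match the twin-areas queries used in this example):

```sql
-- Find the H3 cell with the highest average revenue in the model data.
SELECT
  index,
  revenue_avg
FROM
  `project.dataset.retail_model_texas`
ORDER BY
  revenue_avg DESC
LIMIT 1;
```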

From here we can run our Twin Areas model, which is based on Principal Component Analysis (PCA). We provide a query containing our target H3 cell, a second query of the cells we want to study (any cell in Texas without a store), and several other arguments to fine-tune our results:

CALL `carto-un`.carto.FIND_TWIN_AREAS(
   -- Query of the origin cell
   '''SELECT * FROM `project.dataset.retail_model_texas` WHERE index = '87489e262ffffff' ''',
   -- Query of the target cells
   '''SELECT * FROM `project.dataset.retail_model_texas` WHERE revenue_avg = 0''',
   -- Twin areas model inputs: index column, PCA explained variance ratio, max results
   'index', 0.90, NULL,
   -- Output destination prefix (placeholder)
   'project.dataset.twin_areas'
);
The result is an interactive map showing the top areas that will likely perform most similarly to our target store based on our geospatial factors. We can also include other store attributes in the first step to add site-specific details like square footage or year built.

What’s Next

Believe it or not, there are even more tools and functions that can help you make the most of your geospatial data, which are explored in the CARTO for Retail Reference Guide. There’s the BigQuery Tiler, and CARTO’s out-of-the-box Site Selection Application, which includes relevant 3rd party data, advanced map visualizations and embedded models to pinpoint the best locations for network expansion.

In addition to the CARTO Analytics Toolbox, BigQuery also has many additional GIS functions for analyzing your geospatial data. Check out this blog on optimizing your spatial storage clustering in BigQuery. If you’re looking to analyze raster data, or can’t find the dataset you need in CARTO’s data observatory, consider trying Google Earth Engine.
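As a quick example of those native GIS functions, the same public boundaries dataset used earlier can answer simple spatial questions directly:

```sql
-- Compute the area of Texas in square kilometers with native BigQuery GIS.
-- ST_AREA returns square meters on the WGS84 spheroid.
SELECT
  state_name,
  ST_AREA(state_geom) / 1e6 AS area_km2
FROM
  `bigquery-public-data.geo_us_boundaries.states`
WHERE
  state_name = 'Texas';
```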

By: Remy Welch (Data Engineer)
Originally published at Google Cloud Blog
