Google Search Console helps you understand how your website is performing in Google Search. It organizes data such as how many people saw your site in search results, how many clicked through, and which search terms they used. By bringing this data into BigQuery, you can analyze patterns, build custom reports, and combine it with other data sources to get a full picture of your website’s search performance.
Known as Bulk Data Exports in Google’s product lingo, there is a built-in way to connect your Search Console profile to BigQuery so that it automatically syncs daily data that you can query directly.
To set this up you’ll need to enable the BigQuery API in your Google Cloud account, then configure a Bulk data export inside Search Console’s settings.
In Google Cloud, go to the BigQuery Data Transfer API page and enable the API.
After that, go to IAM in Google Cloud and grant Search Console permission to connect to your BigQuery project by adding a new principal for [email protected]
with the following roles:
BigQuery Job User
BigQuery Data Editor
In your Search Console settings, enter your BigQuery export destination details:
Cloud project ID: bigquery-project-id
Dataset name: searchconsole
Dataset location: United States (US)
Once you’ve done this, click Continue, then Create data export to complete the process.
You won’t see your data appear for about 48 hours though, and if you look immediately you’ll see some odd filler information in a temp_* table. Don’t worry, once data is synced within the initial time frame that will all be replaced.
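Once the initial sync has landed, you can confirm the dataset is populated by listing its tables with INFORMATION_SCHEMA. A minimal sketch, using the same bigquery_project placeholder as the queries later in this post:

-- List every table Search Console has created in the searchconsole dataset
SELECT table_name
FROM `bigquery_project.searchconsole.INFORMATION_SCHEMA.TABLES`
ORDER BY table_name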
Once your data has been imported into the BigQuery searchconsole dataset, you’ll have the following tables and columns available for querying.
searchdata_site_impression
Daily aggregated data of impressions and related details for your entire site as a whole. Each row represents the total daily values for a given property (or website profile). It contains the following fields:
data_date
– The day on which the data in this row was generated (Pacific Time).
site_url
– URL of the property. For domain-level properties, this will be sc-domain:property-name. For URL-prefix properties, it will be the full URL of the property definition. Examples: sc-domain:developers.google.com, https://developers.google.com/webmaster-tools/
query
– The query string the user searched for.
is_anonymized_query
– Whether the row represents rare (anonymized) queries. When true, the query field will be null to protect the privacy of users making the query.
country
– Country from which the search was made, in ISO-3166-1-Alpha-3 format.
search_type
– The type of search: web, image, video, news, discover, or googleNews.
device
– The device category from which the search was made.
impressions
– The number of impressions for this row.
clicks
– The number of clicks for this row.
sum_top_position
– The sum of the topmost position of the site in the search results for each impression in this row, where zero is the top position. To calculate average position (which is 1-based), calculate SUM(sum_top_position)/SUM(impressions) + 1.
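As a quick sketch of how these fields fit together (again using the bigquery_project placeholder), daily site-wide CTR and 1-based average position can be computed like this:

-- Daily site-wide click-through rate and average SERP position
-- (average position is 1-based, hence the + 1)
SELECT
  data_date,
  SUM(clicks) / SUM(impressions) as ctr,
  SUM(sum_top_position) / SUM(impressions) + 1 as avg_position
FROM `bigquery_project.searchconsole.searchdata_site_impression`
GROUP BY data_date
ORDER BY data_date DESC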
searchdata_url_impression
Daily aggregated data of impressions and related details for individual URLs on your website. Each row represents a day of data for a specific, unique URL. It contains the following fields:
data_date
– The day on which the data in this row was generated (Pacific Time).
site_url
– URL of the property. For domain-level properties, this will be sc-domain:property-name. For URL-prefix properties, it will be the full URL of the property definition. Examples: sc-domain:developers.google.com, https://developers.google.com/webmaster-tools/
url
– The fully-qualified URL where the user eventually lands when they click the search result or Discover story.
query
– Same as above.
is_anonymized_query
– Same as above.
is_anonymized_discover
– Whether the data row is under the Discover anonymization threshold. When under the threshold, some other fields (like URL and country) will be missing to protect user privacy.
country
– Same as above.
search_type
– Same as above.
device
– Same as above.
is_[search_appearance_type]
– There are several boolean fields used to indicate search appearance type, such as is_amp_top_stories, is_job_listing, and is_job_details. A field will be true if the row in question appears for the specific rich result.
impressions
– Same as above.
clicks
– Same as above.
sum_position
– A zero-based number indicating the topmost position of this URL in the search results for the query. (Zero is the top position in the results.) To calculate average position (which is 1-based), calculate SUM(sum_position)/SUM(impressions) + 1.
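Putting the sum_position formula to work, here’s a sketch of per-URL CTR and 1-based average position over the last 28 days (the date window and placeholder project name are just examples):

SELECT
  url,
  SUM(clicks) as clicks,
  SUM(impressions) as impressions,
  SUM(clicks) / SUM(impressions) as ctr,
  -- Average position is 1-based, hence the + 1
  SUM(sum_position) / SUM(impressions) + 1 as avg_position
FROM `bigquery_project.searchconsole.searchdata_url_impression`
WHERE data_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 28 DAY)
GROUP BY url
ORDER BY impressions DESC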
ExportLog
Records about the latest successful sync that was imported, containing the following fields:
agenda
– Which type of data was saved. Currently the only value is SEARCHDATA.
namespace
– Which table was saved to. Search Console typically exports data separately for each table, so each day typically has two export rows.
data_date
– Date of the click or impression, in Pacific Time. This is the partition date used by the tables.
epoch_version
– An integer, where 0 is the first time data was saved to this table for this date. In the rare case that Search Console needs to go back and adjust previous data, perhaps because of a data anomaly, this value will be incremented by 1 each time the data is updated for that table and date.
publish_time
– When the export completed, in Pacific Time.
One important note is that failed exports are never recorded in the ExportLog table.
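To keep an eye on whether daily syncs are arriving, you can inspect the most recent ExportLog entries; a minimal sketch, assuming the same placeholder project:

-- The ten most recent successful exports, newest first
SELECT
  agenda,
  namespace,
  data_date,
  epoch_version,
  publish_time
FROM `bigquery_project.searchconsole.ExportLog`
ORDER BY publish_time DESC
LIMIT 10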
More details on the schema are provided in Google’s Help Center.
The following queries may be useful for exploring the data you now have access to in BigQuery. The real power comes from combining this information with other data sources, though; these are really just meant to provide a basic overview of what you have available for querying.
Provides a list of all organic search keywords, ordered by search date and then impression count.
SELECT
data_date as search_date,
IF(is_anonymized_query, 'longtail (anonymized)', query) as search_term,
SUM(impressions) as impressions,
SUM(clicks) as clicks
FROM `bigquery_project.searchconsole.searchdata_site_impression`
GROUP BY 1, 2
ORDER BY data_date DESC, impressions DESC
Provides a list of all landing page URLs on a site that are ranking organically on Google, including the search keywords they ranked for and their impression and click counts. The results are ordered from most recent to oldest by search date.
SELECT
data_date as search_date,
url,
IF(is_anonymized_query, 'longtail (anonymized)', query) as search_term,
SUM(impressions) as impressions,
SUM(clicks) as clicks
FROM `bigquery_project.searchconsole.searchdata_url_impression`
GROUP BY 1, 2, 3
ORDER BY data_date DESC, impressions DESC
Provides a list of organically ranking landing page URLs, ordered first by SERP position (beginning at zero as the top position), then by impressions in descending order.
SELECT
url,
IF(is_anonymized_query, 'longtail (anonymized)', query) as search_term,
SUM(impressions) as impressions,
MIN(sum_position) as serp_position
FROM `bigquery_project.searchconsole.searchdata_url_impression`
WHERE is_anonymized_query = false
GROUP BY 1, 2
ORDER BY serp_position ASC, impressions DESC
A list of the top 100 landing pages ordered by clicks and then impressions, in descending order, for the last 30 days.
SELECT
url,
query as search_term,
SUM(clicks) as clicks,
SUM(impressions) as impressions,
MIN(sum_position) as serp_position
FROM `bigquery_project.searchconsole.searchdata_url_impression`
WHERE data_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND is_anonymized_query = false
GROUP BY url, search_term
ORDER BY clicks DESC, impressions DESC
LIMIT 100
Provides a list of search terms that ranked in the top 10 SERP positions over the last 30 days, ordered by best position and then impressions.
SELECT
query as search_term,
SUM(clicks) as clicks,
SUM(impressions) as impressions,
MIN(sum_position) as serp_position
FROM `bigquery_project.searchconsole.searchdata_url_impression`
WHERE data_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
AND is_anonymized_query = false
GROUP BY search_term
HAVING serp_position <= 10 AND impressions > 0
ORDER BY serp_position ASC, impressions DESC
Many of these numbers can be pulled manually from the Search Console UI, so you may be wondering why bother connecting to BigQuery. Having your Search Console data available in BigQuery provides a few critical benefits: you can query past the UI’s row limits, retain data beyond Search Console’s 16-month window, and join search data with other datasets you own.
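For example, joins are something the UI simply can’t do. Here’s a hedged sketch that blends organic clicks with conversion data, assuming a hypothetical analytics.page_conversions table with one row per url and date (that table and its columns are illustrative, not part of the export):

-- Aggregate search data to one row per URL and day before joining,
-- so the join doesn't multiply conversion counts across query/device rows
WITH organic AS (
  SELECT url, data_date, SUM(clicks) as clicks
  FROM `bigquery_project.searchconsole.searchdata_url_impression`
  GROUP BY url, data_date
)
SELECT
  o.url,
  SUM(o.clicks) as organic_clicks,
  SUM(c.conversions) as conversions
FROM organic o
JOIN `bigquery_project.analytics.page_conversions` c
  ON o.url = c.url AND o.data_date = c.date
GROUP BY o.url
ORDER BY organic_clicks DESC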
More information is provided by Google, but these are the highlights from my perspective. If your business depends on search performance to achieve results, then connecting your Google Search Console profile to BigQuery with the bulk data export is something I’d highly recommend exploring.