Daily HubSpot → BigQuery  Syncs

Automated daily syncs from Hubspot into BigQuery without expensive ETL costs, using an alternative to Google Integration Connector.

Google provides a HubSpot connector for its Google Cloud Integration Connectors product. It looks great in theory, but in practice, it’s extremely overpriced. It can cost hundreds of dollars a month to automatically sync Hubspot data into a BigQuery data warehouse. There are ways to do it at zero cost, or if you choose to cloud host your tools, at a much lower, controllable monthly cost.

BigQuery Hubspot Schema

Google Integration Connector

I don’t recommend using Google’s Integration Connector. It’s a popular approach I see mentioned, but it has some issues. It’s very expensive to work with any of Google’s Integration Connectors. The few times I’ve used them, we discovered unexpected bills that couldn’t be clearly explained. Getting cost estimates upfront is also elusive, which makes it not viable as a business option in my opinion.

Airbyte

Airbyte is an open-source data integration platform designed to synchronize data from various sources to data warehouses, lakes, and databases. It offers customizable connectors and supports real-time data synchronization, ensuring data consistency across systems. It’s highly extensible, allowing companies to build and maintain their connectors. It also offers hundreds of pre-configured ETL processes, one of which is for using Hubspot as a data source.

Airbyte Hubspot BigQuery Sync History

To use the following method, you’ll need to have a running installation of Airbyte. This can be done with an extra machine you have in the office, as long as it can run around the clock. I use a separate System76 machine that runs 24/7. You can also set up a cloud-hosted Airbyte, which will incur costs but will be fixed, predictable, and much lower than Google’s offering.

Reference the official documentation for details on installing Airbyte.

Hubspot Source

Airbyte sources are the systems where your data originates. They can be databases, APIs, files, or other data sources.

BigQuery Destination

In Airbyte, destinations are the systems where you want to send your data. They could be data warehouses, data lakes, databases, or analytics tools.

Sync Connection

Connections create ETL pipelines that connect a source to a destination. It defines how data is transferred, including the schedule, specific data selection, and possible transformation. In many cases, the transformation needs to be added, but the option exists for custom scenarios that need scrubbing or modifying before being stored in a destination.

Daily Sync from Hubspot to BigQuery

Finally, the approach I’ve used many times to sync data from Hubspot into a BigQuery data warehouse.

Hubspot Source Setup

To set up Hubspot as a source, you’ll need a Super Admin level Hubspot user to create a private app or work with someone who does to create one and provide you with its access token.

Airbyte Hubspot Source

Once you have the private app set up in Hubspot, it’s easy to configure the source in Airbyte:

  1. Select “Private App” as the authentication type.
  2. Enter your private app’s access token.
  3. Click “Test and save.”

The source will verify the connection, and once that completes, you’re done.

BigQuery Destination Setup

Setting up a BigQuery destination is a bit more detailed, and you’ll need admin access to your Google Cloud project. Reference the Airbyte documentation on BigQuery for more detailed instructions, but I’ll describe the basics.

Airbyte BigQuery Destination Configuration

First, you’ll need to set up a few things in Google Cloud.

  1. In the Google Cloud “APIs & Services” area, create credentials to generate a new service account.
  2. Add the following roles to the account: BigQuery Admin, Editor, Storage Admin, Storage Object Admin.
  3. Create a key for that service account and download the JSON file provided.
  4. Create a new Cloud Storage bucket for storing temporary files used during export/import processes: hubspot.
  5. Add a folder in that bucket, where you’ll store the CSVs used during transfers: transfers.
  6. Create HMAC access keys for Cloud Storage under “Settings > INTEROPERABILITY” for the service account you created.

Once you have these things documented and downloaded, you can set up the destination in Airbyte, providing your:

  1. Project ID: {BIGQUERY-PROJECT-ID}
  2. Dataset Location: US
  3. Default Dataset ID: hubspot
  4. Loading Method: GCS Staging
  5. HMAC Key Access ID: {HMAC-KEY}
  6. HMAC Key Secret: {HMAC-SECRET}
  7. GCS Bucket Name: airbyte-bigquery
  8. GCS Bucket Path: syncdata
  9. Service Account Key JSON: Open the JSON key you downloaded and copy/paste the entire thing here.
  10. Transformation Query Run Type: batch
  11. Google BigQuery Client Chunk Size: 15 (MB)
  12. Raw Table Dataset Name: airbyte

After entering these, click the “Test and save” button and wait for the setup tests to complete. If you encounter errors during this step, you’ll need to work through them.

Helpful Tip: Airbyte uses the same interface to connect with S3 and Google Cloud, so if you see errors mentioning AWS or S3, those are actually for Google Cloud Storage.

HubSpot → BigQuery Connection

Now that you have a Hubspot source and a BigQuery destination set up, you can set up a sync process as a connection. Follow the initial steps to select Hubspot as the source and BigQuery as the destination. Once those complete the verification/loading process, you’ll want to configure a few things specific to your situation.

Streams can be selected to choose the specific Hubspot resource types you want to sync, such as contacts, companies, forms, and form_submissions. Depending on what you choose, you’ll need to make sure your private app has the proper permissions available.

Airbyte Hubspot Sync Tables

Schemas are available for each resource/table you’ve enabled, and you need to click into them and choose only the specific properties you want created as columns in BigQuery. This is very important. Don’t leave all of them selected, or you’ll have over 500 columns and very bloated storage in BigQuery.

Airbyte Hubspot Schema Selector

Schedule when and how often you want to run a sync. I recommend a scheduled batch every 24 hours for Hubspot.

Airbyte Hubspot Sync Connection Schedule

Once you’ve completed this, you can run your first sync. When it completes, you’ll have the data available in BigQuery.

BigQuery Hubspot Schema

Daily HubSpot → BigQuery ETL

You now have a reliable method for running daily syncs of your Hubspot data into BigQuery without expensive ETL costs. Having Hubspot data in BigQuery gives you the ability to run in-depth analyses, creating performance dashboards in visualization tools like Looker or Tableau. It’s also very useful for measuring marketing campaign performance with attribution and can be used to create predictive models.

Summary

This alternative method to sync HubSpot data to BigQuery allows you to handle ETL yourself without relying on Google’s expensive Integration Connector. Airbyte allows you to set up scheduled and real-time data syncs between hundreds of sources and destinations at a significantly lower cost. The method described here can be used for many other data sync connections to and from BigQuery. It’s incredibly beneficial in many ways. I highly recommend it for businesses looking to set up a BigQuery data warehouse while keeping costs manageable. If you need any help setting up Airbyte or handling complex ETL to and from BigQuery, I’ve been a BigQuery consultant for over six years and have set up hundreds of ETL processes to support business analytics and intelligence.

Citations & References