Google provides a HubSpot connector for its Google Cloud Integration Connectors product. It looks great in theory, but in practice, it’s extremely overpriced. It can cost hundreds of dollars a month to automatically sync Hubspot data into a BigQuery data warehouse. There are ways to do it at zero cost, or if you choose to cloud host your tools, at a much lower, controllable monthly cost.
I don’t recommend using Google’s Integration Connector. It’s a popular approach I see mentioned, but it has some issues. It’s very expensive to work with any of Google’s Integration Connectors. The few times I’ve used them, we discovered unexpected bills that couldn’t be clearly explained. Getting cost estimates upfront is also elusive, which makes it not viable as a business option in my opinion.
Airbyte is an open-source data integration platform designed to synchronize data from various sources to data warehouses, lakes, and databases. It offers customizable connectors and supports real-time data synchronization, ensuring data consistency across systems. It’s highly extensible, allowing companies to build and maintain their connectors. It also offers hundreds of pre-configured ETL processes, one of which is for using Hubspot as a data source.
To use the following method, you’ll need to have a running installation of Airbyte. This can be done with an extra machine you have in the office, as long as it can run around the clock. I use a separate System76 machine that runs 24/7. You can also set up a cloud-hosted Airbyte, which will incur costs but will be fixed, predictable, and much lower than Google’s offering.
Reference the official documentation for details on installing Airbyte.
Airbyte sources are the systems where your data originates. They can be databases, APIs, files, or other data sources.
In Airbyte, destinations are the systems where you want to send your data. They could be data warehouses, data lakes, databases, or analytics tools.
Connections create ETL pipelines that connect a source to a destination. It defines how data is transferred, including the schedule, specific data selection, and possible transformation. In many cases, the transformation needs to be added, but the option exists for custom scenarios that need scrubbing or modifying before being stored in a destination.
Finally, the approach I’ve used many times to sync data from Hubspot into a BigQuery data warehouse.
To set up Hubspot as a source, you’ll need a Super Admin level Hubspot user to create a private app or work with someone who does to create one and provide you with its access token.
Once you have the private app set up in Hubspot, it’s easy to configure the source in Airbyte:
The source will verify the connection, and once that completes, you’re done.
Setting up a BigQuery destination is a bit more detailed, and you’ll need admin access to your Google Cloud project. Reference the Airbyte documentation on BigQuery for more detailed instructions, but I’ll describe the basics.
First, you’ll need to set up a few things in Google Cloud.
BigQuery Admin
, Editor
, Storage Admin
, Storage Object Admin
.hubspot
.transfers
.Once you have these things documented and downloaded, you can set up the destination in Airbyte, providing your:
{BIGQUERY-PROJECT-ID}
US
hubspot
GCS Staging
{HMAC-KEY}
{HMAC-SECRET}
airbyte-bigquery
syncdata
batch
15
(MB)airbyte
After entering these, click the “Test and save” button and wait for the setup tests to complete. If you encounter errors during this step, you’ll need to work through them.
Helpful Tip: Airbyte uses the same interface to connect with S3 and Google Cloud, so if you see errors mentioning AWS or S3, those are actually for Google Cloud Storage.
Now that you have a Hubspot source and a BigQuery destination set up, you can set up a sync process as a connection. Follow the initial steps to select Hubspot as the source and BigQuery as the destination. Once those complete the verification/loading process, you’ll want to configure a few things specific to your situation.
Streams can be selected to choose the specific Hubspot resource types you want to sync, such as contacts
, companies
, forms
, and form_submissions
. Depending on what you choose, you’ll need to make sure your private app has the proper permissions available.
Schemas are available for each resource/table you’ve enabled, and you need to click into them and choose only the specific properties you want created as columns in BigQuery. This is very important. Don’t leave all of them selected, or you’ll have over 500 columns and very bloated storage in BigQuery.
Schedule when and how often you want to run a sync. I recommend a scheduled batch every 24 hours for Hubspot.
Once you’ve completed this, you can run your first sync. When it completes, you’ll have the data available in BigQuery.
You now have a reliable method for running daily syncs of your Hubspot data into BigQuery without expensive ETL costs. Having Hubspot data in BigQuery gives you the ability to run in-depth analyses, creating performance dashboards in visualization tools like Looker or Tableau. It’s also very useful for measuring marketing campaign performance with attribution and can be used to create predictive models.
This alternative method to sync HubSpot data to BigQuery allows you to handle ETL yourself without relying on Google’s expensive Integration Connector. Airbyte allows you to set up scheduled and real-time data syncs between hundreds of sources and destinations at a significantly lower cost. The method described here can be used for many other data sync connections to and from BigQuery. It’s incredibly beneficial in many ways. I highly recommend it for businesses looking to set up a BigQuery data warehouse while keeping costs manageable. If you need any help setting up Airbyte or handling complex ETL to and from BigQuery, I’ve been a BigQuery consultant for over six years and have set up hundreds of ETL processes to support business analytics and intelligence.