Google Analytics Attribution & Traffic Source

GA4 assigns traffic sources based on a priority system—UTMs, referrers, auto-tagging, and more—so here’s how to decode it in BigQuery and get the cleanest attribution data possible.

Tracking where your traffic comes from in Google Analytics 4 (GA4) isn’t as simple as just looking at a report—it’s based on a priority system that decides what gets credit for a visit.

If UTM parameters are present, they win. If not, GA4 looks at the referrer, then Google Ads auto-tagging, and so on. In this post, I’ll break down exactly how GA4 assigns traffic sources in BigQuery, how different levels of attribution work, and how you can get the cleanest, most reliable source data possible. If you want to improve how you track paid clicks from platforms like TikTok, Meta, and LinkedIn, or just get a better handle on what’s actually driving conversions, this will help.

Google Analytics traffic source is set based on these, in order of priority:

  1. UTM Parameters directly set in the URL
  2. Referrer Data (auto-detection for untracked links)
  3. Google Ads Auto-Tagging (Paid Search)
  4. Direct Traffic (No Referrer, No UTM)
  5. Enhanced Attribution – GA4 uses machine learning and modeled data to infer traffic sources when referrers are missing (e.g., iOS privacy restrictions).
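To make the ordering concrete, here is a minimal Python sketch of the first four steps as listed above. It is a simplified model of that list, not GA4’s actual processing; the function name resolve_source and the returned labels are assumptions made for illustration, and the modeled step (item 5) is not reproduced.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs


def resolve_source(page_url: str, referrer: Optional[str] = None) -> dict:
    """Guess source/medium by walking the priority list from the article.

    Illustrative only: real GA4 processing is more involved.
    """
    params = parse_qs(urlparse(page_url).query)

    # 1. UTM parameters set directly in the URL win.
    if "utm_source" in params:
        return {
            "source": params["utm_source"][0],
            "medium": params.get("utm_medium", ["(not set)"])[0],
            "rule": "utm",
        }

    # 2. Otherwise fall back to the referrer's hostname.
    if referrer:
        host = urlparse(referrer).hostname
        if host:
            return {"source": host, "medium": "referral", "rule": "referrer"}

    # 3. Google Ads auto-tagging appends a gclid parameter.
    if "gclid" in params:
        return {"source": "google", "medium": "cpc", "rule": "gclid"}

    # 4. Nothing to go on: treat the visit as direct traffic.
    return {"source": "(direct)", "medium": "(none)", "rule": "direct"}
```

Returning the rule that matched alongside the source makes it easier to audit why a visit was credited the way it was.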

Quick Overview

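For those who are familiar with digital attribution, this is what GA4 provides for attribution-related data in BigQuery:

  1. collected_traffic_source: event-level attribution, captured from the URL parameters of the event itself
  2. session_traffic_source_last_click: session-level attribution, shared by every event in the session
  3. traffic_source: first-touch attribution for the source that first acquired the user
  4. event_params: last-touch values such as source, medium, and page_referrer

More details on each are below. There are also ways to track and handle your own attribution details, including additional click IDs for paid clicks from TikTok, Meta, LinkedIn, Reddit, Snapchat, and anywhere else that passes a URL parameter identifying a click.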

Direct Attribution (Event Level)

Direct attribution is when a conversion event directly contains URL parameters that identify the source of traffic. UTMs are the most common source of this data: when a URL has them properly added, they pass into Google Analytics 4 and subsequently into BigQuery when the two are connected.

collected_traffic_source

UTM and Google Click ID parameters passed to a URL are collected in the collected_traffic_source column within BigQuery’s GA4 dataset.

This column is a RECORD in the events_* table that holds traffic source information captured directly when a specific event occurred. When it’s populated in BigQuery, it means that UTMs or other attribution data were found on that specific event, not carried over from an earlier one.

As a real-world example, let’s say someone clicks a Google Ad for your brand/organization, bringing them to a specific landing page on your website. If you were to query pageview events for that landing page, you would see a collected_traffic_source.gclid value containing the Google Click ID passed from Google Ads through the URL, like /landing-page/?gclid=ABC123. If that person then clicked a link to your registration or onboarding form, a pageview would exist for that visit without any collected_traffic_source data, because the form’s URL carries no attribution parameters.

Another way to think of it is that collected_traffic_source means attribution data was passed directly through the URL as parameters for the URL where an event occurred.

Now let’s say someone clicked the link from another Google Ad, bringing them directly to your registration or onboarding form. The pageview event in BigQuery for that visit would contain the gclid URL parameter in the collected_traffic_source.gclid column. If that person successfully completes the onboarding/signup form, it would send a form_submitted event to GA4 that also contained the same collected_traffic_source.gclid value.

The collected_traffic_source RECORD holds the manual campaign (UTM) values along with Google click IDs such as gclid. Again, all of these are captured directly from the URL as parameters when the event occurred. This can be for a pageview, or any event that occurs with a URL parameter in the current address.
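The sketch below mimics that behavior in Python: it builds a collected_traffic_source-like dictionary from whatever parameters are on a single event’s URL. The parameter-to-field mapping is assumed from the public GA4 export schema and is not exhaustive, and collect_from_url is a hypothetical helper, so verify the names against your own dataset.

```python
from urllib.parse import urlparse, parse_qs

# URL parameter -> field name inside collected_traffic_source.
# Assumed from the public GA4 export schema; check it before relying on it.
PARAM_TO_FIELD = {
    "utm_id": "manual_campaign_id",
    "utm_campaign": "manual_campaign_name",
    "utm_source": "manual_source",
    "utm_medium": "manual_medium",
    "utm_term": "manual_term",
    "utm_content": "manual_content",
    "gclid": "gclid",
    "dclid": "dclid",
}


def collect_from_url(page_url: str) -> dict:
    """Return only the attribution fields present on this event's URL."""
    params = parse_qs(urlparse(page_url).query)
    return {
        field: params[param][0]
        for param, field in PARAM_TO_FIELD.items()
        if param in params
    }


landing = collect_from_url("https://example.com/landing-page/?gclid=ABC123")
signup = collect_from_url("https://example.com/signup/")
print(landing)  # {'gclid': 'ABC123'}
print(signup)   # {}
```

The second call returns an empty dictionary, matching the earlier example where the pageview on the signup form has no collected data of its own.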

URL Parsing

When UTMs aren’t found, Google won’t provide you with much, but that doesn’t mean there isn’t useful attribution-related information in the URL. I always recommend parsing the URL for specific known parameters used by advertisers, typically using regular expressions or substring checks to look for click ID patterns.

A few examples include checking if a lowercase URL contains or matches any of the following parameters:
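As a rough sketch of this kind of check in Python, the pattern below looks for a handful of click ID parameters in a lowercased URL. The parameter names and platform labels are assumptions based on what ad platforms commonly append, so confirm each one against the platform’s own documentation; platform_from_url is a hypothetical helper.

```python
import re
from typing import Optional

# Click ID parameter -> platform label. Assumed names; verify per platform.
CLICK_ID_PLATFORMS = {
    "gclid": "google_ads",
    "fbclid": "meta",
    "ttclid": "tiktok",
    "li_fat_id": "linkedin",
    "msclkid": "microsoft_ads",
    "rdt_cid": "reddit",
    "sccid": "snapchat",
}

# Match "?name=value" or "&name=value" where the value is non-empty.
CLICK_ID_PATTERN = re.compile(
    r"[?&](" + "|".join(CLICK_ID_PLATFORMS) + r")=([^&#]+)"
)


def platform_from_url(page_url: str) -> Optional[str]:
    """Return the platform for the first known click ID in the URL, else None."""
    match = CLICK_ID_PATTERN.search(page_url.lower())
    return CLICK_ID_PLATFORMS[match.group(1)] if match else None
```

Lowercasing is only used for detection here. If the click ID value itself is needed, read it from the original URL, since the values are typically case-sensitive.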

And so on… The list should cover every advertising provider sending paid clicks to the site.

Session-level Attribution

Google Analytics 4 also provides attribution details for a given session in the session_traffic_source_last_click column. This RECORD contains the session’s last-click traffic source, covering both Google Ads and manual (UTM) campaign details when they exist.

The session_traffic_source_last_click RECORD captures attribution data for the session in which an event occurred. Unlike collected_traffic_source, which reflects direct attribution at the event level, session-level attribution assigns traffic source details based on the last known touchpoint within the session.

This means that if a user arrives at your site via an organic search result, browses multiple pages, and then completes a conversion, all events within that session will inherit the same session_traffic_source_last_click values. If the user navigates away and returns via a different source in a new session, the session-level attribution will update accordingly.
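The inheritance described above can be illustrated with a small Python sketch. The event dictionaries, the session_id and collected_source keys, and the with_session_source helper are simplified stand-ins invented for this example, not the export schema, and taking the most recent non-empty source in the session is an assumption that follows the description in this section.

```python
# Toy events, assumed to be ordered by time within each session.
events = [
    {"session_id": 1, "name": "session_start", "collected_source": "google"},
    {"session_id": 1, "name": "page_view", "collected_source": None},
    {"session_id": 1, "name": "form_submitted", "collected_source": None},
    {"session_id": 2, "name": "session_start", "collected_source": "newsletter"},
    {"session_id": 2, "name": "page_view", "collected_source": None},
]


def with_session_source(events: list) -> list:
    """Copy each event, adding the source inherited from its session."""
    session_source = {}
    for event in events:
        # Keep the latest known touchpoint seen in each session.
        if event["collected_source"] is not None:
            session_source[event["session_id"]] = event["collected_source"]
    return [
        {**event, "session_source": session_source.get(event["session_id"])}
        for event in events
    ]


for row in with_session_source(events):
    print(row["session_id"], row["name"], row["session_source"])
```

Every event in session 1 ends up with the same session-level source, including the form_submitted event that had no attribution parameters of its own.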

The RECORD can potentially contain a lot of different properties depending on the Google products you use to serve ads. The most useful and commonly used ones are outlined below, but for more details visit the GA4 BigQuery Export schema documentation provided directly by Google.

Manual Campaign Attribution Fields

When a session is attributed based on URL parameters passed directly through the URL as utm_*, the details are stored under session_traffic_source_last_click.manual_campaign.*:

If a session originates from a Google Ads campaign, additional details are provided as session_traffic_source_last_click.google_ads_campaign.* parameters:

First Touch Attribution

The traffic_source RECORD contains information about the traffic source that first acquired the user. This record is not populated in intraday tables. These traffic_source values do not change if the user interacts with subsequent campaigns after installation, so they’re effectively a first-touch attribution source.

Last Touch Attribution

Google attempts to provide last-touch attribution in an event’s event_params RECORD, but it’s not all that accurate in practice. I’ve found it much more accurate to augment this approach, but it’s worth mentioning here as it is available and will likely be something that comes up.

event_params Attribution Values

These are available for a given event when collected.
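In the export, event_params is a repeated RECORD of key/value pairs, where each value sits in one of several typed fields. A minimal Python sketch of looking up a parameter from one exported row could look like the following; get_event_param is a hypothetical helper and the sample row is made up for illustration.

```python
from typing import Any, Optional

# Typed value fields that an event_params entry can populate.
VALUE_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def get_event_param(event_params: list, key: str) -> Optional[Any]:
    """Return the first populated value for `key`, or None if it is absent."""
    for param in event_params:
        if param["key"] != key:
            continue
        for field in VALUE_FIELDS:
            value = param["value"].get(field)
            if value is not None:
                return value
    return None


# Made-up sample of one event's parameters.
sample_params = [
    {"key": "source", "value": {"string_value": "google"}},
    {"key": "medium", "value": {"string_value": "cpc"}},
    {"key": "ga_session_id", "value": {"int_value": 1700000000}},
]

print(get_event_param(sample_params, "source"))  # google
```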

page_referrer as Referrer

One useful parameter is the page_referrer property, which works well as an input for manual attribution rules.

For example, a JavaScript user-defined function (UDF) can be added in BigQuery that returns a lead source based on the value of the referrer when it’s available. This function could have logic like this:
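A minimal sketch of such rules, written in Python for readability rather than as a BigQuery JavaScript UDF. The domains, the labels, and the lead_source_from_referrer name are all hypothetical placeholders to swap for your own known referrers.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical rules: (hostname suffix, lead source label).
REFERRER_RULES = [
    ("google.com", "Organic Search"),
    ("bing.com", "Organic Search"),
    ("linkedin.com", "Organic Social"),
    ("partner-blog.example", "Partner Content"),
]


def lead_source_from_referrer(page_referrer: Optional[str]) -> Optional[str]:
    """Map a referrer URL to a lead source label, or None if there is none."""
    if not page_referrer:
        return None
    host = (urlparse(page_referrer).hostname or "").lower()
    for suffix, label in REFERRER_RULES:
        # Match the domain itself or any subdomain, but not lookalikes.
        if host == suffix or host.endswith("." + suffix):
            return label
    # Unknown but valid referrers are grouped under a generic label.
    return "Referral" if host else None
```

Matching on the hostname suffix, not on a plain substring, avoids crediting a lookalike domain such as notgoogle.com to Google.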

And so on… It’s usually the most useful for third-party referrers that may mention the brand either through sponsored content or organically as an incoming link.

Combined Attribution

In practice, I’ll typically combine all of these attribution levels to provide an ideal lead source like this:

  1. URL parsing has the highest priority when used, because a click ID is damn accurate
  2. collected_traffic_source has the next highest priority; when it’s there, it’s used as the source
  3. session_traffic_source_last_click takes the next priority level and is used when no directly collected UTMs or click IDs are found
  4. event_params.medium and event_params.source are checked and utilized if available
  5. event_params.page_referrer checks for specific known incoming links and third-party traffic sources
  6. traffic_source is used as a final fallback
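The ordering above amounts to taking the first non-empty value from a prioritized list of candidates. Here is a minimal Python sketch of that idea. The dictionary keys and the lead_source function are hypothetical, and each key is assumed to already hold a derived source label (or nothing) from the corresponding step.

```python
from typing import Optional, Tuple

# Hypothetical field names, listed in the priority order described above.
PRIORITY = [
    "url_click_id_source",  # 1. click ID parsed from the URL
    "collected_source",     # 2. collected_traffic_source
    "session_source",       # 3. session_traffic_source_last_click
    "event_param_source",   # 4. event_params source / medium
    "referrer_source",      # 5. rules applied to page_referrer
    "first_touch_source",   # 6. traffic_source as the final fallback
]


def lead_source(event: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (source, field_used) for the first non-empty candidate."""
    for field in PRIORITY:
        value = event.get(field)
        if value:
            return value, field
    return None, None


print(lead_source({"collected_source": "newsletter", "first_touch_source": "google"}))
# ('newsletter', 'collected_source')
```

Returning the field that supplied the value keeps the result auditable, which helps when a client asks why a lead was credited to a given source.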

Using these checks in order results in a pretty rock solid approach to handling attribution with Google Analytics 4 data in BigQuery. The approach is based on years of work with both tools, and it’s been tested and approved by many clients I’ve worked with.