Tracking where your traffic comes from in Google Analytics 4 (GA4) isn’t as simple as looking at a report. It’s based on a priority system that decides what gets credit for a visit.
If UTM parameters are present, they win. If not, GA4 looks at the referrer, then Google Ads auto-tagging, and so on. In this post, I’ll break down exactly how GA4 assigns traffic sources in BigQuery, how different levels of attribution work, and how you can get the cleanest, most reliable source data possible. If you want to improve how you track paid clicks from platforms like TikTok, Meta, and LinkedIn, or just get a better handle on what’s actually driving conversions, this will help.
Google Analytics traffic source is set based on these, in order of priority:
For those who are familiar with digital attribution, this is what GA4 provides for attribution-related data in BigQuery:
collected_traffic_source – Direct attribution when an event occurs on a page with UTM parameters or Google click IDs
session_traffic_source_last_click – Last-touch attribution for the specific session when an event occurred
traffic_source – First-touch attribution
More details on each are below, but in a nutshell that’s what is provided out of the box. There are ways to track and handle your own attribution details, including additional click IDs for other paid clicks from TikTok, Meta, LinkedIn, Reddit, Snapchat and anywhere else passing a URL parameter that identifies a click.
Direct attribution is when a conversion event directly contains URL parameters that identify the source of traffic. UTMs are the most common source of this data, and when a URL has them properly added they’ll pass into Google Analytics 4 and subsequently BigQuery when the two are connected.
UTM and Google Click ID parameters passed to a URL are collected in the collected_traffic_source column within BigQuery’s GA4 dataset.
This column is a RECORD in the events_* table that holds traffic source information captured directly when a specific event occurred. When it’s populated for an event, it means UTMs or other attribution data were present in the URL when that specific event occurred.
As a real-world example, let’s say someone clicks a Google Ad for your brand/organization, bringing them to a specific landing page on your website. If you were to query pageview events for that landing page, you would see a collected_traffic_source.gclid value containing the Google Click ID passed from Google Ads through the URL, like /landing-page/?gclid=ABC123. If that person then clicked a link to your registration or onboarding form, a pageview would exist for that visit without any collected_traffic_source data.
Another way to think of it is that collected_traffic_source means attribution data was passed directly as parameters on the URL where an event occurred.
Now let’s say someone clicked the link from another Google Ad, bringing them directly to your registration or onboarding form. The pageview event in BigQuery for that visit would contain the gclid URL parameter in the collected_traffic_source.gclid column. If that person successfully completes the onboarding/signup form, it would send a form_submitted event to GA4 that also contains the same collected_traffic_source.gclid value.
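To make the event-level behavior concrete, here is a minimal Python sketch that models the idea. It is not how GA4 is implemented; it only shows that the fields are read from the URL of the event itself. The function name `collected_from_url` and the example.com URLs are hypothetical, and only a subset of the parameter-to-column mappings is included.

```python
from urllib.parse import urlparse, parse_qs

# URL parameter -> collected_traffic_source column (subset, for illustration)
PARAM_TO_FIELD = {
    "utm_id": "manual_campaign_id",
    "utm_campaign": "manual_campaign_name",
    "utm_source": "manual_source",
    "utm_medium": "manual_medium",
    "gclid": "gclid",
    "dclid": "dclid",
    "srsltid": "srsltid",
}


def collected_from_url(page_url):
    """Return the attribution fields present as parameters on this URL."""
    params = parse_qs(urlparse(page_url).query)
    return {
        field: params[param][0]
        for param, field in PARAM_TO_FIELD.items()
        if param in params
    }


# The landing page carries the click ID; the next page in the visit does not.
landing = collected_from_url("https://example.com/landing-page/?gclid=ABC123")
signup = collected_from_url("https://example.com/signup/")
print(landing)  # {'gclid': 'ABC123'}
print(signup)   # {}
```

The second call returns an empty result because nothing was passed on that URL, which mirrors the pageview without collected_traffic_source data described above.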
Collected traffic source data captures a range of things. Here are the specific columns included under the collected_traffic_source RECORD. Again, all of these are captured directly from URL parameters when the event occurred. This can be for a pageview, or any event that occurs with a URL parameter in the current address.
manual_campaign_id – utm_id
manual_campaign_name – utm_campaign
manual_source – utm_source
manual_medium – utm_medium
manual_term – utm_term
manual_content – utm_content
manual_creative_format – utm_creative_format
manual_marketing_tactic – utm_marketing_tactic
manual_source_platform – utm_source_platform
gclid – Google Ads click ID: gclid, gbraid or wbraid
dclid – DoubleClick click ID for Display & Video 360 and Campaign Manager 360 ads
srsltid – Google Merchant Center click ID
When UTMs aren’t found, Google won’t provide you with much, but that doesn’t mean there isn’t useful attribution-related information in the URL. I always recommend parsing the URL for specific known parameters used by advertisers, typically using regular expressions or string-contains checks to look for click ID patterns.
A few examples include checking if a lowercase URL contains or matches any of the following parameters:
fbclid – it’s from Facebook
ttclid – it’s from TikTok
sccid – it’s from Snapchat
And so on. The list should cover every advertising provider sending paid clicks to the site.
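The check described above can be sketched in a few lines of Python. This is an illustration only: the function name `platform_from_click_id` is hypothetical, the mapping covers just the three parameters listed, and in BigQuery the same logic would more likely be written with regular expressions in SQL.

```python
from urllib.parse import urlparse, parse_qs

# Click ID parameter -> platform label, limited to the examples listed above
CLICK_ID_PLATFORMS = {
    "fbclid": "Facebook",
    "ttclid": "TikTok",
    "sccid": "Snapchat",
}


def platform_from_click_id(page_url):
    """Return the ad platform implied by a known click ID parameter, else None."""
    # Lowercase the whole URL so mixed-case parameter names (e.g. ScCid) match.
    # This also lowercases the values, which is fine for detection only.
    params = parse_qs(urlparse(page_url.lower()).query)
    for param, platform in CLICK_ID_PLATFORMS.items():
        if param in params:
            return platform
    return None
```

Parsing the query string rather than doing a plain substring search avoids false matches when a click ID name happens to appear inside a path or another parameter's value.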
Google Analytics 4 also provides attribution details for a given session in the session_traffic_source_last_click column. This RECORD type column contains info about the most recent attributed session traffic source for Google Ads and manual contexts (UTMs) when it exists.
The session_traffic_source_last_click RECORD captures attribution data for the session in which an event occurred. Unlike collected_traffic_source, which reflects direct attribution at the event level, session-level attribution assigns traffic source details based on the last known touchpoint within the session.
This means that if a user arrives at your site via an organic search result, browses multiple pages, and then completes a conversion, all events within that session will inherit the same session_traffic_source_last_click values. If the user navigates away and returns via a different source in a new session, the session-level attribution will update accordingly.
The RECORD can potentially contain a lot of different properties depending on the Google products you use to serve ads. The most useful and commonly used ones are outlined below, but for more details visit the GA4 BigQuery Export schema documentation provided directly by Google.
When a session is attributed based on URL parameters passed directly through the URL as utm_*, the details are stored under session_traffic_source_last_click.manual_campaign.*:
session_traffic_source_last_click.manual_campaign.campaign_id – The ID of the last clicked manual campaign
session_traffic_source_last_click.manual_campaign.campaign_name – The name of the last clicked manual campaign
session_traffic_source_last_click.manual_campaign.medium – The medium of the last clicked manual campaign (e.g., paid search, organic search, email)
session_traffic_source_last_click.manual_campaign.term – The keyword/search term of the last clicked manual campaign
session_traffic_source_last_click.manual_campaign.content – Additional metadata of the last clicked manual campaign
session_traffic_source_last_click.manual_campaign.source_platform – The platform of the last clicked manual campaign (e.g., search engine, social media)
session_traffic_source_last_click.manual_campaign.source – The specific source within the platform of the last clicked manual campaign
session_traffic_source_last_click.manual_campaign.creative_format – The format of the creative of the last clicked manual campaign
session_traffic_source_last_click.manual_campaign.marketing_tactic – The marketing tactic of the last clicked manual campaign
If a session originates from a Google Ads campaign, additional details are provided as session_traffic_source_last_click.google_ads_campaign.* parameters:
session_traffic_source_last_click.google_ads_campaign.customer_id – The customer ID associated with the Google Ads account
session_traffic_source_last_click.google_ads_campaign.account_name – The name of the Google Ads account
session_traffic_source_last_click.google_ads_campaign.campaign_id – The ID of the Google Ads campaign
session_traffic_source_last_click.google_ads_campaign.campaign_name – The name of the Google Ads campaign
session_traffic_source_last_click.google_ads_campaign.ad_group_id – The ID of the ad group within the Google Ads campaign
session_traffic_source_last_click.google_ads_campaign.ad_group_name – The name of the ad group within the Google Ads campaign
The traffic_source RECORD contains information about the traffic source that first acquired the user. This record is not populated in intraday tables. These traffic_source values do not change if the user interacts with subsequent campaigns after installation, so they’re effectively a first-touch attribution source.
traffic_source.name – Name of the marketing campaign that first acquired the user. This field is not populated in intraday tables.
traffic_source.medium – Name of the medium (paid search, organic search, email, etc.) that first acquired the user. This field is not populated in intraday tables.
traffic_source.source – Name of the network that first acquired the user. This field is not populated in intraday tables.
Google attempts to provide last-touch attribution in an event’s event_params RECORD, but it’s not all that accurate in practice. I’ve found it much more accurate to combine it with the other attribution fields rather than rely on it alone, but it’s worth mentioning here because it is available and will likely come up.
event_params.medium
event_params.campaign
event_params.source
These are available for a given event when collected.
One useful parameter is the page_referrer property, which can drive manual attribution rules when no UTMs or click IDs are present.
For example, a JS function can be added in BigQuery that will return a lead source based on the value of the referrer when it’s available. This function could have logic like this:
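A minimal sketch of such rules, written in Python for readability rather than as a BigQuery JS function. The domains, labels, and the function name `lead_source_from_referrer` are all hypothetical placeholders; real rules would depend on which sites actually link to you.

```python
from urllib.parse import urlparse

# Hypothetical rules: referring domain -> lead source label
REFERRER_RULES = [
    ("partner-blog.example", "Sponsored Content"),
    ("news.example", "Press Mention"),
]


def lead_source_from_referrer(page_referrer):
    """Map a page_referrer value to a lead source, or None if no rule matches."""
    if not page_referrer:
        return None
    host = (urlparse(page_referrer).hostname or "").lower()
    for domain, label in REFERRER_RULES:
        # Match the domain itself or any of its subdomains (e.g. www.)
        if host == domain or host.endswith("." + domain):
            return label
    return None
```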
And so on… It’s usually the most useful for third-party referrers that may mention the brand either through sponsored content or organically as an incoming link.
In practice, I’ll typically combine these attribution levels to provide an ideal lead source like this:
collected_traffic_source has the next highest priority; when it’s there, it’s used as the source
session_traffic_source_last_click takes the next priority level and is used when no directly collected UTMs or click IDs are found
event_params.medium and event_params.source are checked and utilized if available
event_params.page_referrer checks for specific known incoming links and third-party traffic sources
traffic_source is used as a final fallback
Using these checks in order results in a pretty rock solid approach to handling attribution with Google Analytics 4 data in BigQuery. The approach is based on years of work with both tools, and it’s been tested and approved by many clients I’ve worked with.
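The ordered checks listed above amount to a "first non-empty value wins" fallback. Here is a rough Python sketch of that pattern, under some loud assumptions: each event is a plain dict with event_params already flattened into key/value pairs (in the real export it is a repeated RECORD), only the source value is resolved, and `KNOWN_REFERRERS`, `source_from_referrer`, and `resolve_lead_source` are hypothetical names. Events that carry only a click ID would need extra handling that is not shown.

```python
# Hypothetical referrer rules; real ones depend on who links to your site.
KNOWN_REFERRERS = {"partner-blog.example": "partner-blog"}


def source_from_referrer(page_referrer):
    """Return a source label for a known referring domain, else None."""
    if not page_referrer:
        return None
    for domain, source in KNOWN_REFERRERS.items():
        if domain in page_referrer.lower():
            return source
    return None


def resolve_lead_source(event):
    """Pick the first available source, checking fields in priority order."""
    collected = event.get("collected_traffic_source") or {}
    last_click = event.get("session_traffic_source_last_click") or {}
    manual = last_click.get("manual_campaign") or {}
    params = event.get("event_params") or {}
    first_touch = event.get("traffic_source") or {}

    candidates = [
        collected.get("manual_source"),                     # event-level UTMs
        manual.get("source"),                               # session last click
        params.get("source"),                               # event_params
        source_from_referrer(params.get("page_referrer")),  # referrer rules
        first_touch.get("source"),                          # first touch
    ]
    return next((value for value in candidates if value), None)
```

In BigQuery SQL the same idea is usually a COALESCE over the corresponding columns, with the referrer rules expressed as a CASE statement or a user-defined function.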