The GA4 update: matching big query data with Google analytics - Acquisition Channel report

With the widespread usage of GA4 these days, it seems only fitting to continue my series of matching BigQuery data with Google Analytics.

See my previous post for basic audience metrics.

In this post, I will look at :

  • transactions

  • eCommerce conversion rate

  • revenue

  • Goal completions

  • Goal conversion rate

Before we get into each individual metric, a little about channels in BQ. Firstly, BigQuery doesn’t store the channel groupings, only source, medium and campaign. Also, if you use trafficSource, this is the user traffic source, not the session. so to get the session channel groupings we need to work a bit of magic. I got mine from this site: https://www.ga4bigquery.com/traffic-source-dimensions-metrics-ga4/.

Get the Source/Mediums

It turns out that the source, medium and campaign listed under traffic_source is the first users’ traffic source. For you to get the session traffic source, you need to look at the parameters stored with each hit. For this, we do the following:

with acq_table as (
    select distinct
        date,
        user_pseudo_id,
        session_id,
        FIRST_VALUE(session_traffic_source.source ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.source is null then 2 else 1 end, event_timestamp) AS source,
        FIRST_VALUE(session_traffic_source.medium ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.medium is null then 2 else 1 end, event_timestamp) AS medium,
        FIRST_VALUE(session_traffic_source.campaign ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.campaign is null then 2 else 1 end, event_timestamp)  AS campaign,
        FIRST_VALUE(session_traffic_source.gclid ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.gclid is null then 2 else 1 end, event_timestamp) AS gclid,
    
    from( 
        Select * except(traffic_source),
        array_agg( if(coalesce(traffic_source.source, traffic_source.medium, traffic_source.campaign, traffic_source.gclid)  is not null,traffic_source, null) ignore nulls order by event_timestamp asc limit 1) [safe_offset(0)] as session_traffic_source,
        from(
            SELECT
                PARSE_DATE("%Y%m%d",event_date) AS date,
                user_pseudo_id,
                event_timestamp,
                (SELECT value.int_value FROM UNNEST(event_params) WHERE  key = 'ga_session_id') AS session_id,
                (select
                    as struct 
                    (select value.string_value from unnest(event_params) where key = 'source') as source,
                    (select value.string_value from unnest(event_params) where key = 'medium') as medium,
                    (select value.string_value from unnest(event_params) where key = 'campaign') as campaign,
                    collected_traffic_source.gclid as gclid

                ) as traffic_source,
                
            FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
                where _table_suffix BETWEEN "20231001" and "20231031"
                and event_name NOT IN ('first_visit','session_start'))
        group by 1,2,3,4)
)
Select
    case 
        when source = '(direct)' and (medium in ('(not set)','(none)')) then 'Direct'
        when regexp_contains(campaign, 'cross-network') then 'Cross-network'
        when (regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
            or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$'))
            and regexp_contains(medium, '^(.*cp.*|ppc|paid.*)$') then 'Paid Shopping'
        when (regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$')) or gclid is not null then 'Paid Search'
        when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Social'
        when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Video'
        when medium in ('display', 'banner', 'expandable', 'interstitial', 'cpm') then 'Display'
        when regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
            or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$') then 'Organic Shopping'
        when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
            or medium in ('social','social-network','social-media','sm','social network','social media') then 'Organic Social'
        when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
            or regexp_contains(medium,'^(.*video.*)$') then 'Organic Video'
        when regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
            or medium = 'organic' then 'Organic Search'
        when regexp_contains(source,'email|e-mail|e_mail|e mail')
            or regexp_contains(medium,'email|e-mail|e_mail|e mail') then 'Email'
        when medium = 'affiliate' then 'Affiliates'
        when medium = 'referral' then 'Referral'
        when medium = 'audio' then 'Audio'
        when medium = 'sms' then 'SMS'
        when medium like '%push'
            or regexp_contains(medium,'mobile|notification') then 'Mobile Push Notifications'
        else 'Unassigned' end as channel_grouping_session,
        count(distinct concat(user_pseudo_id, session_id)) as sessions,

from acq_table
group by 1

Note: this won’t work in the sample data for GA4 because the GCLID field was only added later. This is required otherwise your paid search is grossly underreported.

Transactions and Revenue

This is done by counting transaction IDs for purchase events. We can also sum up the purchase revenue in this table. This gives us the number of purchases, conversion rate and total revenue.

with acq_table as (
    select distinct
        date,
        user_pseudo_id,
        session_id,
        FIRST_VALUE(session_traffic_source.source ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.source is null then 2 else 1 end, event_timestamp) AS source,
        FIRST_VALUE(session_traffic_source.medium ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.medium is null then 2 else 1 end, event_timestamp) AS medium,
        FIRST_VALUE(session_traffic_source.campaign ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.campaign is null then 2 else 1 end, event_timestamp)  AS campaign,
        FIRST_VALUE(session_traffic_source.gclid ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.gclid is null then 2 else 1 end, event_timestamp) AS gclid,
    
    from( 
        Select * except(traffic_source),
        array_agg( if(coalesce(traffic_source.source, traffic_source.medium, traffic_source.campaign, traffic_source.gclid)  is not null,traffic_source, null) ignore nulls order by event_timestamp asc limit 1) [safe_offset(0)] as session_traffic_source,
        from(
            SELECT
                PARSE_DATE("%Y%m%d",event_date) AS date,
                user_pseudo_id,
                event_timestamp,
                (SELECT value.int_value FROM UNNEST(event_params) WHERE  key = 'ga_session_id') AS session_id,
                (select
                    as struct 
                    (select value.string_value from unnest(event_params) where key = 'source') as source,
                    (select value.string_value from unnest(event_params) where key = 'medium') as medium,
                    (select value.string_value from unnest(event_params) where key = 'campaign') as campaign,
                    collected_traffic_source.gclid as gclid

                ) as traffic_source,
                
            FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
                where _table_suffix BETWEEN "20231001" and "20231031"
                and event_name NOT IN ('first_visit','session_start'))
        group by 1,2,3,4)
),
purchases_tbl as (
    Select
      user_pseudo_id,  
      (SELECT value.int_value FROM UNNEST(event_params) WHERE  key = 'ga_session_id') AS session_id,
      count(distinct ecommerce.transaction_id) as purchases,
      sum(ecommerce.purchase_revenue) as revenue,
    FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
    where _table_suffix BETWEEN "20231001" and "20231031"
    and event_name = "purchase"
),
Select
    case 
        when source = '(direct)' and (medium in ('(not set)','(none)')) then 'Direct'
        when regexp_contains(campaign, 'cross-network') then 'Cross-network'
        when (regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
            or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$'))
            and regexp_contains(medium, '^(.*cp.*|ppc|paid.*)$') then 'Paid Shopping'
        when (regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$')) or gclid is not null then 'Paid Search'
        when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Social'
        when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Video'
        when medium in ('display', 'banner', 'expandable', 'interstitial', 'cpm') then 'Display'
        when regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
            or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$') then 'Organic Shopping'
        when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
            or medium in ('social','social-network','social-media','sm','social network','social media') then 'Organic Social'
        when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
            or regexp_contains(medium,'^(.*video.*)$') then 'Organic Video'
        when regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
            or medium = 'organic' then 'Organic Search'
        when regexp_contains(source,'email|e-mail|e_mail|e mail')
            or regexp_contains(medium,'email|e-mail|e_mail|e mail') then 'Email'
        when medium = 'affiliate' then 'Affiliates'
        when medium = 'referral' then 'Referral'
        when medium = 'audio' then 'Audio'
        when medium = 'sms' then 'SMS'
        when medium like '%push'
            or regexp_contains(medium,'mobile|notification') then 'Mobile Push Notifications'
        else 'Unassigned' end as channel_grouping_session,
        count(distinct concat(user_pseudo_id, session_id)) as sessions,
        sum(purchases) as purchases,
        sum(purchases)/count(distinct concat(user_pseudo_id, session_id))  as ecomm_conv_rate,
        sum(revenue) as revenue,
from acq_table
left join purchases_tbl using (user_pseudo_id, session_id)
group by 1

Goal Completion

You could use the parameter “is a conversion” to do this, but I find it easier just to list out the event you want to track as a conversion. This way, if it wasn’t set up as a conversion in the interface it won’t impact your reporting.

with acq_table as (
    select distinct
        date,
        user_pseudo_id,
        session_id,
        FIRST_VALUE(session_traffic_source.source ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.source is null then 2 else 1 end, event_timestamp) AS source,
        FIRST_VALUE(session_traffic_source.medium ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.medium is null then 2 else 1 end, event_timestamp) AS medium,
        FIRST_VALUE(session_traffic_source.campaign ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.campaign is null then 2 else 1 end, event_timestamp)  AS campaign,
        FIRST_VALUE(session_traffic_source.gclid ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.gclid is null then 2 else 1 end, event_timestamp) AS gclid,
    
    from( 
        Select * except(traffic_source),
        array_agg( if(coalesce(traffic_source.source, traffic_source.medium, traffic_source.campaign, traffic_source.gclid)  is not null,traffic_source, null) ignore nulls order by event_timestamp asc limit 1) [safe_offset(0)] as session_traffic_source,
        from(
            SELECT
                PARSE_DATE("%Y%m%d",event_date) AS date,
                user_pseudo_id,
                event_timestamp,
                (SELECT value.int_value FROM UNNEST(event_params) WHERE  key = 'ga_session_id') AS session_id,
                (select
                    as struct 
                    (select value.string_value from unnest(event_params) where key = 'source') as source,
                    (select value.string_value from unnest(event_params) where key = 'medium') as medium,
                    (select value.string_value from unnest(event_params) where key = 'campaign') as campaign,
                    collected_traffic_source.gclid as gclid

                ) as traffic_source,
                
            FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
                where _table_suffix BETWEEN "20231001" and "20231031"
                and event_name NOT IN ('first_visit','session_start'))
        group by 1,2,3,4)
),
goal_tbl as (
 Select
      user_pseudo_id,  
      (SELECT value.int_value FROM UNNEST(event_params) WHERE  key = 'ga_session_id') AS session_id,
      1 as goal_completion,
    FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
    where _table_suffix BETWEEN "20231001" and "20231031"
    and  event_name = "select_promotion" 
)
Select
    case 
        when source = '(direct)' and (medium in ('(not set)','(none)')) then 'Direct'
        when regexp_contains(campaign, 'cross-network') then 'Cross-network'
        when (regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
            or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$'))
            and regexp_contains(medium, '^(.*cp.*|ppc|paid.*)$') then 'Paid Shopping'
        when (regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$')) or gclid is not null then 'Paid Search'
        when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Social'
        when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Video'
        when medium in ('display', 'banner', 'expandable', 'interstitial', 'cpm') then 'Display'
        when regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
            or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$') then 'Organic Shopping'
        when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
            or medium in ('social','social-network','social-media','sm','social network','social media') then 'Organic Social'
        when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
            or regexp_contains(medium,'^(.*video.*)$') then 'Organic Video'
        when regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
            or medium = 'organic' then 'Organic Search'
        when regexp_contains(source,'email|e-mail|e_mail|e mail')
            or regexp_contains(medium,'email|e-mail|e_mail|e mail') then 'Email'
        when medium = 'affiliate' then 'Affiliates'
        when medium = 'referral' then 'Referral'
        when medium = 'audio' then 'Audio'
        when medium = 'sms' then 'SMS'
        when medium like '%push'
            or regexp_contains(medium,'mobile|notification') then 'Mobile Push Notifications'
        else 'Unassigned' end as channel_grouping_session,
        count(distinct concat(user_pseudo_id, session_id)) as sessions,
        sum(goal_completion) as goal_completion,
        sum(goal_completion)/count(distinct concat(user_pseudo_id, session_id)) as goal_con_rate,
from acq_table
left join goal_tbl using (user_pseudo_id, session_id)
group by 1

Bringing it all together

Using these metrics and audience metrics we have from the previous post we have:

with acq_table as (
    select distinct
        date,
        user_pseudo_id,
        session_id,
        FIRST_VALUE(session_traffic_source.source ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.source is null then 2 else 1 end, event_timestamp) AS source,
        FIRST_VALUE(session_traffic_source.medium ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.medium is null then 2 else 1 end, event_timestamp) AS medium,
        FIRST_VALUE(session_traffic_source.campaign ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.campaign is null then 2 else 1 end, event_timestamp)  AS campaign,
        FIRST_VALUE(session_traffic_source.gclid ) OVER (PARTITION BY user_pseudo_id, session_id ORDER BY case when session_traffic_source.gclid is null then 2 else 1 end, event_timestamp) AS gclid,
    
    from( 
        Select * except(traffic_source),
        array_agg( if(coalesce(traffic_source.source, traffic_source.medium, traffic_source.campaign, traffic_source.gclid)  is not null,traffic_source, null) ignore nulls order by event_timestamp asc limit 1) [safe_offset(0)] as session_traffic_source,
        from(
            SELECT
                PARSE_DATE("%Y%m%d",event_date) AS date,
                user_pseudo_id,
                event_timestamp,
                (SELECT value.int_value FROM UNNEST(event_params) WHERE  key = 'ga_session_id') AS session_id,
                (select
                    as struct 
                    (select value.string_value from unnest(event_params) where key = 'source') as source,
                    (select value.string_value from unnest(event_params) where key = 'medium') as medium,
                    (select value.string_value from unnest(event_params) where key = 'campaign') as campaign,
                    collected_traffic_source.gclid as gclid

                ) as traffic_source,
                
            FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
                where _table_suffix BETWEEN "20231001" and "20231031"
                and event_name NOT IN ('first_visit','session_start'))
        group by 1,2,3,4)
),
purchases_tbl as (
    Select
      user_pseudo_id,  
      (SELECT value.int_value FROM UNNEST(event_params) WHERE  key = 'ga_session_id') AS session_id,
      count(distinct ecommerce.transaction_id) as purchases,
      sum(ecommerce.purchase_revenue) as revenue,
    FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
    where _table_suffix BETWEEN "20231001" and "20231031"
    and event_name = "purchase"
),
goal_tbl as (
 Select
      user_pseudo_id,  
      (SELECT value.int_value FROM UNNEST(event_params) WHERE  key = 'ga_session_id') AS session_id,
      1 as goal_completion,
    FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
    where _table_suffix BETWEEN "20231001" and "20231031"
    and  event_name = "select_promotion" 
)

Select
case 
        when source = '(direct)' and (medium in ('(not set)','(none)')) then 'Direct'
        when regexp_contains(campaign, 'cross-network') then 'Cross-network'
        when (regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
            or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$'))
            and regexp_contains(medium, '^(.*cp.*|ppc|paid.*)$') then 'Paid Shopping'
        when (regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$')) or gclid is not null then 'Paid Search'
        when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Social'
        when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Video'
        when medium in ('display', 'banner', 'expandable', 'interstitial', 'cpm') then 'Display'
        when regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
            or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$') then 'Organic Shopping'
        when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
            or medium in ('social','social-network','social-media','sm','social network','social media') then 'Organic Social'
        when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
            or regexp_contains(medium,'^(.*video.*)$') then 'Organic Video'
        when regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
            or medium = 'organic' then 'Organic Search'
        when regexp_contains(source,'email|e-mail|e_mail|e mail')
            or regexp_contains(medium,'email|e-mail|e_mail|e mail') then 'Email'
        when medium = 'affiliate' then 'Affiliates'
        when medium = 'referral' then 'Referral'
        when medium = 'audio' then 'Audio'
        when medium = 'sms' then 'SMS'
        when medium like '%push'
            or regexp_contains(medium,'mobile|notification') then 'Mobile Push Notifications'
        else 'Unassigned' end as channel_grouping_session,
        count(distinct concat(user_pseudo_id, session_id)) as sessions,
        sum(purchases) as purchases,
        sum(purchases)/count(distinct concat(user_pseudo_id, session_id))  as ecomm_conv_rate,
        sum(revenue) as revenue,
        sum(goal_completion) as goal_completion,
        sum(goal_completion)/count(distinct concat(user_pseudo_id, session_id)) as goal_con_rate,

from acq_table
left join purchases_tbl using (user_pseudo_id, session_id)
left join goal_tbl using (user_pseudo_id, session_id)
group by 1
Previous
Previous

The GA4 update: matching big query data with Google analytics - Landing Page

Next
Next

Case Study: Time = Money for Biophilia Designs