Assisted Conversions - Matching BigQuery Data with Google Analytics

I thought I had covered everything that needed to be covered in my matching BigQuery to Google analytics series until one day someone asked me to recreate the assisted conversions report, so here we are.

In the past, I have covered:

For this report we need:

  • The transactions

  • The past sessions

  • Tie the two together

The transactions

Recreating this report isn’t that hard. The first step is to get all the transactions from the period. Note this doesn’t work with the sample GA dataset because there is no client ID. In the query below, everything is pretty straightforward. The only bit that I haven’t covered before is the isTrueDirect. You may know that Google reports on the last non-direct attribution by default in other reports. In assisted conversions, you need the proper direct attribution.


SELECT
      date,
      clientId,
      visitNumber AS purchaseVisit,
      visitstarttime as purchaseVisitStartTime,
    IF
      (trafficSource.isTrueDirect,
        "Direct",
        channelGrouping) AS convertingChannel,
      totals.transactions AS Transactions,
      totals.totalTransactionRevenue AS Revenue,
      (
      SELECT
        MAX(TRANSACTION.transactionid)
      FROM
        UNNEST(hits) AS h
      WHERE
        TRANSACTION.transactionid IS NOT NULL) AS transactionid
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN "20170701"
  AND "20170731" 
      AND totals.transactions IS NOT NULL
    GROUP BY
      date,
      clientId,
      purchaseVisit,
  purchaseVisitStartTime,      
  convertingChannel,
      Transactions,
      Revenue,
      transactionid


    

The past sessions

Now you have the transactions. You need to see if they came to the site previously. Though, we only want the sessions since the last transaction, not all previous sessions. So this goes through all sessions, and if a user converts copy that transaction id to all the previous sessions going back, another transaction id is listed. Thanks to Corinne Brooker for that tip. You will also notice that my date range here goes one month more than the first query, so I have a 30-day lookback window. You should adjust this to what you want your assisted conversions to be within.


SELECT
  clientId,
  visitNumber,
  
      visitstarttime,
  LAST_VALUE((SELECT MAX(TRANSACTION.transactionid) FROM
          UNNEST(hits) AS h
    WHERE
      TRANSACTION.transactionid IS NOT NULL) IGNORE NULLS) 
      OVER (PARTITION BY fullvisitorid ORDER BY visitstarttime DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS upcomingtransactionid,
  CASE
    WHEN trafficSource.isTrueDirect IS TRUE THEN 'Direct'
  ELSE
  channelgrouping
END
  AS true_channelgrouping
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _table_suffix BETWEEN "20170601"
  AND "20170731"


Tie the two together

Now we have the two bits of data, we tie together the client id and generate the table with the data we need. You could join on client id and transaction id, but I filter out those that don’t match. That way, if they want to look further back, it is pretty straightforward.

In the end, I make sure the assisted conversion was before or the same visit as the transaction. I also check that it was within my 30-day lookback window. You can change the 30 there to whatever look back you want. Visitstarttime is in seconds, so the calculation converts days to seconds.


SELECT
    date,
    true_channelgrouping AS channel,
    CASE WHEN purchaseVisit = visitNumber THEN "last click" ELSE "assisted click" END AS conversionType,
    COUNT(DISTINCT clientId) AS users,
    SUM(Transactions) AS transactions,
    ROUND(SUM(revenue/1000000),2) AS revenue
  FROM (
    SELECT
      date,
      clientId,
      visitNumber AS purchaseVisit,
      visitstarttime as purchaseVisitStartTime,
      IF (trafficSource.isTrueDirect, "Direct", channelGrouping) AS convertingChannel,
      totals.transactions AS Transactions,
      totals.totalTransactionRevenue AS Revenue,
      (SELECT MAX(TRANSACTION.transactionid) FROM UNNEST(hits) AS h WHERE TRANSACTION.transactionid IS NOT NULL) AS transactionid
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN "20170701"AND "20170731" 
      AND totals.transactions IS NOT NULL
    GROUP BY
      date,
      clientId,
      visitNumber,
      visitstarttime,
      convertingChannel,
      Transactions,
      Revenue,
      transactionid)
  LEFT JOIN (
    SELECT
        clientId,
        visitNumber,
        visitstarttime,
        LAST_VALUE((SELECT MAX(TRANSACTION.transactionid) FROM UNNEST(hits) AS h WHERE TRANSACTION.transactionid IS NOT NULL) IGNORE NULLS) OVER (PARTITION BY fullvisitorid ORDER BY visitstarttime DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS upcomingtransactionid,
        CASE WHEN trafficSource.isTrueDirect IS TRUE THEN 'Direct' ELSE channelgrouping END AS true_channelgrouping
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE _table_suffix BETWEEN "20170601" AND "20170731") USING (clientId)
  WHERE
    upcomingtransactionid = transactionid
    AND purchaseVisit >=visitNumber
    AND purchaseVisitStartTime - visitstarttime <= (30*24*60*60)
  GROUP BY
    date,
    channel,
    conversionType

That is how it is done. Any questions or comments, please leave them below and call out if there is anything else you might want to see in BQ.

Previous
Previous

Data Visualisation for Social Good

Next
Next

How iOS 14 and cookie policy changes affect your conversion tags