Product List Performance - Matching BigQuery Data with Google Analytics

This will be the final in my series of matching Bigquery data with Google Analytics. This is actually where the idea all started. I had to try and recreate the product list report and couldn’t find anything to help me. So, if you can’t find it, do it yourself.

In the past, I have covered:

To get the product list performance you need 3 parts:

  • The views, click throughs and add to carts

  • The checkout and transactions

  • Link them together and account for products with no lists

The views, click throughs and add to carts

Depending on how you set up your site there is usually 3 times that you list the product list that is being used:

  • For product impressions when they are viewing a collection of products

  • For product clicks so you know which products are most popular products being clicked in a collection of products

  • On some add to carts, these days a lot of ecommerce sites allow you to add to cart from a collection, so you would add the product list there

So, let’s get all the details when we have a product list stored. We will need hit number for later use. The rest is just putting a 1 next to each action that happened to a product in a session for the three events that usually have product lists stored with them.

SELECT
    CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
    hits.hitNumber,
    product.productListName,
    product.productSKU,
    product.v2ProductName as name, 
  MAX(if(product.isImpression, 1,0)) AS product_impressions,
  MAX(if(product.isClick,1,0)) AS product_click,
  MAX(if(hits.eCommerceAction.action_type = "3", 1,0)) AS product_add_to_cart,
  FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
    UNNEST (hits) hits,
    UNNEST (hits.product) product
  WHERE
  _table_suffix BETWEEN "20170701"
  AND "20170731"
  AND (product.isImpression or product.isClick or hits.eCommerceAction.action_type = "3")
  GROUP BY
    sessionID,
    hitNumber,
    productListName,
    productSKU,
    name
    

But what happens if a user clicks on a product from a list, views a product detail page and adds to cart there. Most implementations of enhanced ecommerce don’t remember the list they came from but Google Analytics still stitches it together some how.

Though I couldn’t find anything offical what I figure is this, if there is list before they add to cart, attribute it to that list - think of it like it’s last non-direct attribution, it will go back through the events to see if there was a product click on any product list for that product and if there was that gets attribution. So to do that we need to look back in our table, using lead and lag. Lead looks forward 1 row (or you can tell it more) to see the values. So if the following row (when ordered by hit number) has an add to cart event with a not set product list name then you want to take that value and add it to your own. Like this:


SELECT
  * EXCEPT(product_add_to_cart),
  CASE
    WHEN LEAD(productListName) OVER (PARTITION BY sessionID, productSKU ORDER BY hitNumber) = "(not set)" THEN LEAD(product_add_to_cart) OVER (PARTITION BY sessionID, productSKU ORDER BY hitNumber)+product_add_to_cart
  ELSE
  product_add_to_cart
END
  AS product_add_to_cart_updated
FROM (
  SELECT
    CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
    hits.hitNumber,
    product.productListName,
    product.productSKU,
    product.v2ProductName as name, 
  MAX(if(product.isImpression, 1,0)) AS product_impressions,
  MAX(if(product.isClick,1,0)) AS product_click,
  MAX(if(hits.eCommerceAction.action_type = "3", 1,0)) AS product_add_to_cart,
  FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
    UNNEST (hits) hits,
    UNNEST (hits.product) product
  WHERE
  _table_suffix BETWEEN "20170701"
  AND "20170731"
  AND (product.isImpression or product.isClick or hits.eCommerceAction.action_type = "3")
  GROUP BY
    sessionID,
    hitNumber,
    productListName,
    productSKU,
    name)
GROUP BY
  sessionID,
  hitNumber,
  productListName,
  productSKU,
  name,
  product_impressions,
  product_click,
  product_add_to_cart
    

So now we just clean it up by removing all the not sets:


SELECT
  sessionID,
  productListName,
  productSKU,
  name,
  SUM(product_impressions) AS product_impressions,
  SUM(product_click) AS product_click,
  SUM(product_add_to_cart_updated) AS product_add_to_cart
FROM (
  SELECT
    CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
    hits.hitNumber,
    product.productListName,
    product.productSKU,
    product.v2ProductName as name, 
  MAX(if(product.isImpression, 1,0)) AS product_impressions,
  MAX(if(product.isClick,1,0)) AS product_click,
  MAX(if(hits.eCommerceAction.action_type = "3", 1,0)) AS product_add_to_cart,
  FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
    UNNEST (hits) hits,
    UNNEST (hits.product) product
  WHERE
  _table_suffix BETWEEN "20170701"
  AND "20170731"
  AND (product.isImpression or product.isClick or hits.eCommerceAction.action_type = "3")
  GROUP BY
    sessionID,
    hitNumber,
    productListName,
    productSKU,
    name)
  GROUP BY
    sessionID,
    hitNumber,
    productListName,
    productSKU,
    name,
    product_impressions,
    product_click,
    product_add_to_cart)
WHERE
  productListName != "(not set)"
GROUP BY
  sessionID,
  productListName,
  productSKU,
  name
    

The checkout and transactions

Now this is the easy part, similar to how we did it in product performance we just want the checkout, transactions and revenue for a product, we just need the session ID to match it up with. I’ve split you checkout and transactions as two seperate queries:


SELECT
  CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
  product.productSKU,
  1 AS product_add_to_cart,
  1 AS product_purchase,
  ROUND(SUM(product.productRevenue)/1000000,2) AS revenue
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
  UNNEST (hits) hits,
  UNNEST (hits.product) product
WHERE
  _table_suffix BETWEEN "20170701"
  AND "20170731"
    AND hits.eCommerceAction.action_type = "6"
GROUP BY
  sessionID,
  productSKU
    

and the checkout:


SELECT
  CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
  product.productSKU,
  1 AS product_add_to_cart,
  1 AS product_checkout
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
  UNNEST (hits) hits,
  UNNEST (hits.product) product
WHERE
  _table_suffix BETWEEN "20170701"
  AND "20170731"
    AND hits.eCommerceAction.action_type = "5"
GROUP BY
  sessionID,
  productSKU
    

Now you might be thinking, why is there a 1 for product_add_to_cart, we have already looked at add to cart, and we aren’t doing any test that they did add to cart. Well, you are right, smarty pants. But take this scenario, when I get to the home page there is a product impression for the dress I like but I am too much in mission mode, I go straight to the dresses collection, the dress pops up again but I know it’s a short sleeve dress so I go to that collection. So far I had 3 product impressions for that 1 dress, I think click on the dress in short sleeve dress collection and add to cart to buy. Now if we only join up our tables on session id and product sku, all 3 of these collections would get attributed for my purchase and would count the purchase 3 times. But if we only count the one time I added to cart, then it will match correctly.

Joining it all together

Like I said above, we then join them on not just session id and product sku but also the product list that had them add to cart. After that we can summarize it.


SELECT
  productListName,
  SUM(product_impressions) product_list_views,
  SUM(product_click) product_list_click,
  SUM(product_click)/ SUM(product_impressions) as product_list_ctr,
  SUM(product_add_to_cart) product_add_to_cart,
  SUM(product_checkout) product_checkout,
  SUM(product_purchase) product_purchase,
  SUM(revenue) revenue
FROM (
  SELECT
    sessionID,
    productListName,
    productSKU,
    name,
    SUM(product_impressions) AS product_impressions,
    SUM(product_click) AS product_click,
    SUM(product_add_to_cart_updated) AS product_add_to_cart
  FROM (
    SELECT
      * EXCEPT(product_add_to_cart),
      CASE
        WHEN LEAD(productListName) OVER (PARTITION BY sessionID, productSKU ORDER BY hitNumber) = "(not set)" THEN LEAD(product_add_to_cart) OVER (PARTITION BY sessionID, productSKU ORDER BY hitNumber)+product_add_to_cart
      ELSE
      product_add_to_cart
    END
      AS product_add_to_cart_updated
    FROM (
      SELECT
        CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
        hits.hitNumber,
        product.productListName,
        product.productSKU,
        product.v2ProductName AS name,
        
  MAX(if(product.isImpression, 1,0)) AS product_impressions,
  MAX(if(product.isClick,1,0)) AS product_click,
  MAX(if(hits.eCommerceAction.action_type = "3", 1,0)) AS product_add_to_cart,
      FROM
        `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
        UNNEST (hits) hits,
        UNNEST (hits.product) product
      WHERE
        _table_suffix BETWEEN "20170701"
        AND "20170731"
  AND (product.isImpression or product.isClick or hits.eCommerceAction.action_type = "3")
      GROUP BY
        sessionID,
        hitNumber,
        productListName,
        productSKU,
        name)
    GROUP BY
      sessionID,
      hitNumber,
      productListName,
      productSKU,
      name,
      product_impressions,
      product_click,
      product_add_to_cart)
  WHERE
    productListName != "(not set)"
  GROUP BY
    sessionID,
    productListName,
    productSKU,
    name)
LEFT JOIN (
  SELECT
    CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
    product.productSKU,
    1 AS product_add_to_cart,
    1 AS product_purchase,
    ROUND(SUM(product.productRevenue)/1000000,2) AS revenue
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
    UNNEST (hits) hits,
    UNNEST (hits.product) product
  WHERE
    _table_suffix BETWEEN "20170701"
    AND "20170731"
    AND hits.eCommerceAction.action_type = "6"
  GROUP BY
    sessionID,
    productSKU)
USING
  (sessionID,
    productSKU,
    product_add_to_cart)
LEFT JOIN (
  SELECT
    CONCAT(fullVisitorID, CAST(visitID AS string), date) AS sessionID,
    product.productSKU,
    1 AS product_add_to_cart,
    1 AS product_checkout
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
    UNNEST (hits) hits,
    UNNEST (hits.product) product
  WHERE
    _table_suffix BETWEEN "20170701"
    AND "20170731"
    AND hits.eCommerceAction.action_type = "5"
  GROUP BY
    sessionID,
    productSKU)
USING
  (sessionID,
    productSKU,
    product_add_to_cart)
GROUP BY
  productListName
  

If you stuck with me through all that, congratulations and feel free to copy and paste what ever you need to get the job done. Sharing is caring.

Previous
Previous

Should I be A/B Testing?

Next
Next

Matching Big Query Data with Google Analytics - Product Performance