Matching Big Query Data with Google Analytics - Landing pages

Following on of my series of recreating Google Analytics reports in Big Query, this time we are looking at the Landing page report.

Previously we have looked at:

Landing pages follows a lot of the same rules of channel acquisition but with one major difference. There is no landing page dimension in BigQuery. So helpful, right! To get the landing page you need to get the first page in the session, that would be the one with the smallest hit number. Now, you can’t assume that it will be the first hit, as sometimes there are non-interactive events firing before the page view. This is why you need to rank all the pages in a session and get the first one, as the query below shows.


SELECT
  PagePath
FROM (
  SELECT
    hits.page.PagePath,
    RANK() OVER (PARTITION BY fullvisitorid, visitid, date ORDER BY hits.hitNumber) AS ranking
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
    UNNEST (hits) hits
  WHERE
    _table_suffix BETWEEN "20170701"
    AND "20170731"
    AND totals.visits = 1
    AND hits.type = "PAGE"
  GROUP BY
    PagePath,
    hitNumber,
    fullvisitorid,
    visitid,
    date)
WHERE
  ranking = 1
GROUP BY
  pagePath
order by users desc

This line here:

RANK() OVER (PARTITION BY fullvisitorid, visitid, date ORDER BY hits.hitNumber) AS ranking`

will rank each hit from 1 to x based on the hit number, and by adding:

and hits.type = “PAGE”

we limit the ranking to pages only.

Now we have the first page in the session, we can get the rest of the metrics, just like channel groupings.

SELECT
  PagePath,
  sum(sessions) as sessions,
  sum(new_users) as new_users,
  round(sum(new_users) /sum(sessions)*100,2) as percentage_new_user,
  ROUND(sum(pageviews)/sum(sessions),2) AS pageviews_per_session,
  ROUND(sum(total_session_duration)/sum(sessions),2) AS avg_session_duration_secs,
  ROUND(sum(bounces)/sum(sessions)*100,2) AS bounceRate,
  sum(transactions) as transactions,
  ROUND(sum(transactions)/sum(sessions)*100,2) AS eCom_conversion_rate,
  ROUND(sum(total_revenue),2) AS total_revenue,
FROM (
  SELECT
    hits.page.PagePath,
    COUNT(DISTINCT fullvisitorId) as users,
    COUNT(DISTINCT
      CASE
        WHEN visitNumber = 1 THEN fullvisitorId
    END
      ) AS new_users,
    COUNT(DISTINCT CONCAT(fullvisitorId, CAST(visitId AS string), date)) AS sessions,
    COUNT(DISTINCT
      CASE
        WHEN hits.type = "PAGE" THEN CONCAT(fullvisitorID, CAST(visitId AS STRING), date, hits.hitNumber)
    END
      ) AS pageviews,
    SUM(totals.timeOnSite/ totals.hits ) AS total_session_duration,
    COUNT(DISTINCT
      CASE
        WHEN totals.bounces = 1 THEN CONCAT(fullvisitorId, CAST(visitId AS string), date)
    END
      ) AS bounces,
    sum(totals.transactions) AS transactions,
    SUM( totals.transactionRevenue/1000000) AS total_revenue,
    RANK() OVER (PARTITION BY fullvisitorid, visitid, date ORDER BY hits.hitNumber) AS ranking
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
    UNNEST (hits) hits
  WHERE
    _table_suffix BETWEEN "20170701"
    AND "20170731"
    AND totals.visits = 1
    AND hits.type = "PAGE"
  GROUP BY
    PagePath,
    hitNumber,
    fullvisitorid,
    visitid,
    date)
WHERE
  ranking = 1
GROUP BY
  pagePath
order by sessions desc

Any requests on what to do next or questions, feel free to comment!

Previous
Previous

Matching Big Query Data with Google Analytics - Product Performance

Next
Next

Handy Ecommerce Custom Dimensions in Google Analytics