The data is all available in `httparchive.crawl.pages`. We pulled it out into that table only to make queries run faster, since the Cookie chapter of the Web Almanac was going to query it a lot, and not everyone is lucky enough to work for Google, where the cost of querying the HTTP Archive is covered by our employer.

Below is the SQL that populates it, so you could do something similar into a temporary table.

Alternatively, if that turns out to be too slow, we could potentially schedule this to populate monthly.

```sql
INSERT INTO `httparchive.almanac.cookies`
WITH intermediate_cookie AS (
  SELECT
    date,
    client,
    page,
    root_page,
    rank,
    payload.startedDateTime
```
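
The query above is cut off, so here is a minimal sketch of the temporary-table approach rather than a reconstruction of the full statement. It copies only the page-level columns that are visible above. The table name `cookies_tmp` and the date `2024-06-01` are placeholders chosen for illustration, and the cookie-extraction logic from the original query is not reproduced.

```sql
-- Sketch only: `cookies_tmp` and the date are illustrative placeholders.
-- Temporary tables live for the duration of the script or session,
-- so run the follow-up queries in the same script or session.
CREATE TEMP TABLE cookies_tmp AS
SELECT
  date,
  client,
  page,
  root_page,
  rank
FROM `httparchive.crawl.pages`
WHERE
  -- Restricting to a single crawl date limits how much data is scanned.
  date = '2024-06-01';

-- Later queries then read the small temporary table instead of the full crawl.
SELECT
  client,
  COUNT(DISTINCT page) AS pages
FROM cookies_tmp
GROUP BY client;
```

Filtering on `date` (and on `client` or `rank`, if you only need a subset) before materializing the table should keep the one-off cost down. Every later query then scans only the extracted rows.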

Answer selected by tsunoyu