Regular Updates for httparchive.almanac.cookies Table?
#4034
Hello HTTP Archive team and community,

I'm writing to ask whether the `httparchive.almanac.cookies` table could be updated more frequently. Regular updates (e.g., monthly or quarterly) would be incredibly valuable for tracking trends and changes in cookie usage over time.

Thank you for your time and consideration. I appreciate all the work that goes into maintaining the HTTP Archive.
Replies: 1 comment 1 reply
The data is all available in `httparchive.crawl.pages`. We just pulled it out into that table to make the queries run faster, since the Cookie chapter of the Web Almanac was going to query it a lot, and not everyone is lucky enough to work for Google, where the cost of querying the HTTP Archive is covered by our employer.

Below is the SQL that populates it, so you could run something similar into a temporary table. Alternatively, if that turns out to be way too slow, we could potentially schedule this to populate monthly.

```sql
INSERT INTO `httparchive.almanac.cookies`
WITH intermediate_cookie AS (
  -- One row per (page, cookie): unnest the cookies custom metric
  SELECT
    date,
    client,
    page,
    root_page,
    rank,
    payload.startedDateTime AS startedDateTime,
    cookie
  FROM
    `httparchive.crawl.pages`,
    UNNEST(JSON_EXTRACT_ARRAY(custom_metrics.cookies)) AS cookie
  WHERE
    date = '2024-06-01'  -- the crawl to extract; change for other months
)
SELECT
  date,
  client,
  page,
  root_page,
  rank,
  startedDateTime,
  -- TRUE when the page's host ends with the cookie domain's registrable domain
  ENDS_WITH(NET.HOST(page), NET.REG_DOMAIN(JSON_VALUE(cookie, '$.domain'))) AS firstPartyCookie,
  JSON_VALUE(cookie, '$.name') AS name,
  JSON_VALUE(cookie, '$.domain') AS domain,
  JSON_VALUE(cookie, '$.path') AS path,
  JSON_VALUE(cookie, '$.expires') AS expires,
  JSON_VALUE(cookie, '$.size') AS size,
  JSON_VALUE(cookie, '$.httpOnly') AS httpOnly,
  JSON_VALUE(cookie, '$.secure') AS secure,
  JSON_VALUE(cookie, '$.session') AS session,
  JSON_VALUE(cookie, '$.sameSite') AS sameSite,
  JSON_VALUE(cookie, '$.sameParty') AS sameParty,
  JSON_VALUE(cookie, '$.partitionKey') AS partitionKey,
  JSON_VALUE(cookie, '$.partitionKeyOpaque') AS partitionKeyOpaque
FROM intermediate_cookie
```
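As a rough sketch of the "do similar into a temporary table" option, the BigQuery multi-statement script below extracts one crawl's cookies into a session-scoped temp table and then queries it. The crawl date `2024-07-01`, the temp table name `cookies`, the reduced column list, and the follow-up aggregation are illustrative assumptions, not part of the almanac pipeline. Because it reads the same `payload` and `custom_metrics` columns of `httparchive.crawl.pages`, expect it to process a similar amount of data as the query that populates the almanac table.

```sql
-- Assumption: '2024-07-01' is an illustrative crawl date; substitute the crawl you need.
DECLARE crawl_date DATE DEFAULT DATE '2024-07-01';

-- Temp table exists only for the duration of this script/session.
CREATE TEMP TABLE cookies AS
SELECT
  date,
  client,
  page,
  root_page,
  rank,
  payload.startedDateTime AS startedDateTime,
  -- Same first-party heuristic as the almanac query above.
  ENDS_WITH(NET.HOST(page), NET.REG_DOMAIN(JSON_VALUE(cookie, '$.domain'))) AS firstPartyCookie,
  JSON_VALUE(cookie, '$.name') AS name,
  JSON_VALUE(cookie, '$.domain') AS domain,
  JSON_VALUE(cookie, '$.sameSite') AS sameSite
FROM
  `httparchive.crawl.pages`,
  UNNEST(JSON_EXTRACT_ARRAY(custom_metrics.cookies)) AS cookie
WHERE
  date = crawl_date;

-- Hypothetical follow-up: count first- vs third-party cookies per client.
SELECT
  client,
  firstPartyCookie,
  COUNT(*) AS cookie_count
FROM cookies
GROUP BY client, firstPartyCookie
ORDER BY client, firstPartyCookie;
```

Running several such scripts, one per crawl date, and comparing the results is one way to track the month-over-month trends the question asks about without waiting for the shared table to be refreshed.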