feat(ingest): capture request headers into event context #1343
vklimontovich wants to merge 1 commit into
Add context.headers to the event so destinations can see the raw HTTP headers (accept, content-type, sec-fetch-*, sec-ch-ua*, ...) and tell real browser traffic from bots/agents.

- Browser endpoint derives context.headers from the request only; the body can't redefine them (a browser can't read its own headers anyway).
- S2S endpoint captures the forwarding request's headers but lets the caller override allow-listed headers via the body to forward the original device's headers.
- cookie/authorization are stripped and the write key is masked, so secrets don't leak to destinations.

Types: add AnalyticsContext.headers and an optional RuntimeFacade.headers() so a Node integration can supply the original headers; jitsu-js wires it into the built context (no-op in the browser).
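To make the browser/S2S split concrete, here is a minimal Go sketch of the override rule. It is not the PR's code: `allowList`, `isAllowListed`, and `mergeHeaders` are hypothetical names, the allow-list is copied from the PR description, and treating a trailing `*` as a prefix match is an assumption. `reqHeaders` is assumed to be already filtered and lower-cased.

```go
package main

import (
	"fmt"
	"strings"
)

// allowList mirrors the overridable headers named in the PR description.
// Assumption: a trailing "*" means "any header with this prefix".
var allowList = []string{
	"accept", "accept-language", "accept-encoding", "content-type",
	"user-agent", "referer", "dnt", "sec-fetch-*", "sec-ch-ua*",
}

// isAllowListed reports whether a header name (any case) matches the allow-list.
func isAllowListed(name string) bool {
	name = strings.ToLower(name)
	for _, pattern := range allowList {
		if prefix, wildcard := strings.CutSuffix(pattern, "*"); wildcard {
			if strings.HasPrefix(name, prefix) {
				return true
			}
		} else if name == pattern {
			return true
		}
	}
	return false
}

// mergeHeaders is a hypothetical sketch of the override rule. The browser
// endpoint ignores the body; the S2S endpoint lets allow-listed body headers win.
func mergeHeaders(reqHeaders, bodyHeaders map[string]string, s2s bool) map[string]string {
	out := make(map[string]string, len(reqHeaders)+len(bodyHeaders))
	for k, v := range reqHeaders {
		out[k] = v
	}
	if !s2s {
		return out // browser: context.headers comes from the request only
	}
	for k, v := range bodyHeaders {
		if isAllowListed(k) {
			out[strings.ToLower(k)] = v
		}
	}
	return out
}

func main() {
	req := map[string]string{"user-agent": "node", "accept": "*/*"}
	body := map[string]string{"User-Agent": "Mozilla/5.0", "Sec-Fetch-Mode": "navigate", "Cookie": "sid=1"}

	fmt.Println(mergeHeaders(req, body, false)) // map[accept:*/* user-agent:node]
	fmt.Println(mergeHeaders(req, body, true))  // map[accept:*/* sec-fetch-mode:navigate user-agent:Mozilla/5.0]
}
```

`Cookie` in the body is ignored even on the S2S path because it is not allow-listed. `strings.CutSuffix` requires Go 1.20 or later.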
Reviewed the changes in bulker/ingest/router.go, libs/jitsu-js/src/analytics-plugin.ts, and types/protocols/analytics.d.ts.
The overall direction makes sense (capturing request headers into context.headers and masking sensitive values), but I found one correctness/security edge case in the Go implementation and left an inline comment with details.
```go
// event.context.headers. Internal (x-jitsu-*, x-vercel*) and sensitive (cookie,
// authorization) headers are dropped, and the write key is masked. Allow-listed headers
// already present in the event body (bodyHeaders) win over the request headers.
func buildContextHeaders(c *gin.Context, bodyHeaders any) map[string]string {
```
buildContextHeaders returns a plain map[string]string, and that gets written into event.context.headers. In the browser path we call types.FilterEvent(ev) after this, but FilterEvent only recurses through types.Json/[]any, so it won’t sanitize keys inside this map. That means a crafted header like __sql_type_* can survive and become a SQL type hint downstream after JSON reparse. Can we return types.Json here (or otherwise run equivalent filtering) so header keys go through the same sanitization path?
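To illustrate the gap the reviewer describes, the sketch below uses a hypothetical `filterKeys` that, like the comment says of `types.FilterEvent`, only recurses through generic maps and `[]any`. Here `map[string]any` stands in for `types.Json`, and dropping keys with the `__sql_type_` prefix is an assumed stand-in for whatever sanitization `FilterEvent` actually performs.

```go
package main

import (
	"fmt"
	"strings"
)

// filterKeys is a hypothetical sanitizer: it removes keys with a reserved
// prefix, recursing only into map[string]any and []any values.
func filterKeys(v any) {
	switch t := v.(type) {
	case map[string]any:
		for k, child := range t {
			if strings.HasPrefix(k, "__sql_type_") {
				delete(t, k) // deleting during range is safe in Go
				continue
			}
			filterKeys(child)
		}
	case []any:
		for _, child := range t {
			filterKeys(child)
		}
	}
	// Every other type, including map[string]string, is left untouched.
}

// toAnyMap copies headers into a generic map so the sanitizer can see the keys.
func toAnyMap(h map[string]string) map[string]any {
	out := make(map[string]any, len(h))
	for k, v := range h {
		out[k] = v
	}
	return out
}

func main() {
	crafted := map[string]string{"accept": "*/*", "__sql_type_accept": "bigint"}

	// Headers stored as map[string]string: the sanitizer never looks inside.
	leaky := map[string]any{"context": map[string]any{"headers": crafted}}
	filterKeys(leaky)
	fmt.Println(len(leaky["context"].(map[string]any)["headers"].(map[string]string))) // 2

	// Same headers as a generic map: the reserved key is removed.
	fixed := map[string]any{"context": map[string]any{"headers": toAnyMap(crafted)}}
	filterKeys(fixed)
	fmt.Println(len(fixed["context"].(map[string]any)["headers"].(map[string]any))) // 1
}
```

The point is the static type, not the values: the sanitizer's type switch simply has no case for `map[string]string`. The real fix would build the headers with whatever `types.Json` construction the codebase already uses.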
What

Adds `context.headers` to ingested events so destinations can see the raw HTTP request headers (`accept`, `content-type`, `sec-fetch-*`, `sec-ch-ua*`, …) and distinguish real browser traffic from bots/agents. Today only `context.userAgent` is available.

Behavior
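- Browser endpoint: `context.headers` is derived only from the actual request; the body can't redefine them (a browser can't read its own request headers anyway, and shouldn't be able to spoof them).
- S2S endpoint: the forwarding request's headers are captured, but the caller can override allow-listed headers via the body to forward the original device's headers. Allow-list: `accept`, `accept-language`, `accept-encoding`, `content-type`, `user-agent`, `referer`, `dnt`, `sec-fetch-*`, `sec-ch-ua*`.
- `cookie`/`authorization` are stripped and the write key is masked before headers reach `context` (which is forwarded to destinations). Keys are lower-cased. The internal `IngestMessage.HttpHeaders` (full set) is unchanged.

Types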
- `AnalyticsContext.headers?: Record<string, string>` (a Jitsu extension; Segment's spec has no raw-headers field).
- An optional `RuntimeFacade.headers()` so a Node integration can supply the original device's headers; `@jitsu/js` wires it into the built context (no-op in the browser).

Notes for bot detection
The `sec-fetch-*`/`sec-ch-ua*` set is the strongest tell: raw HTTP clients (curl, python-requests, most non-browser agents) don't send them; only real/headless browsers do.

🤖 Generated with Claude Code
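The bot-detection note reduces to a presence check on the lower-cased keys in `context.headers`. A minimal sketch, with `looksLikeBrowser` as a hypothetical helper a destination might write: absence of these headers is the meaningful signal, while presence is not proof, since any HTTP client can set arbitrary headers. The `sec-ch-ua*` client hints are sent by Chromium-based browsers, so the check accepts either family rather than requiring both.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeBrowser is a hypothetical heuristic over context.headers (keys
// already lower-cased). It only checks presence, so a client that forges
// these headers will pass; treat a false result as the stronger signal.
func looksLikeBrowser(headers map[string]string) bool {
	for name := range headers {
		if strings.HasPrefix(name, "sec-fetch-") || strings.HasPrefix(name, "sec-ch-ua") {
			return true
		}
	}
	return false
}

func main() {
	curl := map[string]string{"user-agent": "curl/8.5.0", "accept": "*/*"}
	browser := map[string]string{"user-agent": "Mozilla/5.0", "sec-fetch-mode": "navigate", "sec-fetch-dest": "document"}
	fmt.Println(looksLikeBrowser(curl))    // false
	fmt.Println(looksLikeBrowser(browser)) // true
}
```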