feat(scan): enrich cli_error telemetry with http_status + token_present (#56) #57
Merged
…nt (#56)

When a scan raises during fetch, the cli_error event now carries three diagnostic buckets so an HTTPError surge is explainable in GA4 without presuming a cause:

- http_status: numeric status string for HTTPError ("401"/"404"...), "timeout"/"connection_error" for the network failures, "other" otherwise. Timeout is checked before ConnectionError so ConnectTimeout buckets as a timeout.
- token_present: "true"/"false" for HF_TOKEN / HUGGING_FACE_HUB_TOKEN presence; the token value is never read into telemetry.
- target_type: reuses _classify_target (hf/http/local).

No URL, repo id, hostname, or response body is ever included. No version bump or release in this slice.
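The bucketing order in the commit message can be sketched as follows. This is a minimal illustration, not the code from this PR: the `Fetch*` exception classes are hypothetical stand-ins (modelled on a hierarchy in which a connect timeout is both a connection error and a timeout), and `http_status_bucket` and `token_present` are assumed names. Treating an empty environment variable as "not present" is also an assumption.

```python
import os


# Hypothetical stand-in exceptions so the sketch runs without third-party
# packages. The real scanner presumably uses its HTTP library's own classes.
class FetchHTTPError(Exception):
    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class FetchConnectionError(Exception):
    pass


class FetchTimeout(Exception):
    pass


class FetchConnectTimeout(FetchConnectionError, FetchTimeout):
    """A connect timeout is both a connection error and a timeout."""


def http_status_bucket(exc: BaseException) -> str:
    """Map an exception to a coarse, non-identifying bucket string."""
    if isinstance(exc, FetchHTTPError):
        return str(exc.status_code)
    # Timeout is tested first: FetchConnectTimeout would also match the
    # connection-error branch, and it should be reported as a timeout.
    if isinstance(exc, FetchTimeout):
        return "timeout"
    if isinstance(exc, FetchConnectionError):
        return "connection_error"
    return "other"


def token_present(env=None) -> str:
    """Return "true"/"false" for token presence; the value is never returned."""
    env = os.environ if env is None else env
    names = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")
    # Assumption: an empty string counts as absent.
    return "true" if any(env.get(name) for name in names) else "false"
```

Swapping the two network branches would silently move every connect timeout into the `"connection_error"` bucket, which is why the order matters.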
This was referenced Jun 5, 2026
lab700xdev added a commit that referenced this pull request Jun 5, 2026
Surface the #56/#57/#58 feature set to users and gate the release:

- README "Authentication" subsection: env token (HF_TOKEN, then HUGGING_FACE_HUB_TOKEN), huggingface.co-only scope, never logged/telemetered (only the token_present boolean), CI snippet with secrets.HF_TOKEN, and an LFS-CDN egress note.
- Telemetry & Privacy: disclose the new cli_error http_status bucket and token_present boolean; reaffirm the token value is never collected.
- CHANGELOG: neutral 1.1.0 entry.
- Bump version 1.0.7 -> 1.1.0.
Summary
When a `scan` raises during fetch, the `cli_error` telemetry event now carries three diagnostic buckets so the next HTTPError surge (parent #55) is explainable in GA4, distinguishing auth (401/403), firewall (timeout/connection), and typos (404), without presuming a cause and without leaking any identifying data.

- `http_status`: numeric status string for `HTTPError` (`"401"`/`"404"`…), `"timeout"`/`"connection_error"` for the network failures, `"other"` otherwise. `Timeout` is checked before `ConnectionError` so a `ConnectTimeout` buckets as a timeout.
- `token_present`: `"true"`/`"false"` for `HF_TOKEN` / `HUGGING_FACE_HUB_TOKEN` presence. The token value is never read into telemetry.
- `target_type`: reuses the existing `_classify_target` bucket.

Excluded from the payload: URL, repo id, hostname, response body. The `diff` `cli_error` path is unchanged (not a fetch path).

Tests
8 new tests in `tests/test_scanner_cli.py`, including a leak test asserting that no token value / repo id / URL / hostname / response body appears and that the payload keys are exactly the 5 agreed fields.

`poetry run pytest --cov=aisbom --cov-fail-under=85` → 209 passed, coverage 87.97%

No version bump / release in this slice (deferred to #59).
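For readers unfamiliar with the leak-test pattern mentioned under Tests, here is a rough sketch of what such an assertion might look like. It is not the test from `tests/test_scanner_cli.py`: `assert_no_leak` is a hypothetical helper, the sample strings are invented, and the example uses only the three keys named in this PR (the real payload has five agreed fields, and the other two are not listed here).

```python
def assert_no_leak(payload: dict, forbidden: list, allowed_keys: set) -> None:
    """Fail if the payload has unexpected keys or contains a forbidden string."""
    # Exact key match: an extra field is treated as a potential leak.
    assert set(payload) == allowed_keys, f"unexpected keys: {set(payload) ^ allowed_keys}"
    for key, value in payload.items():
        for secret in forbidden:
            # Empty strings are skipped because "" is a substring of everything.
            if secret:
                assert secret not in str(value), f"{key!r} leaks sensitive data"


# Illustrative values only; none of these come from the real test suite.
sample_payload = {"http_status": "401", "token_present": "true", "target_type": "hf"}
sample_forbidden = [
    "hf_secret_value",                    # token value
    "org/model",                          # repo id
    "https://huggingface.co/org/model",   # URL
    "huggingface.co",                     # hostname
    "Unauthorized: invalid credentials",  # response body
]
assert_no_leak(sample_payload, sample_forbidden, {"http_status", "token_present", "target_type"})
```

Checking the key set exactly, instead of only scanning values, means a newly added field fails the test until someone reviews it.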