feat(scan): enrich cli_error telemetry with http_status + token_present (#56) #57

Merged
lab700xdev merged 1 commit into main from slice-56-cli-error-telemetry on Jun 5, 2026

@lab700xdev
Contributor

Summary

When a scan raises during fetch, the cli_error telemetry event now carries three diagnostic buckets so that the next HTTPError surge (parent #55) is explainable in GA4. The buckets distinguish auth failures (401/403), firewall issues (timeout/connection), and typos (404), without presuming a cause and without leaking any identifying data.

  • http_status: numeric status string for HTTPError ("401"/"404"…), "timeout" / "connection_error" for the network failures, "other" otherwise. Timeout is checked before ConnectionError so a ConnectTimeout buckets as a timeout.
  • token_present: "true"/"false" for HF_TOKEN / HUGGING_FACE_HUB_TOKEN presence. The token value is never read into telemetry.
  • target_type: reuses the existing _classify_target bucket.

Excluded from the payload: URL, repo id, hostname, response body. The diff cli_error path is unchanged (not a fetch path).
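
The bucketing described above could be sketched roughly as follows. This is an illustration, not the code merged in this PR. The exception classes are local stand-ins that mirror the `requests` hierarchy (where `ConnectTimeout` inherits from both `ConnectionError` and `Timeout`) so the sketch runs without dependencies. The function names `classify_http_status` and `token_present` are hypothetical, and treating an empty token variable as "not present" is an assumption.

```python
import os

# Local stand-ins for the requests exception hierarchy (assumption: the real
# code catches requests.exceptions.*; these keep the sketch dependency-free).
class FakeRequestError(Exception):
    pass

class FakeHTTPError(FakeRequestError):
    def __init__(self, status_code=None):
        super().__init__(f"HTTP {status_code}")
        # Simplification: requests exposes this as exc.response.status_code.
        self.status_code = status_code

class FakeTimeout(FakeRequestError):
    pass

class FakeConnectionError(FakeRequestError):
    pass

class FakeConnectTimeout(FakeConnectionError, FakeTimeout):
    """Both a connection error and a timeout, as in requests."""

def classify_http_status(exc):
    """Map an exception to a coarse, non-identifying bucket string."""
    if isinstance(exc, FakeHTTPError):
        code = exc.status_code
        return str(code) if code is not None else "other"
    # Timeout is checked first so a ConnectTimeout buckets as "timeout".
    if isinstance(exc, FakeTimeout):
        return "timeout"
    if isinstance(exc, FakeConnectionError):
        return "connection_error"
    return "other"

def token_present(env=None):
    """Report only whether a token variable is set, never its value."""
    env = os.environ if env is None else env
    has_token = bool(env.get("HF_TOKEN") or env.get("HUGGING_FACE_HUB_TOKEN"))
    return "true" if has_token else "false"
```

Reversing the two `isinstance` checks would send `FakeConnectTimeout` to "connection_error", which is why the order matters.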

Tests

8 new tests in tests/test_scanner_cli.py, including a leak test asserting no token value / repo id / URL / hostname / response body appears and that the payload keys are exactly the 5 agreed fields.
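
A leak test of this kind might look like the sketch below. It is not the test added to tests/test_scanner_cli.py. The helper name `assert_no_leak` and all sample values are hypothetical. Only the three keys named in this PR are used, because the other two agreed fields are not listed here.

```python
def assert_no_leak(payload, forbidden, allowed_keys):
    """Fail if the payload has unexpected keys or contains a forbidden string."""
    # The key set must match exactly: no extra fields may sneak in.
    assert set(payload) == set(allowed_keys), f"unexpected keys: {sorted(payload)}"
    # Flatten all values so a substring check covers every field.
    flat = " ".join(str(value) for value in payload.values())
    for secret in forbidden:
        assert secret not in flat, f"leaked value: {secret!r}"

# Hypothetical payload and sensitive values, for illustration only.
sample_payload = {"http_status": "401", "token_present": "true", "target_type": "hf"}
sample_forbidden = [
    "tok-abc123-secret",                       # token value
    "some-org/some-model",                     # repo id
    "https://example.invalid/some-org/model",  # URL
    "example.invalid",                         # hostname
    "Unauthorized: bad credentials",           # response body
]
assert_no_leak(sample_payload, sample_forbidden,
               {"http_status", "token_present", "target_type"})
```

Checking the exact key set as well as the values means a newly added field fails the test until it is explicitly allowed.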

  • poetry run pytest --cov=aisbom --cov-fail-under=85: 209 passed, coverage 87.97%

No version bump / release in this slice (deferred to #59).

…nt (#56)

When a scan raises during fetch, the cli_error event now carries three
diagnostic buckets so an HTTPError surge is explainable in GA4 without
presuming a cause:

- http_status: numeric status string for HTTPError ("401"/"404"...),
  "timeout"/"connection_error" for the network failures, "other" otherwise.
  Timeout is checked before ConnectionError so ConnectTimeout buckets as
  a timeout.
- token_present: "true"/"false" for HF_TOKEN / HUGGING_FACE_HUB_TOKEN
  presence; the token value is never read into telemetry.
- target_type: reuses _classify_target (hf/http/local).

No URL, repo id, hostname, or response body is ever included. No version
bump or release in this slice.

lab700xdev merged commit 880ba58 into main on Jun 5, 2026 (2 checks passed)
lab700xdev deleted the slice-56-cli-error-telemetry branch on June 5, 2026 15:20
lab700xdev added a commit that referenced this pull request on Jun 5, 2026:

Surface the #56/#57/#58 feature set to users and gate the release:

- README "Authentication" subsection — env token (HF_TOKEN, then
  HUGGING_FACE_HUB_TOKEN), huggingface.co-only scope, never
  logged/telemetered (only the token_present boolean), CI snippet with
  secrets.HF_TOKEN, and an LFS-CDN egress note.
- Telemetry & Privacy: disclose the new cli_error http_status bucket and
  token_present boolean; reaffirm the token value is never collected.
- CHANGELOG: neutral 1.1.0 entry.
- Bump version 1.0.7 -> 1.1.0.