feat(scan): reactive status-aware fetch-failure handling (#58) #59
Catch remote fetch failures per-target so a gated/private model no longer crashes with a raw traceback or silently reports "0 artifacts found".

- scanner: wrap the repo resolve call and each per-target byte fetch; record a structured `fetch_failure` error into `results['errors']` and continue, so one gated file in a multi-file repo doesn't abort the rest. Drives exit 1 via the existing errors path.
- remote: stop swallowing resolve failures into `[]`; a 401/403/404/network error now propagates so the scanner can surface a real, status-aware error (fixes the silent swallowed-`[]` case).
- cli: `_format_fetch_error` maps the observed status (+ token presence + host) to a clean, traceback-free message on stderr: auth/private, auth-despite-token, network/firewall, 404, and a generic fallback. Emit one enriched `cli_error` per fetch failure (same #56 payload shape) alongside the normal `cli_scan`; `_scan_error_payload` is now the single source for that shape.

Tests: status->message mapping per branch, per-target isolation, resolve propagates 401, exit 1 + no traceback, and both `cli_error` + `cli_scan` emit. 227 passed, coverage 88.72%. No version bump (release deferred to #59).
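The status-aware mapping described in the commit message can be sketched as a pure function of what was actually observed. This is a minimal sketch under assumptions, not the PR's code: `format_fetch_error`, its signature and every message string are hypothetical stand-ins for `_format_fetch_error`. The only inputs assumed are the HTTP status (or `None` when no response arrived at all), whether a token was set, and the host.

```python
from typing import Optional


def format_fetch_error(status: Optional[int], token_set: bool, host: str) -> str:
    """Map an observed fetch outcome to a one-line, traceback-free message.

    Hypothetical stand-in for `_format_fetch_error`; wording and signature
    are illustrative only. `status` is the HTTP status code, or None when
    the request never produced a response.
    """
    if status is None:
        # No HTTP response at all: DNS failure, timeout, proxy or firewall.
        return f"✖ Could not reach {host}. Check your network, proxy or firewall."
    if status in (401, 403):
        if token_set:
            # Credentials were sent and still refused.
            return (
                f"✖ {host} denied access despite a token being set. "
                "Check the token's permissions."
            )
        # No credentials sent: the resource is likely private or gated.
        return f"✖ {host} reports this resource is private or gated. Set HF_TOKEN and retry."
    if status == 404:
        return f"✖ Not found on {host}. Check the repo id or file path."
    # Generic fallback: report only what was observed, never a guessed cause.
    return f"✖ Fetch from {host} failed with HTTP {status}."


if __name__ == "__main__":
    for status, token in [(401, False), (403, True), (None, False), (404, False), (500, True)]:
        print(status, token, format_fetch_error(status, token, "huggingface.co"))
```

Keeping the mapping free of I/O is one way to make the "status->message mapping per branch" tests trivial: each branch is a single call with a fixed input.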
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2708b09708
```python
    with RemoteStream(url) as stream:
        self.artifacts.append(self._inspect_pytorch(stream, Path(url).name, is_remote=True))
elif ext == SAFETENSORS_EXTENSION:
    with RemoteStream(url) as stream:
```
Record read-time fetch failures from inspectors
When a remote file opens successfully but a later ranged read fails (for example, the initial `RemoteStream._fetch_size()` request succeeds, then `read()` gets a CDN 403 or a timeout), this call never reaches the new outer `except` because `_inspect_safetensors` catches all read exceptions internally and returns an artifact with `risk_level: "LOW"` plus an `error` field. That leaves `results["errors"]` empty, so the CLI exits 0 and can emit a clean SBOM for a file whose bytes were never actually fetched. The same pattern affects the other inspectors that swallow stream read errors.
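One way to address this review point, sketched under assumptions: inspectors let a fetch error raised mid-read propagate instead of folding it into an artifact's `error` field, and the scanner's per-target `try`/`except` records it as a `fetch_failure`. `FetchError`, `inspect_remote`, `scan_targets` and `exit_code` are hypothetical names, and reads are modeled as zero-argument callables rather than the project's `RemoteStream`.

```python
from typing import Callable, Dict, Optional


class FetchError(Exception):
    """Hypothetical error carrying the observed HTTP status (None = no response)."""

    def __init__(self, url: str, status: Optional[int]) -> None:
        super().__init__(f"fetch failed for {url} (status={status})")
        self.url = url
        self.status = status


def inspect_remote(read: Callable[[], bytes], name: str) -> dict:
    """Hypothetical inspector that lets read-time fetch failures propagate.

    It deliberately does not catch FetchError: swallowing it into an
    artifact-level "error" field would leave results["errors"] empty.
    """
    header = read()  # stands in for a ranged read; may raise FetchError
    return {"name": name, "risk_level": "LOW", "header_bytes": len(header)}


def scan_targets(targets: Dict[str, Callable[[], bytes]]) -> dict:
    """Inspect each target independently; one failure does not abort the rest."""
    results: dict = {"artifacts": [], "errors": []}
    for url, read in targets.items():
        try:
            results["artifacts"].append(inspect_remote(read, url.rsplit("/", 1)[-1]))
        except FetchError as exc:
            # Structured, per-target record instead of a traceback.
            results["errors"].append(
                {"type": "fetch_failure", "url": url, "status": exc.status}
            )
    return results


def exit_code(results: dict) -> int:
    """Any recorded error yields exit 1; a clean scan yields 0."""
    return 1 if results["errors"] else 0


if __name__ == "__main__":
    def ok() -> bytes:
        return b"{}"

    def cdn_403() -> bytes:
        # Simulates a read that fails after the stream opened successfully.
        raise FetchError("https://example.invalid/b.safetensors", 403)

    res = scan_targets({
        "https://example.invalid/a.safetensors": ok,
        "https://example.invalid/b.safetensors": cdn_403,
    })
    print(len(res["artifacts"]), res["errors"], exit_code(res))
```

The design choice here is that only the scanner decides how a failure is recorded; inspectors stay responsible for parsing, so a read-time 403 reaches the same errors path (and the same exit 1) as a resolve-time one.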
Closes #58.
What & why
Today a fetch failure (gated/private HF model, no token) bubbles up as a raw Python traceback, and `resolve_huggingface_repo` swallows a resolve-time 401 into `[]` → silent "0 artifacts found" with exit 0. This makes failures reactive: make the request, then adapt the message to the observed status, never claiming a cause it can't see.
Changes
- scanner: wrap the repo resolve call and each per-target byte fetch; record a structured `fetch_failure` error into `results['errors']` and continue, so one gated file in a multi-file repo doesn't abort the rest. Drives exit 1 via the existing errors path.
- remote: `resolve_huggingface_repo` stops swallowing failures into `[]`; 401/403/404/network errors propagate so the scanner surfaces a real, status-aware error (the swallowed-`[]` fix).
- cli: `_format_fetch_error` maps status (+ token presence + host) to a clean, traceback-free stderr message: private/gated · auth-despite-token · network/firewall · 404 · generic fallback. Emits one enriched `cli_error` per fetch failure (same #56 payload keys) alongside the normal `cli_scan`. `_scan_error_payload` is now the single source for the telemetry shape.
Verification
- `poetry run pytest --cov=aisbom --cov-fail-under=85` → 227 passed, coverage 88.72%
- … (`hf://lab700xdev/aisbom-test`):
  - ✖ …private or gated. Set HF_TOKEN · exit 1 (resolve-time, names repo id)
  - LEGAL RISK (cc-by-nc-4.0) · exit 0
  - ✖ …despite a token being set…
- … (`hf://google-bert/bert-base-uncased`) still scans with no token

No version bump (release deferred to #59).
🤖 Generated with Claude Code