feat(scan): reactive status-aware fetch-failure handling (#58) #59

Merged: lab700xdev merged 1 commit into main from slice-58-reactive-fetch on Jun 5, 2026

@lab700xdev

Contributor

Closes #58.

What & why

Today a fetch failure (gated/private HF model, no token) bubbles up as a raw Python traceback, and resolve_huggingface_repo swallows a resolve-time 401 into [], producing a silent "0 artifacts found" with exit 0. This PR makes failures reactive: make the request, then adapt the message to the observed status, never claiming a cause it cannot see.
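The swallowed-[] problem is easiest to see side by side. The sketch below is illustrative only: FetchError, resolve_swallowing, resolve_propagating and gated_repo_listing are hypothetical names, not the aisbom code. It shows why collapsing every failure to [] hides a 401, while letting the exception propagate keeps the status available to the caller.

```python
from typing import Callable, List, Optional


class FetchError(Exception):
    """Hypothetical error type carrying the observed HTTP status (None = network failure)."""

    def __init__(self, status: Optional[int]) -> None:
        super().__init__(f"fetch failed (status={status})")
        self.status = status


def resolve_swallowing(list_files: Callable[[], List[str]]) -> List[str]:
    # Old behaviour: every failure collapses to [], so a 401 on a gated repo
    # is indistinguishable from a repo with no model files ("0 artifacts found").
    try:
        return list_files()
    except FetchError:
        return []


def resolve_propagating(list_files: Callable[[], List[str]]) -> List[str]:
    # New behaviour: no blanket except, so the FetchError (and its status)
    # reaches the caller, which can turn it into a status-aware message.
    return list_files()


def gated_repo_listing() -> List[str]:
    # Simulates the resolve-time 401 returned for a gated repo without a token.
    raise FetchError(401)
```

With the swallowing variant, resolve_swallowing(gated_repo_listing) returns [] and the scan "succeeds"; with the propagating variant the caller receives a FetchError whose status is 401.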

Changes

  • scanner.py: catch fetch failures per target (the resolve call and each byte fetch); record a structured fetch_failure error into results['errors'] and continue, so one gated file in a multi-file repo doesn't abort the rest. This drives exit 1 via the existing errors path.
  • remote.py: resolve_huggingface_repo no longer swallows failures into []; 401/403/404/network errors propagate so the scanner surfaces a real, status-aware error (the swallowed-[] fix).
  • cli.py: _format_fetch_error maps the status (plus token presence and host) to a clean, traceback-free stderr message: private/gated · auth-despite-token · network/firewall · 404 · generic fallback. Emits one enriched cli_error per fetch failure (same payload keys as #56, "feat(scan): emit aisbom:risk / aisbom:legal component properties (#54)") alongside the normal cli_scan. _scan_error_payload is now the single source for the telemetry shape.
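The per-target isolation described for scanner.py can be sketched as follows. This is a minimal illustration under assumptions, not the actual scanner: scan_repo, exit_code, FetchError and the error-dict keys other than the fetch_failure type are hypothetical. A resolve failure is recorded against the repo id and ends the scan; a byte-fetch failure is recorded against that one file and the loop continues.

```python
from typing import Any, Callable, Dict, List, Optional


class FetchError(Exception):
    """Hypothetical fetch error with the observed HTTP status (None = network failure)."""

    def __init__(self, status: Optional[int]) -> None:
        super().__init__(f"fetch failed (status={status})")
        self.status = status


def scan_repo(repo_id: str,
              resolve: Callable[[str], List[str]],
              fetch: Callable[[str], bytes]) -> Dict[str, List[Dict[str, Any]]]:
    results: Dict[str, List[Dict[str, Any]]] = {"artifacts": [], "errors": []}
    try:
        urls = resolve(repo_id)
    except FetchError as exc:
        # Resolve-time failure: nothing to scan, but record why (names the repo id).
        results["errors"].append({"type": "fetch_failure", "target": repo_id, "status": exc.status})
        return results
    for url in urls:
        try:
            data = fetch(url)
        except FetchError as exc:
            # Per-target isolation: record this file's failure and move on.
            results["errors"].append({"type": "fetch_failure", "target": url, "status": exc.status})
            continue
        results["artifacts"].append({"name": url.rsplit("/", 1)[-1], "size": len(data)})
    return results


def exit_code(results: Dict[str, List[Dict[str, Any]]]) -> int:
    # Any recorded error (fetch failures included) turns into exit 1.
    return 1 if results["errors"] else 0
```

Recording instead of raising is what lets a repo with one gated file still report its other artifacts while the run as a whole exits 1.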

Verification

  • poetry run pytest --cov=aisbom --cov-fail-under=85 → 227 passed, coverage 88.72%
  • Manual, against a real private repo (hf://lab700xdev/aisbom-test):
    • no token → ✖ …private or gated. Set HF_TOKEN · exit 1 (resolve-time; names the repo id)
    • valid token → artifact table + SBOM, correctly flags LEGAL RISK (cc-by-nc-4.0) · exit 0
    • bad token → ✖ …despite a token being set…
    • public model (hf://google-bert/bert-base-uncased) still scans with no token
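The status-to-message mapping behind these manual cases can be sketched like this. It is an illustration, not _format_fetch_error itself: format_fetch_error, its parameters, the branch order and the exact wording are assumptions; only the branch categories (private/gated, auth-despite-token, network/firewall, 404, generic fallback) come from the PR. Each branch only states what the observed status supports.

```python
from typing import Optional


def format_fetch_error(target: str, status: Optional[int], has_token: bool, host: str) -> str:
    """Illustrative status -> message mapping; branch order and wording are assumptions."""
    if status in (401, 403):
        if has_token:
            # A token was sent and still rejected: don't claim the repo is gated.
            return (f"Access to {target} was denied (HTTP {status}) despite a token being set; "
                    "check that it is valid and has access to this repo.")
        return f"{target} appears to be private or gated (HTTP {status}). Set HF_TOKEN and retry."
    if status is None:
        # No HTTP response at all: all we know is that the host was unreachable.
        return f"Could not reach {host}; check your network, proxy or firewall settings."
    if status == 404:
        return f"{target} was not found on {host} (HTTP 404)."
    # Anything else: report the status without guessing a cause.
    return f"Failed to fetch {target} from {host} (HTTP {status})."
```

In a CLI, has_token would typically be derived from the environment, for example bool(os.environ.get("HF_TOKEN")), so the same 401 yields different advice depending on whether a token was actually sent.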

No version bump; release deferred to #59.

🤖 Generated with Claude Code

Catch remote fetch failures per-target so a gated/private model no longer
crashes with a raw traceback or silently reports "0 artifacts found".

- scanner: wrap the repo resolve call and each per-target byte fetch; record
  a structured `fetch_failure` error into results['errors'] and continue, so
  one gated file in a multi-file repo doesn't abort the rest. Drives exit 1
  via the existing errors path.
- remote: stop swallowing resolve failures into []; a 401/403/404/network
  error now propagates so the scanner can surface a real, status-aware error
  (fixes the silent swallowed-[] case).
- cli: _format_fetch_error maps the observed status (+ token presence + host)
  to a clean, traceback-free message on stderr: auth/private, auth-despite-
  token, network/firewall, 404, and a generic fallback. Emit one enriched
  cli_error per fetch failure (same #56 payload shape) alongside the normal
  cli_scan; _scan_error_payload is now the single source for that shape.

Tests: status->message mapping per branch, per-target isolation, resolve
propagates 401, exit 1 + no traceback, and both cli_error + cli_scan emit.
227 passed, coverage 88.72%. No version bump (release deferred to #59).
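The last tested behaviour, one cli_error per fetch failure emitted alongside the usual cli_scan with a single payload builder, might look roughly like this. All names here (scan_error_payload, emit_scan_events, the emit callback) and every payload key are placeholders; the real #56 payload keys are not shown in this PR.

```python
from typing import Any, Callable, Dict, List


def scan_error_payload(error: Dict[str, Any]) -> Dict[str, Any]:
    # Single source for the error payload shape; these keys are placeholders,
    # not the actual #56 keys.
    return {"error_type": error.get("type", "unknown"), "status": error.get("status")}


def emit_scan_events(results: Dict[str, List[Dict[str, Any]]],
                     emit: Callable[[str, Dict[str, Any]], None]) -> None:
    # One enriched cli_error per fetch failure...
    for error in results["errors"]:
        if error.get("type") == "fetch_failure":
            emit("cli_error", scan_error_payload(error))
    # ...and the normal cli_scan event is still emitted once per run.
    emit("cli_scan", {"artifact_count": len(results["artifacts"]),
                      "error_count": len(results["errors"])})
```

Routing every error event through one payload function is what keeps the shape consistent between the pre-existing error path and the new fetch-failure events.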
lab700xdev merged commit 05f2d44 into main on Jun 5, 2026
2 checks passed

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2708b09708

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread on aisbom/scanner.py

```python
    with RemoteStream(url) as stream:
        self.artifacts.append(self._inspect_pytorch(stream, Path(url).name, is_remote=True))
elif ext == SAFETENSORS_EXTENSION:
    with RemoteStream(url) as stream:
```


P1: Record read-time fetch failures from inspectors

When a remote file opens successfully but a later ranged read fails (for example the initial RemoteStream._fetch_size() request succeeds, then read() gets a CDN 403 or timeout), this call never reaches the new outer except because _inspect_safetensors catches all read exceptions internally and returns an artifact with risk_level: "LOW" plus an error field. That leaves results["errors"] empty, so the CLI exits 0 and can emit a clean SBOM for a file whose bytes were not actually fetched; the same pattern affects the other inspectors that swallow stream read errors.
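One possible way to address this finding is sketched below, under assumptions. ReadTrackingStream and inspect_remote are hypothetical names, and the inspector is passed in as a plain callable. The scanner wraps the remote stream so it can observe a read failure even when the inspector swallows it, and then records a fetch_failure instead of trusting the LOW-risk artifact.

```python
from typing import Any, Callable, Dict, List, Optional


class ReadTrackingStream:
    """Hypothetical wrapper: remembers the first exception raised by read()."""

    def __init__(self, inner: Any) -> None:
        self._inner = inner
        self.read_error: Optional[Exception] = None

    def read(self, size: int = -1) -> bytes:
        try:
            return self._inner.read(size)
        except Exception as exc:
            if self.read_error is None:
                self.read_error = exc
            raise  # the inspector may still catch this; we keep a record either way


def inspect_remote(stream: Any, name: str,
                   inspector: Callable[[Any, str], Dict[str, Any]],
                   results: Dict[str, List[Dict[str, Any]]]) -> None:
    tracked = ReadTrackingStream(stream)
    artifact = inspector(tracked, name)
    if tracked.read_error is not None:
        # The bytes were never fully fetched: surface a fetch failure (drives exit 1)
        # instead of keeping the inspector's LOW-risk verdict.
        results["errors"].append({"type": "fetch_failure", "target": name,
                                  "detail": str(tracked.read_error)})
        return
    results["artifacts"].append(artifact)
```

Tracking the failure at the stream boundary separates "could not fetch the bytes" from "fetched but could not parse", so inspectors that catch all exceptions internally no longer hide network errors from results["errors"]. Whether to drop the partial artifact or keep it alongside the error is a design choice; this sketch drops it.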


lab700xdev deleted the slice-58-reactive-fetch branch on June 5, 2026 at 18:47