fix(security): add CodeQL sanitizer model for redact_secrets to clear false-positive clear-text-logging alerts (e.g. #29)

## Summary

CodeQL periodically flags **clear-text logging of sensitive information** (and, in principle, other taint-based rules) on call sites that pass their argument through one of our own redaction/sanitization helpers. CodeQL's taint analysis does not recognize these helpers as sanitizers, so genuinely-redacted flows are reported as `error`-severity alerts.

This issue tracks building a **general, reusable mechanism for teaching CodeQL about our in-repo sanitizers** — not a one-off fix for a single alert — so that current and future false positives of this shape can be suppressed *at the analysis level* (with detection of truly-unsanitized flows preserved). Alert #29 is used only as the first worked example / validation target.

## Worked example — alert #29 (the canonical case to validate against)

- **Alert:** [code-scanning/29](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/security/code-scanning/29)
- **Rule:** `py/clear-text-logging-sensitive-data` (severity: error)
- **Location:** `agent/src/shell.py:14`

```python
def log(prefix: str, text: str):
    """Print a timestamped, redacted log line."""
    ts = time.strftime("%H:%M:%S")
    print(f"[{ts}] {prefix} {redact_secrets(text)}", flush=True)  # <-- flagged
```

The argument is passed through `redact_secrets()` (`agent/src/shell.py:108`), which strips GitHub tokens (`ghp_`, `github_pat_`, `gho_`, `ghs_`, `ghr_`), `x-access-token:` values, `authorization:` headers, and `?token=`/`password=` query params. CodeQL does not model this helper as a sanitizer/barrier, so the redacted flow is still reported.

> This same shape will recur for any future helper we use to redact/escape/sanitize before a sink. The mechanism below should be reusable for those without per-alert bespoke work.

## Current CodeQL configuration in this repo (important context)

Verified via the code-scanning API (2026-06-05):

- This repo runs CodeQL via **default setup** — there is **no** `.github/workflows/*codeql*` or `codeql-config.yml` in the tree.
  `code-scanning/default-setup` → `{"state":"configured","languages":["actions","javascript","javascript-typescript","python","typescript"],"query_suite":"default","threat_model":"remote","schedule":"weekly"}`
- Analyses run as `dynamic/github-code-scanning/codeql:analyze` per language.

This constrains the implementation options below (default vs. advanced setup).

## Proposed approach — generalized sanitizer modeling

Add a CodeQL **model pack / data extension** that lives in-repo and is reusable across helpers and (eventually) languages.

**Where the files live** (auto-detected by default setup — no workflow edits required):

```
.github/codeql/extensions/
└── abca-python-models/            # one pack per language
    ├── codeql-pack.yml
    └── models/
        └── redaction.model.yml    # one or more data-extension files; add rows over time
```

`codeql-pack.yml`:
```yaml
name: aws-samples/abca-python-models
version: 0.0.1
library: true
extensionTargets:
  codeql/python-all: "*"
dataExtensions:
  - models/**/*.yml
```

**How it loads:** Per GitHub docs, model packs in `.github/codeql/extensions/` are *"automatically detected and used"* by **default setup**, and remain recognized if the repo later switches to advanced setup — so this is forward-compatible. (CodeQL model packs are currently **public preview**; see open questions.)

**Generalization principle:** one pack per language (`abca-python-models`, later `abca-js-models` targeting `codeql/javascript-all`), each with a `models/` dir where new sanitizer rows are appended as new helpers appear. Document the "add a row" recipe so future contributors don't need to re-derive this.

## Options under consideration (decide before/at approval)

The implementer/reviewer should pick among these — they are **not** all mutually exclusive (1 can be a stepping stone, 4 is always available):

1. **Model pack via data extension on default setup** *(proposed primary)* — least invasive, no workflow, auto-loaded, reusable. **Risk:** may not be expressible (see open questions) — data extensions are designed for modeling library source/sink/summary, and expressing an internal-function **sanitizer/barrier** for `py/clear-text-logging-sensitive-data` specifically may not be supported as data.
2. **Advanced setup + custom query / library override** — convert default setup to an advanced-setup workflow, then ship a customized query pack / sanitizer barrier (e.g. extend the query's `SanitizerGuard`/`Barrier` to include our helpers). More powerful and almost certainly *can* express the sanitizer, but more invasive (owns the whole CodeQL workflow) and more to maintain.
3. **Advanced setup, even if option 1 works** — *independent operational reason:* default setup occasionally hits **rate limits**, and re-running a default-setup scan requires a **commit/push** to re-trigger, whereas advanced setup can be **re-run on demand via the GitHub Actions UI / `workflow_dispatch`**. This alone may justify converting to advanced setup regardless of the sanitizer outcome. Worth weighing as a standalone improvement.
4. **Dismiss the alert(s) with documented reasoning** *(always-available fallback)* — mark #29 (and similar) as "false positive" / "won't fix" in the code-scanning UI with a written justification referencing `redact_secrets`. **Pros:** zero code, immediate. **Cons:** per-alert manual toil that recurs on every new occurrence; the reasoning lives in the alert UI rather than in-repo; doesn't improve analysis fidelity. Acceptable if the modeling effort proves not worth it.

## Open questions / unknowns to resolve BEFORE approval & work

These must be answered (ideally via the spike below) before committing to implementation:

- [ ] **Is a sanitizer/barrier for an *internal* function even expressible via a data extension** for `py/clear-text-logging-sensitive-data`? Data-extension modeling is oriented toward library APIs (source/sink/summary). If a "sanitizer" kind isn't available for this query as data, option 1 is dead and we fall to option 2 or 4.
- [ ] **Does default setup actually pick up the pack** in `.github/codeql/extensions/` for our `python` analysis, and does alert #29 stop reproducing on a fresh scan? (Public-preview behavior — verify empirically, don't assume.)
- [ ] **Public-preview risk acceptance:** model packs in default setup are explicitly *"in public preview and subject to change."* Is that acceptable for a non-blocking, defensive false-positive suppression? (Likely yes, but call it out.)
- [ ] **Re-run / rate-limit pain:** confirm whether the default-setup re-run-requires-a-push limitation is significant enough on its own to justify converting to advanced setup (option 3) as part of this work or as a separate issue.
- [ ] **Negative-detection preserved:** confirm whichever approach is chosen does **not** blanket-suppress the rule — an unredacted `print(secret)` must still alert.
- [ ] **Maintenance owner / location of the "how to add a new sanitizer row" doc** (security guide vs. AGENTS.md routing table).

## Proof-of-concept spike (do this first, time-boxed)

Before full implementation, run a throwaway spike on a branch to de-risk option 1:

1. Add a minimal `.github/codeql/extensions/abca-python-models/` pack modeling `agent/src/shell.py::redact_secrets` as a sanitizer/summary for the clear-text-logging config.
2. Push to a branch; let the default-setup CodeQL scan run on the PR.
3. **Check:** does alert #29 disappear? Does a deliberately-added `print(<unredacted secret>)` still alert?
4. Record the outcome in this issue. If the spike fails (option 1 not expressible), pivot to option 2 (advanced setup + custom query) or option 4 (documented dismissal) — both pre-discussed above.

## Acceptance criteria (assuming a modeling approach is chosen)

- [ ] A reusable, in-repo CodeQL model pack (or advanced-setup custom query) declares our redaction helper(s) — starting with `redact_secrets` (Python) — as a sanitizer for `py/clear-text-logging-sensitive-data`.
- [ ] The mechanism is **generalized**: documented "add-a-row" / "add-a-pack" recipe so future helpers/languages are covered without bespoke work.
- [ ] CodeQL (default or advanced) loads the model so it applies to scans.
- [ ] Alert #29 no longer reproduces on a fresh CodeQL scan (verified on the PR).
- [ ] A negative case (unredacted `print(secret)`) still triggers the rule.
- [ ] Doc note explaining the model and how to extend it for new redaction helpers.
- [ ] If option 3/advanced setup is adopted, the workflow is committed and the default→advanced migration is documented.

## Out of scope

- Alert #30 (`cli/src/commands/admin.ts` — intentional admin temp-password display). Triaged separately; this issue is about reusable sanitizer modeling for genuinely-redacted flows.

## Notes

`redact_secrets` is pattern-based and best-effort. Modeling it as a sanitizer is acceptable for these log paths, but the implementer should sanity-check the patterns cover the secret shapes actually logged before trusting it engine-wide.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(security): add CodeQL sanitizer model for redact_secrets to clear false-positive clear-text-logging alerts (e.g. #29) #276

Summary

Worked example — alert #29 (the canonical case to validate against)

Current CodeQL configuration in this repo (important context)

Proposed approach — generalized sanitizer modeling

Options under consideration (decide before/at approval)

Open questions / unknowns to resolve BEFORE approval & work

Proof-of-concept spike (do this first, time-boxed)

Acceptance criteria (assuming a modeling approach is chosen)

Out of scope

Notes

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

fix(security): add CodeQL sanitizer model for redact_secrets to clear false-positive clear-text-logging alerts (e.g. #29) #276

Description

Summary

Worked example — alert #29 (the canonical case to validate against)

Current CodeQL configuration in this repo (important context)

Proposed approach — generalized sanitizer modeling

Options under consideration (decide before/at approval)

Open questions / unknowns to resolve BEFORE approval & work

Proof-of-concept spike (do this first, time-boxed)

Acceptance criteria (assuming a modeling approach is chosen)

Out of scope

Notes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions