A CLI that gates Agent Skill PRs by evaluating the prose and guidance of a skill bundle — not its code — for content that could steer an agent toward unsafe behavior. It runs a skill's markdown through a YAML rule pack and produces a single verdict: AUTO-PASS, WARN, or ESCALATE.
Status: both stages implemented. The CLI (
scan,version,rules list,rules lint), the static-pattern engine, the LLM-as-judge stage, thecore+mongodbrule packs, and the full output surface needed to gate PRs in CI (text,json,markdown, and GitHub Actions--emit-annotations) are working today.
Most skill scanners target untrusted-author maliciousness (prompt injection, code execution, exfiltration). skill-gate sits in a different lane: trusted-author content evaluation — skills written in good faith whose guidance could still direct an agent toward unsafe actions (logging credentials, over-broad permissions, unguarded destructive commands, ambiguous scope). It evaluates instructional content, not code logic.
A two-stage pipeline over every markdown file in a skill bundle:
- Static patterns (implemented) — regex rules for criteria with clear lexical signatures (credential references, hardcoded secrets in examples, destructive verbs, external URLs), with a heuristic that downgrades matches reading as cautionary documentation examples ("never log credentials") to an advisory WARN rather than silently dropping them — so a real instruction disguised with cautionary phrasing can never slip through to AUTO-PASS.
- LLM-as-judge (implemented) — rubric-based evaluation for criteria that need
semantic judgment (unsanitized user input, PII handling, over-broad permissions,
admin-command scope, ambiguous scope, missing guardrails). Each
llm_judgerule carries a rubric and a JSON-schema'd finding contract; the judge fails closed — a refusal, truncation, or invalid response is a tool error, never a silent pass — and caches results per(model, rule, file)so unchanged skills aren't re-evaluated.
Each rule declares a severity tier. The verdict is the highest tier any rule triggers:
| Verdict | Exit code | Meaning |
|---|---|---|
| AUTO-PASS | 0 |
No rules fired. |
| WARN | 1 |
Advisory findings; author resolves before opening a PR. |
| ESCALATE | 2 |
Blocking findings; requires security review. |
go install github.com/mongodb/skill-gate/cmd/skill-gate@latest# Scan a skill bundle (text output; exit code 0/1/2 = AUTO-PASS/WARN/ESCALATE)
# Stage 2 needs an LLM client (see below); add --static-only to run stage 1 alone.
skill-gate scan ./my-skill/
# Machine-readable output / PR comment
skill-gate scan ./my-skill/ -o json
skill-gate scan ./my-skill/ -o markdown
# CI: pin findings to lines in the PR diff via GitHub Actions annotations
skill-gate scan ./my-skill/ --emit-annotations
# Treat WARN as a blocking exit code (per-repo CI policy)
skill-gate scan ./my-skill/ --strict
# Filter low-confidence static findings: below the floor, ESCALATE→WARN or WARN drops
# (static stage only; judge confidence is uncalibrated, so the floor doesn't gate it)
skill-gate scan ./my-skill/ --min-confidence 0.6
# Choose which built-in packs run (default: all)
skill-gate scan ./my-skill/ --packs core # core only
skill-gate scan ./my-skill/ --packs none --rules-dir ./rules.d/ # only your packs
# Overlay your own rule packs (added on top of the selected built-ins)
skill-gate scan ./my-skill/ --rules-dir ./rules.d/
# Run only the static stage (no LLM client required)
skill-gate scan ./my-skill/ --static-only
# Inspect and validate rule packs
skill-gate rules list
skill-gate rules lint --rules-dir ./rules.d/--emit-annotations writes GitHub Actions workflow commands (::error / ::warning)
to stdout in addition to the chosen -o format, so a CI job can both produce a report
artifact and pin each finding to its line in the diff. The annotation paths are
resolved relative to the directory the scan is launched from.
The LLM-as-judge stage uses a client configured by environment —
ANTHROPIC_API_KEY (required), ANTHROPIC_BASE_URL (default
https://api.anthropic.com), and ANTHROPIC_AUTH_HEADER (default x-api-key; set
api-key for an Azure-fronted gateway) — with the model set by --llm-model
(default claude-sonnet-4-6). When the selected packs contain llm_judge rules but
no client is configured, the scan fails closed (a tool error) rather than skipping
them; pass --static-only to run stage 1 alone. The static stage needs no
configuration.
The judge result cache is opt-in and off by default (--cache-dir <path> enables
it): its validity is keyed only on public inputs and the author's own content, so a
committed cache could be forged to force a pass. In CI, point --cache-dir at an
ephemeral, never-committed location (or leave caching off); locally, treat it as
scratch. Skill content sent to the judge is fenced as untrusted input, so prose that
tries to talk the judge out of a finding does not change the verdict.
skill-gate is built to be reusable in other projects:
- Bring your own rules (available now) — rule packs are plain YAML loaded via
--rules-dir; no Go required. Validate them withskill-gate rules lint. - Choose your packs (available now) —
--packsselects which built-in packs run (e.g.--packs corefor a non-MongoDB repo, or--packs noneto run only your own overlays). The default runs every built-in pack. - Embed the scanner (available now) — call
scanner.Scanand read the returnedReport. Theverdictpackage is the stable vocabulary for tiers and exit codes. - Bring your own LLM client (available now) — implement the
llm.Clientinterface (oneJudgemethod) and pass it toscanner.ScanviaConfig.Client, e.g. for Bedrock or a custom gateway. Seeexamples/custom-client.
See docs/architecture.md for the package layout and the
public API surface.
make fmt # format (gofumpt via golangci-lint)
make lint # static analysis + license-header check
make test # race tests
make build # build the binary
make install-hooks # install pre-commit hooks (requires the pre-commit framework)golangci-lint is the single source of truth for lint and format, shared by the
pre-commit hook and CI.
Licensed under the Apache License 2.0 — see LICENSE.