Official mascot design by Matias Mesa.
The coding agent that runs 24/7, learns from its mistakes, and costs $0 when you want it to.
23 built-in tools • 8 LLM providers • 5-tier memory • 24/7 autonomous • $0 local mode
ForgeGod orchestrates multiple LLMs (OpenAI, Anthropic, Google Gemini, Ollama, OpenRouter, DeepSeek, Kimi via Moonshot, and Z.AI GLM) into a single autonomous coding engine. It routes tasks to the right model, runs 24/7 from a PRD, learns from every outcome, and self-improves its own strategy. Run it locally for $0 with Ollama, use cloud API keys when you need them, or connect native OpenAI Codex subscription auth and Z.AI Coding Plan inside the ForgeGod CLI.
pip install forgegod

Audit note (re-verified 2026-04-11): the verified baseline now includes 23 registered tools, 8 provider families, 9 route surfaces, 567 collected tests, 483 non-stress tests passing by default, 84/84 stress tests passing, green lint, and a green build. The strict Docker integration path remains opt-in and only runs when the local daemon is actually ready. The primary human entrypoint is now conversational `forgegod`; it auto-bootstraps repo-local config on first use and honors the same runtime overrides as scripted surfaces, including `--terse`, model overrides, permission/approval flags, provider preference, and explicit OpenAI surface selection. `forgegod run` remains the explicit scripted surface. `forgegod evals` now covers deterministic chat, run, loop, worktree, and strict-interface regressions, splits scores by harness dimension, ships an OpenAI surfaces matrix, emits local trace-grader summaries, and also offers an opt-in live OpenAI probe matrix plus a ranking matrix that recommends the best runnable OpenAI harness row when one exists. Native Windows Codex support is now a production-ready ForgeGod path when the official Codex CLI is installed and logged in. `forgegod loop` no longer auto-commits or auto-pushes by default. Read docs/AUDIT_2026-04-07.md, docs/OPERATIONS.md, docs/WEB_RESEARCH_2026-04-07.md, and docs/OPENAI_SURFACES_2026-04-10.md before making runtime changes.
Every other coding CLI uses one model at a time and resets to zero each session. ForgeGod doesn't.
| Capability | Claude Code | Codex CLI | Aider | Cursor | ForgeGod |
|---|---|---|---|---|---|
| Multi-model auto-routing | - | - | manual | - | yes |
| Local + cloud hybrid | - | basic | basic | - | native |
| 24/7 autonomous loops | - | - | - | - | yes |
| Cross-session memory | basic | - | - | removed | 5-tier |
| Self-improving strategy | - | - | - | - | yes (SICA) |
| Cost-aware budget modes | - | - | - | - | yes |
| Reflexion code generation | - | - | - | - | 3-attempt |
| Parallel git worktrees | subagents | - | - | - | experimental |
| Stress tested + benchmarked | - | - | - | - | audited baseline |
Scaffolding adds ~11 points on SWE-bench — harness engineering matters as much as the model. ForgeGod is the harness:
- Ralph Loop — 24/7 coding from a PRD. Progress lives in git, not LLM context. Fresh agent per story. No context rot.
- 5-Tier Memory — Episodic (what happened) + Semantic (what I know) + Procedural (how I do things) + Graph (how things connect) + Error-Solutions (what fixes what). Memories decay, consolidate, and reinforce automatically.
- Reflexion Coder — 3-attempt code gen with escalating models: local (free) → cloud (cheap) → frontier (when it matters). The repo now wires workspace scoping, command auditing, blocked paths, and generated-code warnings into runtime, while the audit tracks the remaining hardening gaps.
- DESIGN.md Native — Import a design preset, drop `DESIGN.md` in the repo root, and frontend tasks inherit that design language automatically.
- Natural-Language CLI — ForgeGod explains what it is doing in plain language while it works, and the CLI surfaces share the same branded cyan/white/yellow UX instead of raw transport noise.
- Contribution Mode — Read `CONTRIBUTING.md`, inspect the repo, surface approachable issues, and plan or execute contribution-sized changes with repo-specific guardrails.
- SICA — Self-Improving Coding Agent. Modifies its own prompts, model routing, and strategy based on outcomes. Safety guardrails and audit policy keep that loop honest.
- Budget Modes — `normal` → `throttle` → `local-only` → `halt`. Auto-triggered by spend. Run forever on Ollama for $0.
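The three-attempt escalation behind the Reflexion Coder can be sketched in a few lines. The tier model names come from the config example later in this README; the `generate`/`validate` hooks are illustrative, not ForgeGod's actual API:

```python
# Sketch of 3-attempt escalation: local (free) -> cloud (cheap) -> frontier.
# generate/validate are caller-supplied hooks, not ForgeGod's real interfaces.
from typing import Callable, Optional

TIERS = ["ollama:qwen3-coder-next", "openai:gpt-5.4-mini", "openai:gpt-5.4"]

def reflexion_generate(
    task: str,
    generate: Callable[[str, str, str], str],   # (model, task, feedback) -> code
    validate: Callable[[str], Optional[str]],   # code -> error message, or None
) -> str:
    feedback = ""
    for model in TIERS:
        code = generate(model, task, feedback)
        error = validate(code)
        if error is None:
            return code          # tests/lint passed at this tier
        feedback = error         # Reflexion: carry the failure into the next attempt
    raise RuntimeError(f"all {len(TIERS)} attempts failed: {feedback}")
```

The point of the ladder is that most tasks never leave the free local tier; feedback only reaches a paid model when the cheap attempt demonstrably failed.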
You don't need to be a developer to use ForgeGod. If you can describe what you want in plain English, ForgeGod writes the code.
- Install Ollama: https://ollama.com/download
- Pull a model: `ollama pull qwen3.5:9b`
- Install ForgeGod: `pip install forgegod`
- Start the session: `forgegod`
- Say what you want naturally, for example: "Create a simple website with a contact form"
- If you want the guided wizard instead, run: `forgegod init`
- Install ForgeGod: `pip install forgegod`
- Run: `forgegod auth login openai-codex`
- Inspect the OpenAI split you want: `forgegod auth explain --profile adversarial --prefer-provider openai --openai-surface api+codex`
- Run: `forgegod auth sync --profile adversarial --prefer-provider openai --openai-surface api+codex`
- Start the session: `forgegod`
- Say what you want naturally, for example: "Build a REST API with user authentication"
ForgeGod stays the entrypoint. It delegates the one-time login to the official Codex auth flow, then keeps day-to-day usage inside ForgeGod CLI.
- Export `ZAI_CODING_API_KEY=...`
- Install ForgeGod: `pip install forgegod`
- Run: `forgegod auth sync --profile adversarial`
- Start the session: `forgegod`
- Say what you want naturally, for example: "Build a REST API with user authentication"
For the strongest current subscription-backed setup inside ForgeGod, use:
planner = zai:glm-5.1
researcher = zai:glm-5.1
coder = zai:glm-5.1
reviewer = openai-codex:gpt-5.4
sentinel = openai-codex:gpt-5.4
escalation = openai-codex:gpt-5.4
See docs/GLM_CODEX_HARNESS_2026-04-08.md and docs/examples/glm_codex_coding_plan.toml, and run `python scripts/smoke_glm_codex_harness.py` before high-stakes use.
This harness is research-backed and works in ForgeGod today. The ZAI_CODING_API_KEY
path should still be treated as experimental and at-your-own-risk until Z.AI
explicitly recognizes ForgeGod as a supported coding tool.
If you want ForgeGod to stay inside OpenAI surfaces, apply an explicit surface mode:
planner = openai:gpt-5.4
coder = openai:gpt-5.4-mini
reviewer = openai-codex:gpt-5.4
sentinel = openai:gpt-5.4
escalation = openai:gpt-5.4
researcher = openai:gpt-5.4-mini
forgegod auth explain --profile adversarial --prefer-provider openai --openai-surface api+codex
forgegod auth sync --profile adversarial --prefer-provider openai --openai-surface api+codex

ForgeGod now supports four explicit OpenAI surface modes: `auto`, `api-only`, `codex-only`, and `api+codex`.
api+codex keeps the adversarial split but makes the contract explicit:
OpenAI API handles builder/research roles while Codex subscription handles the
reviewer path when both are connected. ChatGPT/Codex subscription access and
OpenAI API billing remain separate surfaces.
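The role-to-surface contract described above reduces to a lookup with a fallback when the Codex surface is not linked. Role names mirror the config examples in this README; the function itself is an illustration, not ForgeGod's router:

```python
# Sketch of api+codex routing: the reviewer rides the Codex subscription
# surface when it is linked, every other role stays on the OpenAI API.
def surface_for_role(role: str, codex_linked: bool) -> str:
    if role == "reviewer" and codex_linked:
        return "openai-codex"   # subscription-backed quality gate
    return "openai-api"         # builder/research roles stay on API billing
```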
If you want a simpler setup, ForgeGod also supports single-model mode during
forgegod init and forgegod auth sync --profile single-model. That pins all
roles to one detected model instead of using the recommended adversarial split.
forgegod is now the primary conversational entrypoint for humans, and it
auto-creates local project config on first use. Use forgegod init when you
want the guided wizard, and use forgegod run "..." when you need a
deterministic, non-interactive command for scripts, CI, or reproducible
automation. The same root entrypoint also accepts session overrides such as
--terse, --model, --review/--no-review, --permission-mode,
--approval-mode, and repeated --allow-tool values.
Run forgegod doctor — it checks your setup and tells you exactly what to fix.
If you want the real strict sandbox, read
docs/STRICT_SANDBOX_SETUP.md.
It explains Docker Desktop, the required sandbox image, and the safe fix path
in non-technical terms.
# Install
pip install forgegod
# Fastest path: talk to ForgeGod directly
forgegod
# Optional guided project setup
forgegod init
# Or force one harness style explicitly
forgegod init --profile adversarial
forgegod init --profile single-model
# Check native auth surfaces
forgegod auth status
# Link ChatGPT-backed OpenAI Codex subscription, then sync config defaults
forgegod auth login openai-codex
forgegod auth explain --profile adversarial --prefer-provider openai --openai-surface api+codex
forgegod auth sync --profile adversarial --prefer-provider openai --openai-surface api+codex
# Talk to ForgeGod in natural language
forgegod
# Explicit scripted task surface
forgegod run "Add a /health endpoint to server.py with uptime and version info"
# Deterministic harness evals
forgegod evals
forgegod evals --case chat_natural_language_roundtrip
forgegod evals --matrix openai-surfaces
forgegod evals --matrix openai-live
forgegod evals --matrix openai-live-compare
# Plan a project → generates PRD
forgegod plan "Build a REST API for a todo app with auth, CRUD, and tests"
# 24/7 autonomous loop from PRD
# Loop defaults: no auto-commit or auto-push unless you explicitly enable those flags
# Parallel workers require a git repo with at least one commit because ForgeGod uses isolated worktrees
forgegod loop --prd .forgegod/prd.json
# Caveman mode — 50-75% token savings with ultra-terse prompts
forgegod --terse
# Check what it learned
forgegod memory
# View cost breakdown
forgegod cost
# Benchmark your models
forgegod benchmark
# Evaluate ForgeGod itself
forgegod evals
# Install a DESIGN.md preset for frontend work
forgegod design pull claude
# Plan a contribution against another repo
forgegod contribute https://github.com/owner/repo --goal "Improve tests"
# Health check
forgegod doctor

ForgeGod auto-detects your environment on first run:
- Finds API keys in env vars (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `OPENROUTER_API_KEY`, `GOOGLE_API_KEY`/`GEMINI_API_KEY`, `DEEPSEEK_API_KEY`, `MOONSHOT_API_KEY`, `ZAI_CODING_API_KEY`, `ZAI_API_KEY`) and detects native OpenAI Codex login state
- Checks if Ollama is running locally
- Detects your project language, test framework, and linter
- Picks auth-aware model defaults for each role based on what's available
- Lets you choose `adversarial` (recommended) or `single-model`, plus `auto` or `openai` provider preference
- Lets you choose OpenAI `auto`, `api-only`, `codex-only`, or `api+codex` surfaces when you want explicit OpenAI behavior
- Creates `.forgegod/config.toml` with sensible defaults
No manual setup required. Just run forgegod and go.
On first conversational use, ForgeGod auto-creates .forgegod/config.toml
with auth-aware defaults when it can detect usable model surfaces. If you want
the guided wizard or explicit profile selection up front, run forgegod init.
If you add a new provider later, run forgegod auth sync --profile adversarial
or forgegod auth sync --profile single-model to rewrite model defaults from
detected auth surfaces.
┌─────────────────────────────────────────────────┐
│ RALPH LOOP │
│ │
│ ┌──────┐ ┌───────┐ ┌─────────┐ ┌─────┐ │
│ │ READ │──▶│ SPAWN │──▶│ EXECUTE │──▶│ VAL │ │
│ │ PRD │ │ AGENT │ │ STORY │ │IDATE│ │
│ └──────┘ └───────┘ └─────────┘ └──┬──┘ │
│ ▲ │ │
│ │ ┌────────┐ ┌────────┐ │ │
│ └─────────│ROTATE │◀───│COMMIT │◀──┘ │
│ │CONTEXT │ │OR RETRY│ pass │
│ └────────┘ └────────┘ │
│ │
│ Progress is in GIT, not LLM context. │
│ Fresh agent per story. No context rot. │
│ Create .forgegod/KILLSWITCH to stop. │
└─────────────────────────────────────────────────┘
- Read PRD — Pick highest-priority TODO story
- Spawn agent — Fresh context (progress is in git, not memory)
- Execute — Agent uses 23 tools to implement the story
- Validate — Tests, lint, syntax, frontier review
- Finalize or retry — Pass: review diff + mark done. Fail: retry up to 3x with model escalation
- Rotate — Next story. Context is always fresh.
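The killswitch from the diagram is a plain file-existence check performed each iteration. A minimal sketch (the function name is illustrative; the path is the one the diagram documents):

```python
# Sketch: halt the autonomous loop when .forgegod/KILLSWITCH appears.
from pathlib import Path

def should_halt(workspace: Path) -> bool:
    """Checked between stories so a human can stop the loop at any time."""
    return (workspace / ".forgegod" / "KILLSWITCH").exists()
```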
ForgeGod has the most advanced memory system of any open-source coding agent:
| Tier | What | How | Retention |
|---|---|---|---|
| Episodic | What happened per task | Full outcome records | 90 days |
| Semantic | Extracted principles | Confidence + decay + reinforcement | Indefinite |
| Procedural | Code patterns & fix recipes | Success rate tracking | Indefinite |
| Graph | Entity relationships + causal edges | Auto-extracted from outcomes | Indefinite |
| Error-Solution | Error pattern → fix mapping | Fuzzy match lookup | Indefinite |
Memories decay with category-specific half-life (14d debugging → 90d architecture), consolidate via O(n*k) category-bucketed comparison, and are recalled via FTS5 + Jaccard hybrid retrieval (Reciprocal Rank Fusion). SQLite WAL mode for concurrent access.
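Two of the mechanics above are small enough to sketch: Reciprocal Rank Fusion over the FTS5 and Jaccard result lists, and half-life decay of memory confidence. The `k=60` constant is the conventional default from the RRF literature, not necessarily ForgeGod's setting, and neither function is ForgeGod's actual code:

```python
# Illustrative sketches of the retrieval and decay mechanics described above.
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked lists by summed 1/(k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

def decayed_confidence(conf: float, age_days: float, half_life_days: float) -> float:
    """Category-specific half-life: confidence halves every half_life_days."""
    return conf * 0.5 ** (age_days / half_life_days)
```

A memory that both retrieval channels rank highly wins the fusion even if neither ranks it first, which is exactly why hybrid recall beats either channel alone.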
# Check memory health
forgegod memory
# Memory is stored in .forgegod/memory.db (SQLite)
# Global learnings in ~/.forgegod/memory.db (cross-project)

| Mode | Behavior | Trigger |
|---|---|---|
| `normal` | Use all configured models | Default |
| `throttle` | Prefer local, cloud for review only | 80% of daily limit |
| `local-only` | Ollama only, $0 operation | Manual or 95% limit |
| `halt` | Stop all LLM calls | 100% of daily limit |
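The trigger column maps directly onto a spend-fraction threshold check. A minimal sketch (thresholds from the table; the function name is illustrative):

```python
# Sketch of spend-fraction -> budget mode, per the trigger column above.
def budget_mode(spent_usd: float, daily_limit_usd: float) -> str:
    frac = spent_usd / daily_limit_usd
    if frac >= 1.0:
        return "halt"         # 100% of daily limit: stop all LLM calls
    if frac >= 0.95:
        return "local-only"   # 95%: Ollama only, $0 operation
    if frac >= 0.80:
        return "throttle"     # 80%: prefer local, cloud for review only
    return "normal"
```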
# Check spend
forgegod cost
# Override mode
export FORGEGOD_BUDGET_MODE=local-only

Ultra-terse prompts reduce token usage 50-75% with no accuracy loss for coding tasks. Backed by 2026 research:
- Mini-SWE-Agent — 100 lines, >74% SWE-bench Verified
- Chain of Draft — 7.6% tokens, same accuracy
- CCoT — 48.7% shorter, negligible impact
# Add --terse to any command
forgegod --terse
forgegod run --terse "Build a REST API"
forgegod loop --terse --prd .forgegod/prd.json
forgegod plan --terse "Refactor auth module"
# Or enable globally in config
# .forgegod/config.toml
# [terse]
# enabled = true

Caveman mode compresses system prompts (~200 → ~80 tokens), tool descriptions (3-8 words each), and tool output (tracebacks → last frame only). JSON schemas for planner/reviewer stay byte-identical.
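The "tracebacks → last frame only" compression can be sketched like this (illustrative; ForgeGod's actual rules may differ):

```python
# Sketch: keep the traceback header, the last frame, and the exception line.
def compress_traceback(tb: str) -> str:
    lines = tb.strip().splitlines()
    frames = [i for i, line in enumerate(lines)
              if line.lstrip().startswith("File ")]
    if len(frames) < 2:
        return tb.strip()     # one frame or none: nothing worth dropping
    return "\n".join([lines[0]] + lines[frames[-1]:])
```

Only the innermost frame usually matters for a fix, so dropping the rest saves tokens without hiding the error the model needs to see.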
ForgeGod uses TOML config with 3-level priority: env vars > project > global.
Fresh forgegod init and forgegod auth sync write auth-aware defaults. The example below shows the file shape, not the only valid mapping.
# .forgegod/config.toml
[models]
planner = "openai:gpt-5.4" # Frontier planning
coder = "ollama:qwen3-coder-next" # Free local coding
reviewer = "openai:gpt-5.4" # Quality gate
sentinel = "openai:gpt-5.4" # Frontier sampling
escalation = "openai:gpt-5.4" # Fallback for hard problems
researcher = "openai:gpt-5.4-mini" # Recon / web synthesis
[budget]
daily_limit_usd = 5.00
mode = "normal"
[loop]
max_iterations = 100
parallel_workers = 2
gutter_detection = true
[ollama]
host = "http://localhost:11434"
model = "qwen3-coder-next"
[terse]
enabled = false # --terse flag or set true here
[security]
sandbox_mode = "standard" # permissive | standard | strict
sandbox_backend = "auto" # auto | docker
sandbox_image = "mcr.microsoft.com/devcontainers/python:1-3.13-bookworm"
redact_secrets = true
audit_commands = true

export OPENAI_API_KEY="sk-..."
forgegod auth login openai-codex # Native ChatGPT-backed OpenAI auth
export ANTHROPIC_API_KEY="sk-ant-..." # Optional
export OPENROUTER_API_KEY="sk-or-..." # Optional
export GOOGLE_API_KEY="AIza..." # Optional (Gemini)
export DEEPSEEK_API_KEY="sk-..." # Optional
export MOONSHOT_API_KEY="sk-..." # Optional (Kimi / Moonshot)
export ZAI_CODING_API_KEY="..." # Optional (Z.AI Coding Plan)
export ZAI_API_KEY="..." # Optional (Z.AI general API)
export FORGEGOD_BUDGET_DAILY_LIMIT_USD=10

| Provider | Models | Cost | Setup |
|---|---|---|---|
| Ollama | qwen3-coder-next, devstral, any | $0 | ollama serve |
| OpenAI API | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, o3, o4-mini | $$ | OPENAI_API_KEY |
| OpenAI Codex subscription | gpt-5.4 via Codex auth surface | Included in supported ChatGPT plans | forgegod auth login openai-codex |
| Anthropic | claude-sonnet-4-6, claude-opus-4-6 | $$$ | ANTHROPIC_API_KEY |
| Google Gemini | gemini-2.5-pro, gemini-3-flash | $$ | GOOGLE_API_KEY |
| DeepSeek | deepseek-chat, deepseek-reasoner | $ | DEEPSEEK_API_KEY |
| Kimi (Moonshot direct) | kimi-k2.5, kimi-k2-thinking | $$ | MOONSHOT_API_KEY |
| Z.AI / GLM | glm-5.1, glm-5, glm-4.7 | $$ | ZAI_CODING_API_KEY or ZAI_API_KEY |
| OpenRouter | 200+ models | varies | OPENROUTER_API_KEY |
Kimi support uses Moonshot's official OpenAI-compatible API and is currently experimental in ForgeGod. Benchmark it on your workload before making it a default role.
OpenAI Codex subscription support is now a production-ready ForgeGod surface on supported native Windows and WSL installs. codex-only is the supported subscription-only path, and forgegod evals --matrix openai-live-compare is the release path for comparing it against api+codex when both auth surfaces are linked.
OpenRouter still uses keys/credits. Alibaba/Qwen Coding Plan is still under evaluation because current official docs scope it to supported coding tools rather than generic autonomous loops.
Harness rule of thumb:
- `forgegod benchmark` measures coding/model performance on scaffold tasks
- `forgegod evals` measures ForgeGod itself: chat UX, approval behavior, permission denials, completion-gate discipline, loop/worktree behavior, and strict-sandbox interface handling. It also splits scores by `ux`, `safety`, `workflow`, and `verification`, ships `forgegod evals --matrix openai-surfaces` for deterministic OpenAI-first routing coverage, and `forgegod evals --matrix openai-live` for cheap real API/Codex probes when those auth surfaces are linked. Reports now include local trace graders too.
Run your own: forgegod benchmark
| Model | Composite | Correctness | Quality | Speed | Cost | Self-Repair |
|---|---|---|---|---|---|---|
| openai:gpt-4o-mini | 81.5 | 10/12 | 7.4 | 12s avg | $0.08 | 4/4 |
| ollama:qwen3.5:9b | 72.3 | 8/12 | 6.8 | 45s avg | $0.00 | 3/4 |
Run forgegod benchmark --update-readme to refresh with your own results.
forgegod/
├── cli.py # Typer CLI (init, run, loop, plan, review, cost, memory, status, benchmark, doctor)
├── config.py # TOML config + env vars + 3-level priority
├── router.py # Multi-provider LLM router + persistent pool + cascade routing + half-open circuit breaker
├── agent.py # Core agent loop (tools + context compression + sub-agents)
├── coder.py # Reflexion code generation (3 attempts, model escalation, GOAP)
├── loop.py # Ralph loop (24/7 autonomous coding, parallel workers, story timeout)
├── planner.py # Task decomposition → PRD
├── reviewer.py # Frontier model quality gate (sample-based)
├── sica.py # Self-improving strategy modification (guardrails + audit policy)
├── memory.py # 5-tier cognitive memory (FTS5 + RRF hybrid retrieval, WAL mode)
├── budget.py # SQLite cost + token tracking, forecasting, auto budget modes
├── worktree.py # Parallel git worktree workers
├── tui.py # Rich terminal dashboard
├── terse.py # Caveman mode — terse prompts, tool compression, savings tracker
├── benchmark.py # Model benchmarking engine (12 tasks, 4 tiers, composite scoring)
├── onboarding.py # Interactive setup wizard for new users
├── doctor.py # Installation health check (6 diagnostic checks)
├── i18n.py # Translation strings (English + Spanish es-419)
├── models.py # Pydantic v2 data models
└── tools/
├── filesystem.py # async read/write (aiofiles), atomic writes, fuzzy edit, glob, grep, repo_map
├── shell.py # bash (isolated runtime env + strict command policy + secret redaction)
├── git.py # git status, diff, commit, worktrees
├── mcp.py # MCP server client (5,800+ servers)
└── skills.py # On-demand skill loading
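The half-open circuit breaker that router.py describes can be sketched as three states driven by a failure count and a cooldown. Thresholds, timings, and the class API here are illustrative, not ForgeGod's implementation:

```python
# Sketch of a half-open circuit breaker for a flaky provider.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                                       # closed: traffic flows
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            return True                                       # half-open: one probe allowed
        return False                                          # open: fail fast

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.opened_at = None                             # probe succeeded -> close
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()             # trip open
```

Failing fast while open is what keeps one dead provider from stalling every request; the half-open probe is how the router notices recovery without a manual reset.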
Defense-in-depth, not security theater:
- Real strict sandbox — `strict` runs inside Docker with no network, read-only rootfs, dropped caps, and workspace-only mounts
- Standard shell policy — `standard` keeps the local guardrails: isolated runtime dirs, blocked shell operators, and workspace scoping
- Secret redaction — 11 patterns strip API keys from tool output before LLM context
- Prompt injection detection — 8 patterns scan for jailbreak/role-override attempts
- AST code validation — Detects obfuscated dangerous calls (`getattr(os, 'system')`) that regex misses, and blocks suspicious writes in `strict` mode
- Workspace-scoped file ops — file and shell tools reject paths that escape the active workspace root
- Supply chain defense — Flags known-abandoned/typosquat packages (python-jose, jeIlyfish, etc.)
- Canary token system — Detects if the system prompt leaks into tool arguments, with per-session rotation
- Budget limits — Cost controls with token tracking + burn-rate forecasting
- Killswitch — Create `.forgegod/KILLSWITCH` to immediately halt autonomous loops
- Sensitive file protection — `.env` and credentials files get warnings + automatic redaction
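The secret-redaction step can be sketched with a couple of representative patterns. The README cites 11 patterns; these two are illustrative stand-ins, not the real list:

```python
# Sketch: strip API-key-shaped strings from tool output before it reaches
# LLM context. Two illustrative patterns only.
import re

SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),   # OpenAI-style secret keys
    re.compile(r"AIza[0-9A-Za-z_-]{35}"),   # Google API keys
]

def redact(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```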
Warning: ForgeGod executes shell commands and modifies files. As of the verified 2026-04-08 baseline, `strict` uses a real Docker sandbox backend and blocks if Docker/image prerequisites are missing, while `standard` remains a host-local guarded workflow. Use `forgegod doctor` and docs/STRICT_SANDBOX_SETUP.md instead of weakening the sandbox just to get past setup friction.
- AGENTS.md — repo-local instructions for coding agents
- docs/OPERATIONS.md — current system of record and verified commands
- docs/AUDIT_2026-04-07.md — detailed code audit and remediation order
- docs/WEB_RESEARCH_2026-04-07.md — external guidance used to shape the repo docs
See SECURITY.md for the full policy and vulnerability reporting.
We welcome contributions. See CONTRIBUTING.md for guidelines.
- Bug reports and feature requests: GitHub Issues
- Questions and discussion: GitHub Discussions
ForgeGod credits code and non-code work in public.
- Matias Mesa - `design` - official ForgeGod mascot system
- WAITDEAD - `code`, `infra`, `research`, `projectManagement`, `maintenance`
See CONTRIBUTORS.md for the current contributor list.
Apache 2.0 — see LICENSE.
Built by WAITDEAD • Official mascot design by Matias Mesa • Powered by techniques from OpenClaw, Hermes, and SOTA 2026 coding agent research.
