
waitdeadai/forgegod


ForgeGod official mascot

Official mascot design by Matias Mesa.

ForgeGod

The coding agent that runs 24/7, learns from its mistakes, and costs $0 when you want it to.

PyPI · License · Python 3.11+ · CI · Website · Audit

23 built-in tools · 8 LLM providers · 5-tier memory · 24/7 autonomous · $0 local mode


ForgeGod orchestrates multiple LLMs (OpenAI, Anthropic, Google Gemini, Ollama, OpenRouter, DeepSeek, Kimi via Moonshot, and Z.AI GLM) into a single autonomous coding engine. It routes tasks to the right model, runs 24/7 from a PRD, learns from every outcome, and self-improves its own strategy. Run it locally for $0 with Ollama, use cloud API keys when you need them, or connect native OpenAI Codex subscription auth and Z.AI Coding Plan inside the ForgeGod CLI.

pip install forgegod

Audit note (re-verified 2026-04-11): the verified baseline now includes:

  • 23 registered tools, 8 provider families, and 9 route surfaces
  • 567 collected tests: 483 non-stress tests passing by default, 84/84 stress tests passing, green lint, and a green build
  • The strict Docker integration path remains opt-in and only runs when the local daemon is actually ready
  • The primary human entrypoint is now conversational forgegod; it auto-bootstraps repo-local config on first use and honors the same runtime overrides as scripted surfaces, including --terse, model overrides, permission/approval flags, provider preference, and explicit OpenAI surface selection
  • forgegod run remains the explicit scripted surface
  • forgegod evals now covers deterministic chat, run, loop, worktree, and strict-interface regressions; splits scores by harness dimension; ships an OpenAI surfaces matrix; emits local trace-grader summaries; and offers an opt-in live OpenAI probe matrix plus a ranking matrix that recommends the best runnable OpenAI harness row when one exists
  • Native Windows Codex support is now a production-ready ForgeGod path when the official Codex CLI is installed and logged in
  • forgegod loop no longer auto-commits or auto-pushes by default

Read docs/AUDIT_2026-04-07.md, docs/OPERATIONS.md, docs/WEB_RESEARCH_2026-04-07.md, and docs/OPENAI_SURFACES_2026-04-10.md before making runtime changes.

What Makes ForgeGod Different

Every other coding CLI uses one model at a time and resets to zero each session. ForgeGod doesn't.

| Capability | Claude Code | Codex CLI | Aider | Cursor | ForgeGod |
|---|---|---|---|---|---|
| Multi-model auto-routing | - | - | manual | - | yes |
| Local + cloud hybrid | - | basic | basic | - | native |
| 24/7 autonomous loops | - | - | - | - | yes |
| Cross-session memory | basic | - | - | removed | 5-tier |
| Self-improving strategy | - | - | - | - | yes (SICA) |
| Cost-aware budget modes | - | - | - | - | yes |
| Reflexion code generation | - | - | - | - | 3-attempt |
| Parallel git worktrees | subagents | - | - | - | experimental |
| Stress tested + benchmarked | - | - | - | - | audited baseline |

The Moat: Harness > Model

Scaffolding adds ~11 points on SWE-bench — harness engineering matters as much as the model. ForgeGod is the harness:

  • Ralph Loop — 24/7 coding from a PRD. Progress lives in git, not LLM context. Fresh agent per story. No context rot.
  • 5-Tier Memory — Episodic (what happened) + Semantic (what I know) + Procedural (how I do things) + Graph (how things connect) + Error-Solutions (what fixes what). Memories decay, consolidate, and reinforce automatically.
  • Reflexion Coder — 3-attempt code gen with escalating models: local (free) → cloud (cheap) → frontier (when it matters). The repo now wires workspace scoping, command auditing, blocked paths, and generated-code warnings into runtime, while the audit tracks the remaining hardening gaps.
  • DESIGN.md Native — Import a design preset, drop DESIGN.md in repo root, and frontend tasks inherit that design language automatically.
  • Natural-Language CLI — ForgeGod now explains what it is doing in plain language while it works, and the CLI surfaces share the same branded cyan/white/yellow UX instead of raw transport noise.
  • Contribution Mode — Read CONTRIBUTING.md, inspect the repo, surface approachable issues, and plan or execute contribution-sized changes with repo-specific guardrails.
  • SICA — Self-Improving Coding Agent. Modifies its own prompts, model routing, and strategy based on outcomes. Safety guardrails and audit policy keep that loop honest.
  • Budget Modes — normal → throttle → local-only → halt. Auto-triggered by spend. Run forever on Ollama for $0.
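The Reflexion Coder's escalating retry can be sketched as a simple loop. This is an illustrative sketch only; the function names (`reflexion_generate`, `generate`, `run_tests`) and the `LADDER` constant are hypothetical, not ForgeGod's actual API:

```python
# Hypothetical sketch of 3-attempt Reflexion escalation: each failed attempt
# feeds its test feedback into a retry on a stronger (more expensive) tier.
LADDER = ["local", "cloud", "frontier"]  # free -> cheap -> when it matters

def reflexion_generate(task, generate, run_tests):
    feedback = None
    for tier in LADDER:
        code = generate(task, tier=tier, feedback=feedback)
        ok, feedback = run_tests(code)  # feedback carries into the next attempt
        if ok:
            return code
    return None  # all three attempts failed; caller escalates or retries the story
```

The key property is that retries are not blind: the failure signal from each tier is passed to the next, stronger model.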

Getting Started (No Coding Required)

You don't need to be a developer to use ForgeGod. If you can describe what you want in plain English, ForgeGod writes the code.

Option A: Free Local Mode ($0)

  1. Install Ollama: https://ollama.com/download
  2. Pull a model: ollama pull qwen3.5:9b
  3. Install ForgeGod: pip install forgegod
  4. Start the session: forgegod
  5. Say what you want naturally, for example: Create a simple website with a contact form
  6. If you want the guided wizard instead, run: forgegod init

Option B: OpenAI Native Subscription Mode

  1. Install ForgeGod: pip install forgegod
  2. Run: forgegod auth login openai-codex
  3. Inspect the OpenAI split you want: forgegod auth explain --profile adversarial --prefer-provider openai --openai-surface api+codex
  4. Run: forgegod auth sync --profile adversarial --prefer-provider openai --openai-surface api+codex
  5. Start the session: forgegod
  6. Say what you want naturally, for example: Build a REST API with user authentication

ForgeGod stays the entrypoint. It delegates the one-time login to the official Codex auth flow, then keeps day-to-day usage inside ForgeGod CLI.

Option C: Z.AI Coding Plan Mode

  1. Export ZAI_CODING_API_KEY=...
  2. Install ForgeGod: pip install forgegod
  3. Run: forgegod auth sync --profile adversarial
  4. Start the session: forgegod
  5. Say what you want naturally, for example: Build a REST API with user authentication

Recommended Experimental Harness: GLM-5.1 + Codex

For the strongest current subscription-backed setup inside ForgeGod, use:

  • planner = zai:glm-5.1
  • researcher = zai:glm-5.1
  • coder = zai:glm-5.1
  • reviewer = openai-codex:gpt-5.4
  • sentinel = openai-codex:gpt-5.4
  • escalation = openai-codex:gpt-5.4

See docs/GLM_CODEX_HARNESS_2026-04-08.md, docs/examples/glm_codex_coding_plan.toml, and run python scripts/smoke_glm_codex_harness.py before high-stakes use.

This harness is research-backed and works in ForgeGod today. The ZAI_CODING_API_KEY path should still be treated as experimental and at-your-own-risk until Z.AI explicitly recognizes ForgeGod as a supported coding tool.

OpenAI Surface Modes

If you want ForgeGod to stay inside OpenAI surfaces, apply an explicit surface mode:

  • planner = openai:gpt-5.4
  • coder = openai:gpt-5.4-mini
  • reviewer = openai-codex:gpt-5.4
  • sentinel = openai:gpt-5.4
  • escalation = openai:gpt-5.4
  • researcher = openai:gpt-5.4-mini
forgegod auth explain --profile adversarial --prefer-provider openai --openai-surface api+codex
forgegod auth sync --profile adversarial --prefer-provider openai --openai-surface api+codex

ForgeGod now supports four explicit OpenAI surface modes:

  • auto
  • api-only
  • codex-only
  • api+codex

api+codex keeps the adversarial split but makes the contract explicit: OpenAI API handles builder/research roles while Codex subscription handles the reviewer path when both are connected. ChatGPT/Codex subscription access and OpenAI API billing remain separate surfaces.

If you want a simpler setup, ForgeGod also supports single-model mode during forgegod init and forgegod auth sync --profile single-model. That pins all roles to one detected model instead of using the recommended adversarial split.

forgegod is now the primary conversational entrypoint for humans, and it auto-creates local project config on first use. Use forgegod init when you want the guided wizard, and use forgegod run "..." when you need a deterministic, non-interactive command for scripts, CI, or reproducible automation. The same root entrypoint also accepts session overrides such as --terse, --model, --review/--no-review, --permission-mode, --approval-mode, and repeated --allow-tool values.

Something not working?

Run forgegod doctor — it checks your setup and tells you exactly what to fix.

If you want the real strict sandbox, read docs/STRICT_SANDBOX_SETUP.md. It explains Docker Desktop, the required sandbox image, and the safe fix path in non-technical terms.

Quickstart

# Install
pip install forgegod

# Fastest path: talk to ForgeGod directly
forgegod

# Optional guided project setup
forgegod init

# Or force one harness style explicitly
forgegod init --profile adversarial
forgegod init --profile single-model

# Check native auth surfaces
forgegod auth status

# Link ChatGPT-backed OpenAI Codex subscription, then sync config defaults
forgegod auth login openai-codex
forgegod auth explain --profile adversarial --prefer-provider openai --openai-surface api+codex
forgegod auth sync --profile adversarial --prefer-provider openai --openai-surface api+codex

# Talk to ForgeGod in natural language
forgegod

# Explicit scripted task surface
forgegod run "Add a /health endpoint to server.py with uptime and version info"

# Deterministic harness evals
forgegod evals
forgegod evals --case chat_natural_language_roundtrip
forgegod evals --matrix openai-surfaces
forgegod evals --matrix openai-live
forgegod evals --matrix openai-live-compare

# Plan a project → generates PRD
forgegod plan "Build a REST API for a todo app with auth, CRUD, and tests"

# 24/7 autonomous loop from PRD
# Loop defaults: no auto-commit or auto-push unless you explicitly enable those flags
# Parallel workers require a git repo with at least one commit because ForgeGod uses isolated worktrees
forgegod loop --prd .forgegod/prd.json

# Caveman mode — 50-75% token savings with ultra-terse prompts
forgegod --terse

# Check what it learned
forgegod memory

# View cost breakdown
forgegod cost

# Benchmark your models
forgegod benchmark

# Evaluate ForgeGod itself
forgegod evals

# Install a DESIGN.md preset for frontend work
forgegod design pull claude

# Plan a contribution against another repo
forgegod contribute https://github.com/owner/repo --goal "Improve tests"

# Health check
forgegod doctor

Zero-Config Start

ForgeGod auto-detects your environment on first run:

  1. Finds API keys in env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY, GOOGLE_API_KEY / GEMINI_API_KEY, DEEPSEEK_API_KEY, MOONSHOT_API_KEY, ZAI_CODING_API_KEY, ZAI_API_KEY) and detects native OpenAI Codex login state
  2. Checks if Ollama is running locally
  3. Detects your project language, test framework, and linter
  4. Picks auth-aware model defaults for each role based on what's available
  5. Lets you choose adversarial (recommended) or single-model, plus auto or openai provider preference
  6. Lets you choose OpenAI auto, api-only, codex-only, or api+codex surfaces when you want explicit OpenAI behavior
  7. Creates .forgegod/config.toml with sensible defaults

No manual setup required. Just run forgegod and go.

On first conversational use, ForgeGod auto-creates .forgegod/config.toml with auth-aware defaults when it can detect usable model surfaces. If you want the guided wizard or explicit profile selection up front, run forgegod init.

If you add a new provider later, run forgegod auth sync --profile adversarial or forgegod auth sync --profile single-model to rewrite model defaults from detected auth surfaces.

How the Ralph Loop Works

┌─────────────────────────────────────────────────┐
│                  RALPH LOOP                      │
│                                                  │
│  ┌──────┐   ┌───────┐   ┌─────────┐   ┌─────┐ │
│  │ READ │──▶│ SPAWN │──▶│ EXECUTE │──▶│ VAL │ │
│  │ PRD  │   │ AGENT │   │  STORY  │   │IDATE│ │
│  └──────┘   └───────┘   └─────────┘   └──┬──┘ │
│      ▲                                    │     │
│      │         ┌────────┐    ┌────────┐   │     │
│      └─────────│ROTATE  │◀───│COMMIT  │◀──┘     │
│                │CONTEXT │    │OR RETRY│   pass   │
│                └────────┘    └────────┘          │
│                                                  │
│  Progress is in GIT, not LLM context.           │
│  Fresh agent per story. No context rot.          │
│  Create .forgegod/KILLSWITCH to stop.           │
└─────────────────────────────────────────────────┘
  1. Read PRD — Pick highest-priority TODO story
  2. Spawn agent — Fresh context (progress is in git, not memory)
  3. Execute — Agent uses 23 tools to implement the story
  4. Validate — Tests, lint, syntax, frontier review
  5. Finalize or retry — Pass: review diff + mark done. Fail: retry up to 3x with model escalation
  6. Rotate — Next story. Context is always fresh.
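The six steps above can be sketched as a skeleton loop. All names here are illustrative (the real implementation lives in loop.py, and this sketch omits model escalation, gutter detection, and the killswitch):

```python
# Minimal sketch of the Ralph loop contract: progress persists in git,
# and every story gets a freshly spawned agent, so context never rots.
def ralph_loop(prd, spawn_agent, validate, commit, max_retries=3):
    for story in sorted(prd["todo"], key=lambda s: s["priority"]):
        for attempt in range(max_retries):
            agent = spawn_agent()          # fresh context per story
            result = agent.execute(story)
            if validate(result):           # tests, lint, review gate
                commit(story, result)      # durable progress lives in git
                break                      # rotate to the next story
```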

5-Tier Memory System

ForgeGod has the most advanced memory system of any open-source coding agent:

| Tier | What | How | Retention |
|---|---|---|---|
| Episodic | What happened per task | Full outcome records | 90 days |
| Semantic | Extracted principles | Confidence + decay + reinforcement | Indefinite |
| Procedural | Code patterns & fix recipes | Success rate tracking | Indefinite |
| Graph | Entity relationships + causal edges | Auto-extracted from outcomes | Indefinite |
| Error-Solution | Error pattern → fix mapping | Fuzzy match lookup | Indefinite |

Memories decay with category-specific half-life (14d debugging → 90d architecture), consolidate via O(n*k) category-bucketed comparison, and are recalled via FTS5 + Jaccard hybrid retrieval (Reciprocal Rank Fusion). SQLite WAL mode for concurrent access.

# Check memory health
forgegod memory

# Memory is stored in .forgegod/memory.db (SQLite)
# Global learnings in ~/.forgegod/memory.db (cross-project)
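Reciprocal Rank Fusion, the technique named above, merges ranked lists by summing `1/(k + rank)` per document. A minimal sketch (k=60 is the common default from the RRF literature; ForgeGod's actual constant is not documented here):

```python
# Fuse any number of ranked result lists (e.g. FTS5 hits and Jaccard hits)
# into one ordering. Documents ranked high in multiple lists win.
def rrf(rankings, k=60):
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score normalization across the two retrievers, which is why it is a common choice for hybrid lexical + similarity retrieval.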

Budget Modes

| Mode | Behavior | Trigger |
|---|---|---|
| normal | Use all configured models | Default |
| throttle | Prefer local, cloud for review only | 80% of daily limit |
| local-only | Ollama only, $0 operation | Manual or 95% limit |
| halt | Stop all LLM calls | 100% of daily limit |

# Check spend
forgegod cost

# Override mode
export FORGEGOD_BUDGET_MODE=local-only
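The spend-triggered transitions in the table can be sketched as a threshold check (the function name is illustrative; thresholds are taken from the table above):

```python
# Pick a budget mode from the ratio of spend to daily limit.
def budget_mode(spent_usd, daily_limit_usd):
    ratio = spent_usd / daily_limit_usd
    if ratio >= 1.00:
        return "halt"        # stop all LLM calls
    if ratio >= 0.95:
        return "local-only"  # Ollama only, $0
    if ratio >= 0.80:
        return "throttle"    # prefer local, cloud for review only
    return "normal"
```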

Caveman Mode (--terse)

Ultra-terse prompts that reduce token usage 50-75% with no accuracy loss for coding tasks. Backed by 2026 research:

# Add --terse to any command
forgegod --terse
forgegod run --terse "Build a REST API"
forgegod loop --terse --prd .forgegod/prd.json
forgegod plan --terse "Refactor auth module"

# Or enable globally in config
# .forgegod/config.toml
# [terse]
# enabled = true

Caveman mode compresses system prompts (~200 → ~80 tokens), tool descriptions (3-8 words each), and tool output (tracebacks → last frame only). JSON schemas for planner/reviewer stay byte-identical.
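The "tracebacks → last frame only" compression can be sketched as follows. This is an illustrative approximation, not ForgeGod's actual heuristic:

```python
# Keep only the traceback header, the final stack frame, and the exception
# line — the frame that usually matters for a fix attempt.
def trim_traceback(tb_text):
    lines = tb_text.rstrip().splitlines()
    frames = [i for i, l in enumerate(lines) if l.lstrip().startswith("File ")]
    if not frames:
        return tb_text  # not a traceback; leave untouched
    return "\n".join([lines[0]] + lines[frames[-1]:])
```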

Configuration

ForgeGod uses TOML config with 3-level priority: env vars > project > global.

Fresh forgegod init and forgegod auth sync write auth-aware defaults. The example below shows the file shape, not the only valid mapping.

# .forgegod/config.toml

[models]
planner = "openai:gpt-5.4"            # Frontier planning
coder = "ollama:qwen3-coder-next"     # Free local coding
reviewer = "openai:gpt-5.4"           # Quality gate
sentinel = "openai:gpt-5.4"           # Frontier sampling
escalation = "openai:gpt-5.4"         # Fallback for hard problems
researcher = "openai:gpt-5.4-mini"    # Recon / web synthesis

[budget]
daily_limit_usd = 5.00
mode = "normal"

[loop]
max_iterations = 100
parallel_workers = 2
gutter_detection = true

[ollama]
host = "http://localhost:11434"
model = "qwen3-coder-next"

[terse]
enabled = false              # --terse flag or set true here

[security]
sandbox_mode = "standard"    # permissive | standard | strict
sandbox_backend = "auto"     # auto | docker
sandbox_image = "mcr.microsoft.com/devcontainers/python:1-3.13-bookworm"
redact_secrets = true
audit_commands = true

Environment Variables

export OPENAI_API_KEY="sk-..."
forgegod auth login openai-codex           # Native ChatGPT-backed OpenAI auth
export ANTHROPIC_API_KEY="sk-ant-..."     # Optional
export OPENROUTER_API_KEY="sk-or-..."     # Optional
export GOOGLE_API_KEY="AIza..."           # Optional (Gemini)
export DEEPSEEK_API_KEY="sk-..."          # Optional
export MOONSHOT_API_KEY="sk-..."          # Optional (Kimi / Moonshot)
export ZAI_CODING_API_KEY="..."           # Optional (Z.AI Coding Plan)
export ZAI_API_KEY="..."                  # Optional (Z.AI general API)
export FORGEGOD_BUDGET_DAILY_LIMIT_USD=10

Supported Models

| Provider | Models | Cost | Setup |
|---|---|---|---|
| Ollama | qwen3-coder-next, devstral, any | $0 | ollama serve |
| OpenAI API | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, o3, o4-mini | $$ | OPENAI_API_KEY |
| OpenAI Codex subscription | gpt-5.4 via Codex auth surface | Included in supported ChatGPT plans | forgegod auth login openai-codex |
| Anthropic | claude-sonnet-4-6, claude-opus-4-6 | $$$ | ANTHROPIC_API_KEY |
| Google Gemini | gemini-2.5-pro, gemini-3-flash | $$ | GOOGLE_API_KEY |
| DeepSeek | deepseek-chat, deepseek-reasoner | $ | DEEPSEEK_API_KEY |
| Kimi (Moonshot direct) | kimi-k2.5, kimi-k2-thinking | $$ | MOONSHOT_API_KEY |
| Z.AI / GLM | glm-5.1, glm-5, glm-4.7 | $$ | ZAI_CODING_API_KEY or ZAI_API_KEY |
| OpenRouter | 200+ models | varies | OPENROUTER_API_KEY |

Kimi support uses Moonshot's official OpenAI-compatible API and is currently experimental in ForgeGod. Benchmark it on your workload before making it a default role. OpenAI Codex subscription support is now a production-ready ForgeGod surface on supported native Windows and WSL installs. codex-only is the supported subscription-only path, and forgegod evals --matrix openai-live-compare is the release path for comparing it against api+codex when both auth surfaces are linked. OpenRouter still uses keys/credits. Alibaba/Qwen Coding Plan is still under evaluation because current official docs scope it to supported coding tools rather than generic autonomous loops.

Harness rule of thumb:

  • forgegod benchmark measures coding/model performance on scaffold tasks
  • forgegod evals measures ForgeGod itself: chat UX, approval behavior, permission denials, completion-gate discipline, loop/worktree behavior, and strict-sandbox interface handling. It also splits scores by ux, safety, workflow, and verification, ships forgegod evals --matrix openai-surfaces for deterministic OpenAI-first routing coverage, and forgegod evals --matrix openai-live for cheap real API/Codex probes when those auth surfaces are linked. Reports now include local trace graders too.

Model Leaderboard

Run your own: forgegod benchmark

| Model | Composite | Correctness | Quality | Speed | Cost | Self-Repair |
|---|---|---|---|---|---|---|
| openai:gpt-4o-mini | 81.5 | 10/12 | 7.4 | 12s avg | $0.08 | 4/4 |
| ollama:qwen3.5:9b | 72.3 | 8/12 | 6.8 | 45s avg | $0.00 | 3/4 |

Run forgegod benchmark --update-readme to refresh with your own results.

Architecture

forgegod/
├── cli.py          # Typer CLI (init, run, loop, plan, review, cost, memory, status, benchmark, doctor)
├── config.py       # TOML config + env vars + 3-level priority
├── router.py       # Multi-provider LLM router + persistent pool + cascade routing + half-open circuit breaker
├── agent.py        # Core agent loop (tools + context compression + sub-agents)
├── coder.py        # Reflexion code generation (3 attempts, model escalation, GOAP)
├── loop.py         # Ralph loop (24/7 autonomous coding, parallel workers, story timeout)
├── planner.py      # Task decomposition → PRD
├── reviewer.py     # Frontier model quality gate (sample-based)
├── sica.py         # Self-improving strategy modification (guardrails + audit policy)
├── memory.py       # 5-tier cognitive memory (FTS5 + RRF hybrid retrieval, WAL mode)
├── budget.py       # SQLite cost + token tracking, forecasting, auto budget modes
├── worktree.py     # Parallel git worktree workers
├── tui.py          # Rich terminal dashboard
├── terse.py        # Caveman mode — terse prompts, tool compression, savings tracker
├── benchmark.py    # Model benchmarking engine (12 tasks, 4 tiers, composite scoring)
├── onboarding.py   # Interactive setup wizard for new users
├── doctor.py       # Installation health check (6 diagnostic checks)
├── i18n.py         # Translation strings (English + Spanish es-419)
├── models.py       # Pydantic v2 data models
└── tools/
    ├── filesystem.py  # async read/write (aiofiles), atomic writes, fuzzy edit, glob, grep, repo_map
    ├── shell.py       # bash (isolated runtime env + strict command policy + secret redaction)
    ├── git.py         # git status, diff, commit, worktrees
    ├── mcp.py         # MCP server client (5,800+ servers)
    └── skills.py      # On-demand skill loading

Security

Defense-in-depth, not security theater:

  • Real strict sandbox — strict runs inside Docker with no network, read-only rootfs, dropped caps, and workspace-only mounts
  • Standard shell policy — standard keeps the local guardrails: isolated runtime dirs, blocked shell operators, and workspace scoping
  • Secret redaction — 11 patterns strip API keys from tool output before LLM context
  • Prompt injection detection — 8 patterns scan for jailbreak/role-override attempts
  • AST code validation — Detects obfuscated dangerous calls (getattr(os, 'system')) that regex misses, and blocks suspicious writes in strict mode
  • Workspace-scoped file ops — file and shell tools reject paths that escape the active workspace root
  • Supply chain defense — Flags known-abandoned/typosquat packages (python-jose, jeIlyfish, etc.)
  • Canary token system — Detects if system prompt leaks into tool arguments, with per-session rotation
  • Budget limits — Cost controls with token tracking + burn-rate forecasting
  • Killswitch — Create .forgegod/KILLSWITCH to immediately halt autonomous loops
  • Sensitive file protection — .env and credentials files get warnings + automatic redaction
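A secret-redaction pass like the one described above can be sketched with two example regexes. These patterns are simplified illustrations (the README cites 11 patterns; ForgeGod's actual regexes are not documented here):

```python
import re

# Two illustrative key shapes; real coverage would include many more vendors.
PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9-]{16,}"),     # OpenAI/Anthropic-style "sk-" keys
    re.compile(r"AIza[0-9A-Za-z_-]{35}"),    # Google API key shape
]

# Strip matches from tool output before it ever reaches LLM context.
def redact(text):
    for pat in PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text
```

Redacting at the tool-output boundary means a leaked key in a log or env dump never enters the model's context window.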

Warning: ForgeGod executes shell commands and modifies files. As of the verified 2026-04-08 baseline, strict uses a real Docker sandbox backend and blocks if Docker/image prerequisites are missing, while standard remains a host-local guarded workflow. Use forgegod doctor and docs/STRICT_SANDBOX_SETUP.md instead of weakening the sandbox just to get past setup friction.

Operational Docs

See SECURITY.md for the full policy and vulnerability reporting.

Contributing

We welcome contributions. See CONTRIBUTING.md for guidelines.

Contributors

ForgeGod credits code and non-code work in public.

  • Matias Mesa - design - official ForgeGod mascot system
  • WAITDEAD - code, infra, research, project management, maintenance

See CONTRIBUTORS.md for the current contributor list.

License

Apache 2.0 — see LICENSE.


Built by WAITDEAD • Official mascot design by Matias Mesa • Powered by techniques from OpenClaw, Hermes, and SOTA 2026 coding agent research.
