Claude Code is the primary integration target for CCE. Setup creates an MCP configuration file, an instruction file, and optional session hooks.

    cce init --agent claude

This creates or updates the following files in your project root.

.mcp.json

Registers the CCE MCP server so Claude Code can call tools like context_search.

    {
      "mcpServers": {
        "context-engine": {
          "command": "cce",
          "args": ["serve"]
        }
      }
    }

CLAUDE.md

Contains instructions telling Claude to use context_search for code questions instead of reading files directly. The CCE block is wrapped in markers:

    <!-- CCE:BEGIN -->
    ...instructions...
    <!-- CCE:END -->

You can add your own content above or below the markers. CCE will only update the section between them during upgrades.
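The marker-based update described above can be sketched roughly as follows. This is an illustration only, not CCE's actual implementation: the function name `upsert_block` and the append-when-missing behavior are assumptions.

```python
import re

BEGIN = "<!-- CCE:BEGIN -->"
END = "<!-- CCE:END -->"


def upsert_block(text: str, instructions: str) -> str:
    """Replace the marked block if present, otherwise append a new one.

    Hypothetical helper: everything outside the markers is left untouched.
    """
    block = f"{BEGIN}\n{instructions}\n{END}"
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    if pattern.search(text):
        # A function replacement avoids backslash interpretation in `block`.
        return pattern.sub(lambda _match: block, text, count=1)
    separator = "" if text == "" or text.endswith("\n") else "\n"
    return f"{text}{separator}{block}\n"


if __name__ == "__main__":
    doc = "# My notes\n<!-- CCE:BEGIN -->\nold\n<!-- CCE:END -->\nfooter\n"
    print(upsert_block(doc, "new"))
```

Content above and below the markers survives the update because only the text matched between the two markers is swapped out.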
CCE installs a SessionStart hook that prints a short status summary at the beginning of each Claude Code session:

    CCE v0.4.20 · my-project · 1247 chunks indexed · 68% saved over 42 queries
    USE context_search MCP tool for all code questions. Do NOT use Read/Grep to explore code.

This reminds Claude to use CCE from the very first message.

CCE can optionally install a memory hook that captures architectural decisions and important code areas discovered during a session. These are stored locally and surfaced in subsequent sessions for continuity.

The dashboard’s “Sessions” view shows all captured memories across sessions.

Three git hooks are installed:

- post-commit runs cce index to keep the index fresh.
- post-checkout runs cce index after branch switches.
- post-merge runs cce index after merges.

Each hook contains a CCE marker comment so cce uninstall can cleanly remove them.
Codex CLI uses a global configuration file rather than per-project MCP config. CCE registers itself in the user-level config with a project-specific section.

    cce init --agent codex

~/.codex/config.toml

Codex CLI has no per-project MCP configuration. Instead, CCE adds a project section (keyed by a hash of the project path) to the user-global config file.

    [projects."a1b2c3d4"]
    path = "/Users/you/projects/my-project"

    [projects."a1b2c3d4".mcpServers.context-engine]
    command = "cce"
    args = ["serve"]

AGENTS.md

Contains instructions for Codex to use context_search for code exploration. The CCE block is wrapped in markers so your own content is preserved during upgrades.

Codex CLI does not read per-project .mcp.json files. The global ~/.codex/config.toml is the only location for MCP server registration.

cce uninstall removes only the section for the current project.

CCE integrates with GitHub Copilot’s chat agent in VS Code through MCP configuration and a Copilot instructions file.
    cce init --agent copilot

.vscode/mcp.json

Registers the CCE MCP server for Copilot’s agent mode.

    {
      "mcpServers": {
        "context-engine": {
          "command": "cce",
          "args": ["serve"]
        }
      }
    }

.github/copilot-instructions.md

Contains instructions for Copilot to use context_search for code questions. The CCE block is wrapped in markers:

    <!-- CCE:BEGIN -->
    ...instructions...
    <!-- CCE:END -->

Your own Copilot instructions above or below the markers are preserved during upgrades.

Once configured, Copilot’s chat agent will have access to the context_search tool. Ask questions about your codebase in Copilot Chat and it will use CCE’s compressed retrieval instead of sending full files.
After running cce init, reload the VS Code window (Cmd+Shift+P, then “Developer: Reload Window”) to pick up the MCP server.
Cursor has its own built-in codebase indexing, but CCE adds compressed retrieval and token savings tracking on top.
    cce init               # Auto-detects Cursor if .cursor/ exists
    cce init --agent all   # Explicitly includes Cursor

.cursor/mcp.json

Registers the CCE MCP server for Cursor’s agent mode.

    {
      "mcpServers": {
        "context-engine": {
          "command": "cce",
          "args": ["serve"]
        }
      }
    }

.cursorrules

Contains instructions for Cursor’s AI to prefer context_search over raw file reads. The CCE block is wrapped in markers so your own rules are preserved.

Cursor indexes your codebase for its own retrieval. CCE complements this with compressed retrieval and token savings tracking.

Both systems can run side by side without conflict.

After running cce init, restart Cursor to pick up the new MCP server configuration.
CCE integrates with the Gemini CLI through its settings file and an instruction file.
    cce init                  # Auto-detects Gemini CLI if .gemini/ exists
    cce init --agent gemini

.gemini/settings.json

Registers the CCE MCP server for Gemini CLI.

    {
      "mcpServers": {
        "context-engine": {
          "command": "cce",
          "args": ["serve"]
        }
      }
    }

GEMINI.md

Contains instructions for Gemini to prefer context_search over reading files directly. The CCE block is wrapped in markers so your own content is preserved.
CCE detects Gemini CLI when a .gemini/ directory exists in your project root or home directory. No explicit --agent flag is needed if the directory is present.
OpenCode uses a single opencode.json file in the project root for all configuration, including MCP servers.
    cce init               # Auto-detects OpenCode if opencode.json exists
    cce init --agent all   # Explicitly includes OpenCode

opencode.json

CCE adds its MCP server entry to the existing opencode.json (or creates one if it does not exist).

    {
      "mcpServers": {
        "context-engine": {
          "command": "cce",
          "args": ["serve"]
        }
      }
    }

OpenCode does not use a separate instruction file. The MCP server registration is sufficient for OpenCode to discover and use CCE’s tools.

CCE detects OpenCode when an opencode.json file exists in your project root. No explicit --agent flag is needed.
Code Context Engine works with any AI coding agent that supports MCP (Model Context Protocol). The cce init command auto-detects which agents are present in your environment and configures them automatically.
--agent flag

    cce init --agent auto      # Default. Detects installed agents.
    cce init --agent claude    # Configure only Claude Code
    cce init --agent cursor    # Configure only Cursor
    cce init --agent copilot   # Configure only VS Code / Copilot
    cce init --agent gemini    # Configure only Gemini CLI
    cce init --agent codex     # Configure only Codex CLI
    cce init --agent all       # Configure all supported agents

When no --agent flag is provided, cce init defaults to auto, which scans for known config files and editors.

| Agent | MCP Config Path | Instruction File |
|---|---|---|
| Claude Code | .mcp.json | CLAUDE.md |
| Cursor | .cursor/mcp.json | .cursorrules |
| VS Code / Copilot | .vscode/mcp.json | .github/copilot-instructions.md |
| Gemini CLI | .gemini/settings.json | GEMINI.md |
| Codex CLI | ~/.codex/config.toml (global) | AGENTS.md |
| OpenCode | opencode.json | (none) |
| Tabnine | .tabnine/agent/settings.json | TABNINE.md |
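Each agent integration does two things:

- Registers the CCE MCP server in the agent’s MCP config so the agent can call context_search and other CCE tools.
- Writes an instruction file that tells the agent to prefer context_search over reading files directly.

The instruction file content is managed by CCE and wrapped in markers (CCE:BEGIN / CCE:END) so it can be updated on upgrade without touching your own content.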
You can run cce init --agent <name> multiple times. Each run is additive and will not remove previously configured agents.
    cce init --agent claude
    cce init --agent copilot   # Adds Copilot config alongside Claude

Or configure everything at once:

    cce init --agent all

Tabnine uses a project-local settings file and an instruction file for MCP integration.

    cce init               # Auto-detects Tabnine if .tabnine/ exists
    cce init --agent all   # Explicitly includes Tabnine

.tabnine/agent/settings.json

Registers the CCE MCP server for Tabnine’s agent.

    {
      "mcpServers": {
        "context-engine": {
          "command": "cce",
          "args": ["serve"]
        }
      }
    }

TABNINE.md

Contains instructions for Tabnine to prefer context_search for code retrieval. The CCE block is wrapped in markers so your own content is preserved.
CCE detects Tabnine when a .tabnine/ directory exists in your project root. No explicit --agent flag is needed.
One-time setup for a project. Checks dependencies, indexes all code, installs git hooks, and connects AI coding agents via MCP.
    cce init
    cce init --agent claude
    cce init --agent copilot
    cce init --agent codex
    cce init --agent all

What it does:

- post-commit, post-checkout, and post-merge git hooks.
- .gitignore.

Re-index files that have changed since the last run.

    cce index               # Incremental (changed files only)
    cce index --full        # Force full re-index of every file
    cce index --path src/   # Index a specific file or directory
    cce index -v            # Verbose output

The git hooks installed by cce init call cce index automatically after every commit.
Show index health and token savings summary.
    cce status             # Full status
    cce status --oneline   # Single line (used by SessionStart hook)
    cce status --json      # Machine-readable output
    cce status -v          # Lists all indexed projects

Token savings report with cost estimates.

    cce savings          # Current project
    cce savings --all    # All indexed projects
    cce savings --json   # Machine-readable output

Run a test query against the index and display results.

    cce search 'how does authentication work'
    cce search 'payment processing' --top-k 10

Also updates savings stats, useful for populating the dashboard before opening an agent session.

Open the web dashboard in your browser.

    cce dashboard
    cce dashboard --port 8080
    cce dashboard --no-browser

The dashboard provides views for: overview, files, sessions, and savings.

Manage Ollama and the dashboard as background processes.

    cce services                               # Show status
    cce services start                         # Start Ollama + dashboard
    cce services start ollama                  # Start only Ollama
    cce services start dashboard               # Start dashboard
    cce services start dashboard --port 9000
    cce services stop                          # Stop everything CCE started
    cce services stop dashboard                # Stop only dashboard

Shortcuts for cce services start and cce services stop.

    cce start             # Start all services
    cce stop              # Stop all services
    cce start ollama      # Start only Ollama
    cce stop dashboard    # Stop only dashboard

Manage per-project rules, preferences, and shell hooks.

    cce commands list                                   # Show all rules and hooks
    cce commands add-rule 'Use UUID for PKs'            # Add a rule
    cce commands remove-rule 'Use UUID for PKs'
    cce commands set-pref database PostgreSQL           # Set a preference
    cce commands remove-pref database
    cce commands add before_push 'npm test'             # Add hook command
    cce commands remove before_push 'npm test'
    cce commands add-custom deploy 'kubectl apply -f k8s/'

Clear all index data for the current project.

    cce clear         # Asks for confirmation
    cce clear --yes   # Skip confirmation

After clearing, run cce index --full to rebuild.
Remove index data for projects whose directories no longer exist on disk.
    cce prune             # Remove stale project data
    cce prune --dry-run   # Preview without deleting

Upgrade CCE to the latest version. Detects your install method (uv, pipx, or pip) and runs the correct upgrade command. Refreshes project config afterwards.

    cce upgrade           # Upgrade and refresh config
    cce upgrade --check   # Show install method without upgrading

Remove CCE from the current project. Reverses everything cce init did.

    cce uninstall

Removes: git hooks, MCP config entry, instruction file block, and .cce/ directory. Index data in ~/.cce is preserved (use cce clear to remove it).

Start the MCP server. Called automatically by agents via .mcp.json. You do not need to run this manually.

    cce serve
    cce serve --project-dir /path/to/project

Show every available command grouped by category.

    cce list
    cce --version   # Show version
    cce --help      # Show help

CCE works with zero configuration out of the box. This page covers all available options for when you need to tune behavior.
- ~/.cce/config.yaml (created automatically on first use)
- .context-engine.yaml in your project root (overrides global for that project)

    compression:
      level: standard    # How much to compress code chunks before sending to the agent
                         # Options: minimal | standard | full
      output: standard   # How much to compress agent responses
                         # Options: off | lite | standard | max
      model: phi3:mini   # Ollama model for LLM-based summarization
                         # Auto-detected if Ollama is running. Ignored if Ollama is off.

    indexer:
      watch: true        # Keep index in sync via git hooks
      ignore:            # Directories and patterns to skip during indexing
        - .git
        - node_modules
        - __pycache__
        - .venv
        - dist
        - build

    retrieval:
      top_k: 20                   # Maximum chunks returned per query
      confidence_threshold: 0.5   # Minimum score to include a result (0.0 to 1.0)

    embedding:
      model: BAAI/bge-small-en-v1.5   # Embedding model (fastembed-compatible)

    pricing:
      model: opus        # Model for cost estimates in `cce savings`
                         # Options: opus | sonnet | haiku

compression.level

Controls how much CCE compresses code chunks before including them in the agent’s context.
| Level | Behavior |
|---|---|
| minimal | Truncation only. Keeps signature and docstring, drops body. |
| standard | Truncation plus light summarization if Ollama is available. |
| full | Full LLM summarization via Ollama (requires Ollama running). |
compression.output

Controls how verbose the agent’s responses are. Set via the set_output_compression MCP tool or config.

| Level | Style | Typical savings |
|---|---|---|
| off | Full output | 0% |
| lite | Removes filler and hedging | ~30% |
| standard | Shorter phrasing, fragments where possible | ~65% |
| max | Telegraphic, minimal prose | ~75% |
Code blocks, file paths, commands, and error messages are never compressed regardless of level.
Change at runtime by telling your agent:

- Switch to max output compression
- Turn off output compression

    embedding:
      model: sentence-transformers/all-mpnet-base-v2

Any model available in fastembed works. Changing the model requires a full re-index:

    cce clear --yes && cce index --full

The default BAAI/bge-small-en-v1.5 is recommended for most use cases. It balances quality, speed, and size well.
top_k controls how many chunks the retriever returns per query. Higher values surface more context but cost more tokens. Default: 20.
confidence_threshold sets the minimum score to include a result. Range 0.0 to 1.0. Lower values return more results; higher values return only strong matches. Default: 0.5.
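How the two settings interact can be sketched as below. This is a minimal illustration under stated assumptions: the function name `select_results`, the inclusive threshold comparison, and applying the threshold before the `top_k` cut are all hypothetical, since the source does not spell out the order.

```python
def select_results(scored, top_k=20, confidence_threshold=0.5):
    """Filter (chunk_id, score) pairs by score, then keep the best top_k.

    Hypothetical helper; defaults mirror the documented config defaults.
    """
    kept = [(chunk_id, score) for chunk_id, score in scored
            if score >= confidence_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    hits = [("auth.py:login", 0.9), ("utils.py:slug", 0.4), ("auth.py:logout", 0.6)]
    print(select_results(hits))            # weak match at 0.4 is dropped
    print(select_results(hits, top_k=1))   # only the strongest match remains
```

Lowering `confidence_threshold` widens what passes the filter, while `top_k` caps how much of it is returned.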
At runtime, the agent can pass top_k and max_tokens directly to context_search:
    context_search(query="payment processing", top_k=5, max_tokens=3000)

The indexer.ignore list supports:

- Directory names: node_modules, dist
- Glob patterns: "*.generated.ts", "*.min.js"
- Paths: "src/legacy/"

Files matching .gitignore are also skipped automatically.
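The three kinds of ignore entries shown above could be matched along these lines. This is a sketch only: `is_ignored` is a hypothetical helper, and the exact matching rules CCE applies (for example, whether globs match against the full path or just the file name) are assumptions.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if a POSIX-style relative path matches any ignore entry."""
    parsed = PurePosixPath(path)
    for pattern in patterns:
        if pattern.endswith("/"):
            # Path prefix such as "src/legacy/" (assumed semantics).
            if path.startswith(pattern):
                return True
        elif any(ch in pattern for ch in "*?["):
            # Glob such as "*.min.js", matched against the file name only.
            if fnmatchcase(parsed.name, pattern):
                return True
        elif pattern in parsed.parts:
            # Bare name such as "node_modules", matched at any depth.
            return True
    return False


if __name__ == "__main__":
    rules = ["node_modules", "dist", "*.generated.ts", "*.min.js", "src/legacy/"]
    for candidate in ["web/node_modules/x.js", "src/app.min.js",
                      "src/legacy/old.py", "src/app.ts"]:
        print(candidate, is_ignored(candidate, rules))
```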
    pricing:
      model: sonnet   # opus (default) | sonnet | haiku

This determines which model’s pricing is used for cost estimates in cce savings. Prices are fetched from Anthropic’s docs and cached for 7 days.

If Ollama is running on a non-default address, set it via environment variable:

    export OLLAMA_HOST=http://localhost:11434

CCE auto-detects available RAM and adjusts behavior:

| RAM | Profile | Behavior |
|---|---|---|
| Less than 12 GB | light | Truncation only, small embedding batches |
| 12 to 32 GB | standard | Full pipeline, standard batch sizes |
| More than 32 GB | full | Larger Ollama models, larger batches |
You do not need to set this manually.
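The thresholds in the table map to a simple selection rule, sketched here. The function name `ram_profile` is hypothetical, and how CCE actually measures available RAM is not stated in this document.

```python
def ram_profile(total_gb: float) -> str:
    """Map available RAM in GB to the profile names from the table above."""
    if total_gb < 12:
        return "light"      # truncation only, small embedding batches
    if total_gb <= 32:
        return "standard"   # full pipeline, standard batch sizes
    return "full"           # larger Ollama models, larger batches


if __name__ == "__main__":
    for gb in (8, 16, 64):
        print(gb, ram_profile(gb))
```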
~/.cce/projects/.

No. CCE returns the same code your agent would find by reading files, just compressed and targeted. In practice, answers are often better because the agent receives focused, relevant context instead of entire files full of unrelated code.

Set output compression to a higher level:

    compression:
      output: max

Or tell your agent at runtime: “Switch to max output compression.” The max level uses telegraphic phrasing and typically saves ~75% on response tokens. Code blocks and file paths are never affected.

Three main sources:

No. All processing happens locally:

No code, embeddings, or queries leave your machine unless you explicitly configure a remote embedding model.

Yes, fully. After the initial setup (which downloads the embedding model, ~60 MB), CCE operates entirely offline. Ollama summarization also runs locally if you have it installed.

The only network call CCE makes is fetching model pricing for cost estimates in cce savings, and that result is cached for 7 days.

CCE uses Tree-sitter for structural parsing. The following languages have full AST-aware chunking:

Other file types (YAML, Markdown, config files, etc.) are indexed using line-based chunking. They still appear in search results but without function-level granularity.

Yes. Run cce init --agent all to configure every supported agent. They all share the same index and MCP server, so there is no duplication or conflict.

    cce upgrade

This detects your install method (uv, pipx, or pip), upgrades the package, and refreshes your project config (hooks, MCP config, instruction files).

    cce uninstall

This removes git hooks, MCP config entries, instruction file blocks, and the local .cce/ directory. Index data in ~/.cce is preserved. Run cce clear afterwards to remove that too.
Savings are recorded when your agent calls context_search through the MCP server. If you have not used an agent session yet, run a test search to seed the stats:
    cce search 'main entry point'

cmake (needed to build tree-sitter grammars)

| Platform | Setup |
|---|---|
| macOS | xcode-select --install |
| Ubuntu/Debian | sudo apt install build-essential cmake |
| Fedora/RHEL | sudo dnf install gcc gcc-c++ cmake |
| Windows | Visual Studio Build Tools (C++ workload) + CMake |
    uv tool install code-context-engine

Or with pipx:

    pipx install code-context-engine

    uv tool install "code-context-engine[local]"   # includes fastembed + ONNX Runtime

    cd /path/to/your/project
    cce init

This does everything:

    cce init --agent claude    # Claude Code only
    cce init --agent codex     # Codex CLI only
    cce init --agent copilot   # VS Code / Copilot only
    cce init --agent all       # Every supported editor

Restart your editor, then ask a question about your code. The agent will call context_search via MCP instead of reading files.

Check your savings:

    cce savings

    my-project · 5 queries

    ⛁ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ 93% tokens saved

    Input savings     42.1k tokens   $0.63
    Output savings     1.2k tokens   $0.09
    ──────────────────────────────────────────
    Total saved       43.3k tokens   $0.72

CCE auto-detects the best available backend:

- Ollama: nomic-embed-text. Zero extra dependencies.
- fastembed: [local] extra. Uses BAAI/bge-small-en-v1.5. Works offline, ~60 MB download.

Set CCE_EMBED_BACKEND=ollama or CCE_EMBED_BACKEND=fastembed to force a specific backend.
CCE sits between your AI coding agent and your codebase. It replaces full-file reads with compressed, relevant chunks, reducing token usage while preserving answer quality.
When you run cce init or cce index, the following steps execute:

1. Tree-sitter parsing. Each source file is parsed into an AST using language-specific Tree-sitter grammars. This identifies functions, classes, methods, and other structural units.
2. Chunking. The AST is split into semantic chunks (one per function, class, or logical block). Each chunk retains its file path, line range, and relationships to other chunks.
3. Embedding. Each chunk is embedded using a local model (default: BAAI/bge-small-en-v1.5 via fastembed). No data leaves your machine.
4. Storage. Embeddings, full-text content, and graph edges are written to a local SQLite database with sqlite-vec for vector search and FTS5 for keyword search.
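To make the parsing and chunking steps concrete, here is a minimal sketch of AST-based chunking. It swaps in Python's standard-library `ast` module for Tree-sitter so it runs without extra dependencies and handles Python source only. The function name `chunk_python` and the chunk dictionary fields are assumptions, not CCE's actual schema.

```python
import ast


def chunk_python(source: str, path: str) -> list[dict]:
    """Split Python source into one chunk per function or class.

    Stand-in for Tree-sitter chunking: each chunk keeps its file path
    and line range, as described in the chunking step above.
    """
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "path": path,
                "name": node.name,
                "kind": type(node).__name__,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
            })
    # ast.walk is breadth-first, so sort to get source order.
    chunks.sort(key=lambda chunk: chunk["start_line"])
    return chunks


if __name__ == "__main__":
    code = "def a():\n    return 1\n\nclass B:\n    def m(self):\n        pass\n"
    for chunk in chunk_python(code, "example.py"):
        print(chunk)
```

Note that nested definitions (the method `m` inside class `B`) appear both as their own chunk and inside the enclosing class's line range; a real chunker would need a policy for that overlap.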
When an agent calls context_search, the following steps execute:

1. Query embedding. The natural language query is embedded using the same model.
2. Hybrid retrieval. Two searches run in parallel: a vector search over the embeddings and a keyword search over the full-text index.
3. RRF merge. Results from both searches are combined using Reciprocal Rank Fusion, which produces a single ranked list without needing score normalization.
4. Graph expansion. Top results are expanded by following code relationships (calls, imports, inheritance) to pull in related chunks the query might not have matched directly.
5. Compression. The final chunk set is compressed before being returned to the agent.
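The RRF merge step can be illustrated with a short sketch. Reciprocal Rank Fusion scores each item as the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is the value commonly used in the literature; whether CCE uses the same constant is an assumption, and `rrf_merge` is a hypothetical name.

```python
def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Only ranks are used, so the vector and keyword scores never need
    to be put on a common scale.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]
    keyword_hits = ["b", "d", "a"]
    print(rrf_merge([vector_hits, keyword_hits]))
```

Chunks that rank well in both lists rise to the top, while a chunk found by only one search still makes the merged list at a lower position.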
All index data lives in ~/.cce/projects/<project-name>/:

Everything is SQLite. No external database required.

CCE applies multiple compression stages to minimize tokens while preserving usefulness:

| Layer | What it does |
|---|---|
| Retrieval | Only relevant chunks are returned (not the whole codebase). |
| Chunk compression | Function bodies are truncated to signature + docstring, or summarized via Ollama. |
| Output compression | Agent responses are made more concise (configurable level). |
| Grammar compression | Removes syntactic noise (extra whitespace, redundant type annotations) from returned code. |
| Turn summarization | Long conversation histories are summarized to reduce context window usage. |
| Progressive disclosure | Returns signatures first; the agent can request full bodies only when needed. |
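The "signature + docstring" truncation in the chunk compression row can be sketched as follows. This uses Python's `ast` module on Python source as a stand-in; the function name `truncate_to_signature` and the `...` placeholder for the dropped body are assumptions, not CCE's actual output format.

```python
import ast


def truncate_to_signature(source: str) -> str:
    """Keep the signature and docstring of the first function, drop the body."""
    tree = ast.parse(source)
    # Raises StopIteration if the source contains no function definition.
    func = next(
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )
    lines = source.splitlines()
    first_stmt = func.body[0]
    if first_stmt.lineno == func.lineno:
        # Single-line definition: there is no separate body to drop.
        return lines[func.lineno - 1]
    if ast.get_docstring(func) is not None:
        keep_until = first_stmt.end_lineno    # docstring is the first statement
    else:
        keep_until = first_stmt.lineno - 1    # signature lines only
    kept = lines[func.lineno - 1:keep_until]
    kept.append(" " * (func.col_offset + 4) + "...")
    return "\n".join(kept)


if __name__ == "__main__":
    code = 'def add(a, b):\n    """Add two numbers."""\n    total = a + b\n    return total\n'
    print(truncate_to_signature(code))
```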
Tree-sitter grammars are included for:
Other file types are indexed using line-based chunking without AST awareness.
Code Context Engine (CCE) is a local MCP server that indexes your codebase so AI coding agents search for relevant code instead of reading entire files.
Every time an AI agent needs to understand your code, it reads entire files. A 500-line file costs 500 lines of input tokens even when the agent only needs one function. Across a session, this adds up to thousands of wasted tokens and real dollars.

CCE parses your code into semantic chunks (functions, classes, modules) using Tree-sitter, stores them with vector embeddings, and serves only the relevant pieces when the agent asks a question.

Result: 94% input token savings, reproducibly benchmarked.

| Tool | Purpose |
|---|---|
| context_search | Hybrid vector + keyword search with graph expansion |
| get_chunk | Retrieve a specific chunk by ID |
| record_decision | Store architectural decisions for cross-session recall |
| record_code_area | Mark areas you’ve worked on |
| session_recall | Recall decisions and code areas |
| session_timeline | Browse tool call history |
| session_event | Inspect a specific past event |
| set_output_level | Control output compression (off/lite/standard/max) |
| set_scope | Limit search to specific directories |

| Editor | Config written | Instructions |
|---|---|---|
| Claude Code | .mcp.json | CLAUDE.md |
| VS Code / Copilot | .vscode/mcp.json | .github/copilot-instructions.md |
| Cursor | .cursor/mcp.json | .cursorrules |
| Gemini CLI | .gemini/settings.json | GEMINI.md |
| OpenAI Codex | ~/.codex/config.toml | AGENTS.md |
| OpenCode | opencode.json | |
| Tabnine | .tabnine/agent/settings.json | TABNINE.md |
context_search via MCP. Hybrid vector + BM25 merged with Reciprocal Rank Fusion. Graph expansion adds related imports.

cce savings shows tokens and dollars saved.
CCE tracks every query made through the MCP server and records how many tokens were served versus how many would have been needed without CCE. This data powers the cce savings command and the dashboard.
cce savings

    cce savings

Example output:

    my-project · 42 queries

    ⛁ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ 93% tokens saved

    Without CCE    48.0k tokens   $0.24
    With CCE        3.4k tokens   $0.02
    ──────────────────────────────────────────
    Saved          44.6k tokens   $0.22
                   ~81 tokens / query   ~<$0.01 / query

    How: retrieval 93% + compression 90%
    Cost estimate based on Opus input pricing ($5/1M tokens)

Savings come from two independent stages:
1. Retrieval savings (input). Instead of sending the entire codebase, CCE returns only the chunks relevant to the query. This is measured as: 1 - (raw_chunk_tokens / full_codebase_tokens).
2. Compression savings (input). The retrieved chunks are further compressed (truncation, summarization) before being sent to the agent. This is measured as: 1 - (compressed_tokens / raw_chunk_tokens).

The combined effect is multiplicative, because the fractions that remain after each stage multiply. If retrieval cuts 90% and compression cuts another 50%, then 10% × 50% = 5% of the tokens remain, so the total savings are 95%.
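The multiplicative combination can be checked with a few lines of arithmetic. The function name `combined_savings` is hypothetical; the formula follows directly from multiplying the fraction of tokens that survives each stage.

```python
def combined_savings(retrieval: float, compression: float) -> float:
    """Total savings when each stage removes a fraction of what it receives.

    Both arguments are fractions in [0, 1], e.g. 0.90 for 90%.
    """
    remaining = (1.0 - retrieval) * (1.0 - compression)
    return 1.0 - remaining


if __name__ == "__main__":
    # The example from the text: 90% retrieval, then 50% compression.
    print(f"{combined_savings(0.90, 0.50):.0%}")
```

Because the stages multiply rather than add, a second stage can never push total savings past 100%, and its marginal contribution shrinks as the first stage gets more effective.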
The How: line in the output shows the contribution of each stage:

    How: retrieval 93% + compression 90%

Cost estimates use model-specific input pricing. Configure which model to estimate for:

    # ~/.cce/config.yaml or .context-engine.yaml
    pricing:
      model: opus   # opus (default) | sonnet | haiku

Prices are fetched from Anthropic’s documentation and cached for 7 days.

cce dashboard

    cce dashboard

The dashboard opens in your browser and provides a visual view of:

    cce savings --all

Shows a combined report across every project you have indexed, useful for understanding total cost reduction.

    cce savings --json

Returns machine-readable data for integration with other tools:

    {
      "project": "my-project",
      "queries": 42,
      "served_tokens": 14200,
      "raw_tokens": 26000,
      "full_file_tokens": 48000,
      "tokens_saved": 33800,
      "savings_pct": 70,
      "retrieval_savings_pct": 46,
      "compression_savings_pct": 45
    }

If you have zero queries recorded (fresh install), run a test search to seed the stats:

    cce search 'how does the main module work'

This updates the savings tracker so cce status and the dashboard show non-zero values.