█████╗ ██████╗ ███████╗███╗ ██╗████████╗██╗ ██╗██╗████████╗
██╔══██╗██╔════╝ ██╔════╝████╗ ██║╚══██╔══╝██║ ██╔╝██║╚══██╔══╝
███████║██║ ███╗█████╗ ██╔██╗ ██║ ██║ █████╔╝ ██║ ██║
██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║ ██╔═██╗ ██║ ██║
██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║ ██║ ██╗██║ ██║
╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝ ╚═╝ ╚═╝╚═╝ ╚═╝
Transparent ReAct loops · Multi-agent orchestration · Zero abstraction tax · Production-ready
pip install agentkit-aiQuickstart · Architecture · Multi-Agent · Tools · Memory · Docs
LangChain has 847 classes. AgentKit has one that matters: Agent.
AgentKit is a production-grade AI agent framework built on a single principle: you should always know exactly what your agent is doing, why, and what it cost. Every Thought, every Action, every tool call, every dollar — logged, stored, and accessible.
No hidden magic. No 5-layer abstractions. No framework source code archaeology when something breaks. Just clean Python, Pydantic schemas, and a transparent ReAct loop you can read in an afternoon.
"If you can't explain what your agent is doing step by step, you can't fix it when it breaks in production."
| What you want | AgentKit | LangChain |
|---|---|---|
| Debug a failing tool | Read your own function | Trace 5 abstraction layers |
| See what the LLM receives | agent.steps — every thought |
Add custom callbacks |
| Cost per run | response.estimated_usd |
Integrate a 3rd-party tool |
| Add a tool | @tool on any function |
Subclass BaseTool, override methods |
| Multi-agent setup | Team(manager=agent) |
Custom chains + callbacks |
| Switch LLM provider | One import swap | Rewrite your chains |
| Prevent dangerous actions | require_human_approval=True |
Build your own guardrails |
| Local / private models | OllamaLLM(model="llama3") |
Separate integration setup |
import asyncio
from agentkit.agent import Agent
from agentkit.llm.openai import OpenAILLM
from agentkit.tools import ToolRegistry, tool
@tool
def get_weather(city: str, unit: str = "C") -> str:
"""Returns current weather for a city.
Args:
city: The city name to check weather for.
unit: Temperature unit, either C or F.
"""
return f"It's 22{unit} and sunny in {city}."
async def main():
agent = Agent(
llm=OpenAILLM(model_name="gpt-4o"),
tools=ToolRegistry([get_weather]),
system_prompt="You are a helpful assistant.",
)
response = await agent.run("What's the weather like in Istanbul and Tokyo?")
print(response.final_answer)
# → It's 22C and sunny in both Istanbul and Tokyo!
# Full execution trace — every step the agent took
for step in response.steps:
print(f"[{step.type:12s}] {step.content}")
# → [thought ] I need to check weather for both cities.
# → [action ] get_weather({"city": "Istanbul", "unit": "C"})
# → [observation ] It's 22C and sunny in Istanbul.
# → [action ] get_weather({"city": "Tokyo", "unit": "C"})
# → [observation ] It's 22C and sunny in Tokyo.
# → [answer ] It's 22C and sunny in both cities!
print(f"Cost: ${response.estimated_usd:.6f}")
# → Cost: $0.000312
asyncio.run(main())No chains. No config files. No framework imports beyond Agent and @tool.
flowchart TB
subgraph Input
U["👤 User / Application"]
end
subgraph Core["AgentKit Core"]
direction TB
AGT["🤖 Agent\nagent.py"]
REACT["⚙️ ReAct Loop\nThought → Action → Observation"]
COST["💰 CostTracker\nestimated_usd"]
STEPS["📋 Step log\nsteps[]"]
AGT --> REACT
REACT --> COST
REACT --> STEPS
end
subgraph LLM["LLM Backends — BaseLLM"]
OAI["OpenAI\nGPT-4o · GPT-4 Turbo"]
ANT["Anthropic\nClaude 3.5 Sonnet · Opus"]
GRQ["Groq\nLlama 3 · Mixtral"]
OLL["Ollama\nLocal · Private · Free"]
end
subgraph Tools["Tool Layer"]
REG["ToolRegistry"]
DEC["@tool decorator\ntype hints → JSON schema"]
BLT["Built-ins\nweb_search · python_repl"]
INT["Integrations\ngithub · notion"]
CUSTOM["Your functions\nany Python callable"]
REG --> DEC
end
subgraph Memory["Memory"]
STM["ShortTermMemory\nsliding window · token-aware"]
LTM["LongTermMemory\nChromaDB · RAG · sentence-transformers"]
ENT["EntityMemory\nkey-value fact extraction"]
end
subgraph Orchestrator["Multi-Agent — Team"]
MGR["Manager Agent"]
R1["Researcher"]
R2["Coder"]
R3["Analyst"]
DEL["delegate_to_agent\nauto-generated tool"]
MGR -->|"delegates via"| DEL
DEL --> R1 & R2 & R3
end
U --> AGT
U --> Orchestrator
Orchestrator --> Core
AGT <--> LLM
AGT <--> Tools
AGT <--> Memory
flowchart TD
START(["agent.run(task)"])
INJECT["Inject tool schemas\ninto system prompt"]
STREAM["Stream LLM response\nasync generator"]
PARSE{"Parse response\n_parse_react_response()"}
THOUGHT["Log Thought\ncyan — why this action?"]
ACTION["Extract Action + Input\ntool_name + JSON args"]
HUMAN{"require_human\n_approval=True?"}
APPROVE{"User: y/n?"}
SKIP["Skip tool execution\nlog as Observation"]
VALIDATE{"JSON args\nvalid?"}
EXEC["Execute tool\nawait tool(**args)"]
OBS["Append Observation\nto conversation history"]
ERR["Append error as Observation\nagent self-corrects next iteration"]
DONE{"Action in\nresponse?"}
MAX{"max_iterations\nhit?"}
FINAL(["Return AgentResponse\nfinal_answer · steps · estimated_usd"])
START --> INJECT --> STREAM --> PARSE
PARSE --> THOUGHT --> ACTION
ACTION --> HUMAN
HUMAN -->|No| VALIDATE
HUMAN -->|Yes| APPROVE
APPROVE -->|y| VALIDATE
APPROVE -->|n| SKIP --> OBS
VALIDATE -->|Valid| EXEC --> OBS
VALIDATE -->|Invalid| ERR --> OBS
OBS --> DONE
DONE -->|Yes — loop| MAX
MAX -->|No| STREAM
MAX -->|Yes — stop| FINAL
DONE -->|No — done| FINAL
style THOUGHT fill:#0c2340,color:#93c5fd
style ACTION fill:#1f1200,color:#fbbf24
style OBS fill:#0a1f0e,color:#86efac
style ERR fill:#1f0808,color:#fca5a5
style FINAL fill:#130d2a,color:#c4b5fd
flowchart LR
FN["def search_db(query: str,\n table: str,\n limit: int = 10) -> list[dict]:\n '''Searches the database.\n Args:\n query: search term...\n table: table name...\n limit: max results...'''"]
INS["inspect.signature()\n+ get_type_hints()"]
PARS["Parse docstring\nArgs: → descriptions"]
PYD["Build Pydantic model\nper parameter + type"]
SCHEMA["JSON Schema\n{name, description,\n parameters: {...}}"]
REG["ToolRegistry.register()"]
LLM["Injected into LLM\nOpenAI · Anthropic · Groq · Ollama"]
FN --> INS --> PARS --> PYD --> SCHEMA --> REG --> LLM
sequenceDiagram
participant U as User
participant T as Team
participant M as Manager Agent
participant R as Researcher Agent
participant C as Coder Agent
U->>T: team.run("Find Python version, then print it with code")
T->>M: inject delegate_to_agent tool + run(task)
M->>M: Thought: I need current Python version → delegate research
M->>T: Action: delegate_to_agent(agent_name=researcher, task=...)
T->>R: researcher.run("Find latest Python version")
R->>R: Thought → web_search → Observation: "Python 3.13.1"
R-->>T: AgentResponse(final_answer="Python 3.13.1")
T-->>M: Observation: Researcher returned "Python 3.13.1"
M->>M: Thought: Now delegate code writing
M->>T: Action: delegate_to_agent(agent_name=coder, task=...)
T->>C: coder.run("Write code that prints Python 3.13.1")
C->>C: Thought → python_repl(code) → Observation: output
C-->>T: AgentResponse(final_answer="print('Python 3.13.1') → ran OK")
T-->>M: Observation: Coder result: ...
M->>M: Both subtasks done. Synthesise final answer.
M-->>U: AgentResponse\nfinal_answer + combined estimated_usd
Write a normal Python function. AgentKit generates the production-ready LLM schema automatically.
from agentkit.tools import tool
@tool
def search_database(query: str, table: str, limit: int = 10) -> list[dict]:
"""
Searches the database for records matching a query.
Args:
query: The search term to look for.
table: The database table to search in (e.g. 'users', 'orders').
limit: Maximum number of results to return. Defaults to 10.
"""
return db.search(query, table, limit)AgentKit reads your type hints and docstring, marks query and table as required, limit as optional with default 10, and generates a schema that works identically for OpenAI, Anthropic, Groq, and Ollama — zero changes when switching providers.
Built-in tools (agentkit/tools/builtins.py): web_search · python_repl · file_read · shell
Integration tools (agentkit/tools/integrations/): github_get_issue · github_create_pr · notion_create_page · notion_append_block
from agentkit.orchestrator import Team
# Specialist agents — each with a focused system prompt + tool set
researcher = Agent(llm=llm, tools=ToolRegistry([web_search]),
system_prompt="You find accurate information on the web.")
coder = Agent(llm=llm, tools=ToolRegistry([python_repl]),
system_prompt="You write clean, tested Python code.")
# Manager gets a `delegate_to_agent` tool injected automatically
manager = Agent(llm=llm, tools=ToolRegistry(),
system_prompt="You are a lead engineer. Break problems down and delegate.")
team = Team(manager=manager)
team.add_agent("researcher", researcher)
team.add_agent("coder", coder)
response = await team.run(
"Find the current EUR/USD rate and write a Python function that converts any EUR amount."
)
print(response.final_answer)
print(f"Total cost across all agents: ${response.estimated_usd:.4f}")When you call team.add_agent(name, agent), the Team class dynamically creates a delegate_to_agent(agent_name, task_description) tool and injects it into the Manager's ToolRegistry. The Manager never needs to know the sub-agents exist at instantiation time.
flowchart LR
subgraph ST["ShortTermMemory"]
direction TB
W["Sliding window\nmax_tokens budget"]
P["Auto-prune oldest\nmessages on overflow"]
end
subgraph LT["LongTermMemory"]
direction TB
E["sentence-transformers\nembeddings"]
DB["ChromaDB\nvector store"]
S["Semantic search\non retrieve()"]
end
subgraph EM["EntityMemory"]
direction TB
X["Extract structured facts\nnames · prefs · state"]
KV["Key-value store\nuser_name=Alice, lang=Python"]
end
A["Agent"] --> ST & LT & EM
from agentkit.memory import ShortTermMemory, LongTermMemory, EntityMemory
# Token-capped sliding window
Agent(..., memory=ShortTermMemory(max_tokens=4000))
# RAG across sessions — recall any past context by semantic similarity
Agent(..., memory=LongTermMemory(persist_dir="./agent_memory"))
# Extract and persist structured facts from conversation
Agent(..., memory=EntityMemory())Every run returns exact cost data — no estimation, no guessing.
response = await agent.run("Summarize this 50-page report.")
print(f"Input tokens: {response.token_usage.input}")
print(f"Output tokens: {response.token_usage.output}")
print(f"Estimated USD: ${response.estimated_usd:.6f}")
# Bring your own pricing (per million tokens)
llm = OpenAILLM(
model_name="gpt-4o",
price_per_m_input=2.50,
price_per_m_output=10.00,
)Cost is computed from tiktoken + provider-reported usage, accurate even on streamed responses.
agent = Agent(
llm=llm,
tools=ToolRegistry([execute_sql, send_email, delete_file]),
require_human_approval=True,
approval_tools=["execute_sql", "delete_file"], # only gate these
)Before any gated tool runs, the loop pauses:
┌─────────────────────────────────────────────────────────────────┐
│ ⚠ Agent wants to execute a tool │
│ │
│ Tool: execute_sql │
│ Input: {"query": "DELETE FROM users WHERE inactive = true"} │
│ │
│ Approve? [y/n]: │
└─────────────────────────────────────────────────────────────────┘
n → logs the skip as an Observation, agent continues reasoning. Never crashes.
One interface. Four providers. One import to switch.
from agentkit.llm.openai import OpenAILLM
from agentkit.llm.anthropic import AnthropicLLM
from agentkit.llm.groq import GroqLLM
from agentkit.llm.ollama import OllamaLLM
llm = OpenAILLM(model_name="gpt-4o")
llm = AnthropicLLM(model_name="claude-3-5-sonnet-20241022")
llm = GroqLLM(model_name="llama-3.1-70b-versatile") # ultra-low latency
llm = OllamaLLM(model_name="llama3.2") # local, free, privateAll implement BaseLLM with async streaming. Your tools, memory, and Team are provider-agnostic.
agentkit/
│
├── agent.py ← Agent · AgentStep · CostTracker · ReAct loop
├── orchestrator.py ← Team · Manager↔SubAgent · delegate_to_agent injection
├── cli.py ← Rich terminal UI · run agents from command line
├── __main__.py ← python -m agentkit entry point
│
├── llm/
│ ├── base.py ← BaseLLM · LLMChunk · abstract async streaming
│ ├── openai.py ← OpenAI async/streaming + tiktoken cost
│ ├── anthropic.py ← Anthropic Claude + usage-header cost
│ ├── groq.py ← Groq (Llama 3, Mixtral) low-latency
│ └── ollama.py ← Local Ollama — zero API cost
│
├── memory/
│ ├── short_term.py ← Sliding window · token budget · auto-prune
│ ├── long_term.py ← ChromaDB + sentence-transformers · RAG retrieve
│ └── entity.py ← Extract + persist structured facts from conversation
│
├── tools/
│ ├── base.py ← ToolRegistry · ToolDefinition · register API
│ ├── decorator.py ← @tool · type hints + docstring → JSON schema
│ ├── builtins.py ← web_search · python_repl · file_read · shell
│ └── integrations/
│ ├── github.py ← get_issue · create_pr · list_prs (PyGithub)
│ └── notion.py ← create_page · append_block · query_database
│
├── types/
│ └── schemas.py ← Message · AgentStep · AgentResponse · TokenUsage (Pydantic v2)
│
└── utils/
└── logging.py ← Loguru · Thought=cyan · Action=yellow · Observation=green
# Core — all LLM providers + built-in tools
pip install agentkit-ai
# With long-term vector memory (ChromaDB + sentence-transformers)
pip install agentkit-ai[memory]
# With GitHub + Notion integrations
pip install agentkit-ai[integrations]
# Everything
pip install agentkit-ai[all]Python 3.10+ required.
git clone https://github.com/agentkit/agentkit.git
cd agentkit
poetry install --all-extras
poetry run pre-commit install
poetry run pytest --cov=agentkit tests/| Example | What it demonstrates |
|---|---|
examples/quickstart.py |
Single agent · @tool · cost tracking |
examples/multi_agent_team.py |
Manager + researcher + coder |
examples/long_term_memory.py |
ChromaDB RAG across sessions |
examples/human_in_loop.py |
Approval gates for destructive tools |
examples/local_llm_ollama.py |
Fully local setup, no API key |
examples/github_agent.py |
Agent that reads and triages GitHub issues |
examples/cost_benchmarks.py |
Provider cost comparison for same task |
- Transparent ReAct loop with full step logging
-
@tool— type hints + docstring → JSON schema, all providers -
Teamwith automaticdelegate_to_agentinjection - Short-term · Long-term (RAG) · Entity memory
- Built-in cost tracking per run
- Human-in-the-loop approval gates
- OpenAI · Anthropic · Groq · Ollama
- GitHub + Notion integrations
- Rich CLI (
python -m agentkit) - Parallel tool fan-out (concurrent tools in one ReAct step)
- Agent checkpointing — resume long-running tasks after interruption
- Step-by-step execution web UI
- LangSmith / Langfuse observability integration
-
agentkit deploy— one-command agent API server
Issues and PRs are welcome. For large changes, open an issue first.
git clone https://github.com/agentkit/agentkit.git && cd agentkit
poetry install --all-extras && poetry run pre-commit install
poetry run pytest --cov=agentkit tests/ # run tests
poetry run ruff check agentkit/ # lint
poetry run mypy agentkit/ # type-checkMIT © AgentKit Contributors
Built out of genuine frustration with opaque agent frameworks.
If AgentKit saved you hours of debugging, a ⭐ means the world.