@@ -131,14 +131,20 @@ You will see examples in the wild that build the model by hand with `init_chat_m

## Tier 2 — LangGraph (agent workflows)

LangGraph's `create_react_agent` takes a `model`, `tools`, and `prompt`. Build the model the same way as the single-LangChain case — `create_langchain_model` — and pass it in. The tracker wraps the whole agent invocation, and the extractor aggregates token usage across every message the agent produced.
LangGraph's prebuilt agent takes a model, tools, and a system prompt. Build the model with `create_langchain_model` (Python) or `LangChainProvider.createLangChainModel` (Node) and pass it in. The tracker wraps the whole agent invocation; the extractor aggregates token usage across every message the agent produced, and tool-call telemetry is read off the result after the wrapped call returns.

**Python** — agent mode with a `MemorySaver` checkpointer:
> **API note (Python).** Use `from langchain.agents import create_agent`. The earlier `from langgraph.prebuilt import create_react_agent` is deprecated in LangGraph 1.0 and removed in 2.0 — same return shape; the only call-site rename is `prompt=` → `system_prompt=`. Node still uses `createReactAgent` from `@langchain/langgraph/prebuilt`.

**Python** — agent mode with a `MemorySaver` checkpointer. The Python helper package ships `sum_token_usage_from_messages` (token aggregation across the agent's output messages) and `get_tool_calls_from_response` (tool-call name extraction per message); use them inside the `track_metrics_of_async` extractor / loop instead of hand-rolling either:

```python
from ldai.tracker import TokenUsage
from ldai_langchain import create_langchain_model, get_ai_metrics_from_response
from langgraph.prebuilt import create_react_agent
from ldai.providers.types import LDAIMetrics
from ldai_langchain import (
    create_langchain_model,
    get_tool_calls_from_response,
    sum_token_usage_from_messages,
)
from langchain.agents import create_agent
from langgraph.checkpoint.memory import MemorySaver

agent_config = ai_client.agent_config("my-agent-key", context)
@@ -149,40 +155,34 @@ llm = create_langchain_model(agent_config)

# MemorySaver gives the ReAct agent short-term memory per thread_id.
checkpointer = MemorySaver()
agent = create_react_agent(
agent = create_agent(
    llm,
    tools=[...], # application-owned tool handlers
    prompt=agent_config.instructions,
    [...], # application-owned tool handlers
    system_prompt=agent_config.instructions,
    checkpointer=checkpointer,
)

async def track_langgraph_metrics(tracker, func):
    """Aggregate token usage across every message the agent produced.
    wraps track_duration_of + manual success/tokens/error tracking."""
    try:
        result = await tracker.track_duration_of(func)
        tracker.track_success()
        total_in = total_out = total = 0
        for message in result.get("messages", []):
            metrics = get_ai_metrics_from_response(message)
            if metrics.usage:
                total_in += metrics.usage.input
                total_out += metrics.usage.output
                total += metrics.usage.total
        if total > 0:
            tracker.track_tokens(TokenUsage(input=total_in, output=total_out, total=total))
        return result
    except Exception:
        tracker.track_error()
        raise

result = await track_langgraph_metrics(
    agent_config.create_tracker(),
    lambda: agent.ainvoke(
        {"messages": [{"role": "user", "content": user_prompt}]},
        config={"configurable": {"thread_id": thread_id}},
    ),
)
# track_metrics_of_async records duration + success/error itself; the
# extractor only returns LDAIMetrics. The surrounding try/except is for
# local logging, not for tracker bookkeeping.
tracker = agent_config.create_tracker()
try:
    result = await tracker.track_metrics_of_async(
        lambda: agent.ainvoke(
            {"messages": [{"role": "user", "content": user_prompt}]},
            config={"configurable": {"thread_id": thread_id}},
        ),
        lambda res: LDAIMetrics(
            success=True,
            usage=sum_token_usage_from_messages(res.get("messages", [])),
        ),
    )
    for msg in result.get("messages", []):
        for name in get_tool_calls_from_response(msg):
            tracker.track_tool_call(name)
except Exception:
    # Already recorded by track_metrics_of_async; log locally if needed.
    raise
```
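
To make the per-message aggregation concrete, here is a minimal sketch of the kind of summing a helper such as `sum_token_usage_from_messages` is described as doing. It is not the package's implementation: `UsageTotals` and `sum_usage` are hypothetical names, and the `usage_metadata` field with `input_tokens` / `output_tokens` / `total_tokens` keys is an assumed message shape used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class UsageTotals:
    """Hypothetical stand-in for the SDK's token-usage type."""
    input: int = 0
    output: int = 0
    total: int = 0


def sum_usage(messages: Iterable[Any]) -> Optional[UsageTotals]:
    """Add up token counts across messages; return None if no message reported usage."""
    totals = UsageTotals()
    for message in messages:
        # Accept both dict-shaped and attribute-shaped messages (assumed shapes).
        if isinstance(message, dict):
            usage = message.get("usage_metadata")
        else:
            usage = getattr(message, "usage_metadata", None)
        if not usage:
            continue  # messages without usage data contribute nothing
        totals.input += usage.get("input_tokens", 0)
        totals.output += usage.get("output_tokens", 0)
        totals.total += usage.get("total_tokens", 0)
    return totals if totals.total > 0 else None
```

Returning `None` when nothing was counted mirrors the `if total > 0` guard in the hand-rolled version, so a run with no reported usage does not record a zero-token event.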

**Node** — same pattern with `trackMetricsOf` + a custom aggregator:
@@ -219,6 +219,8 @@ const langgraphMetrics = (result: any): LDAIMetrics => {
return { success: true, usage: total > 0 ? { input, output, total } : undefined };
};

// trackMetricsOf records duration + success/error itself; do not call
// trackError after this — it would be a redundant second event.
const agentTracker = agentConfig.createTracker!();
const result = await agentTracker.trackMetricsOf(
langgraphMetrics,
@@ -227,6 +229,14 @@ const result = await agentTracker.trackMetricsOf(
{ configurable: { thread_id: threadId } },
),
);

// Tool-call telemetry: walk the result messages. Once the JS SDK ships
// `LangChainProvider.getToolCallsFromResponse`, this collapses to one helper call.
for (const msg of result.messages ?? []) {
  for (const tc of (msg as any).tool_calls ?? []) {
    agentTracker.trackToolCall(tc.name);
  }
}
```

### Why aggregate per message
36 changes: 25 additions & 11 deletions skills/ai-configs/aiconfig-migrate/SKILL.md
@@ -33,7 +33,7 @@ The skill is optimized for Python and Node.js / TypeScript; other languages are
| One-shot completion (direct OpenAI / Anthropic / Bedrock / Gemini call) | ✅ Worked example | ✅ Worked example | [before-after-examples.md](references/before-after-examples.md), per-provider docs in `aiconfig-ai-metrics/references/` |
| Chat loop via managed runner (`ManagedModel` / `TrackedChat`) | ✅ Tier 1 pattern | ✅ Tier 1 pattern | [aiconfig-ai-metrics SKILL.md](../aiconfig-ai-metrics/SKILL.md) |
| LangChain single-call | ✅ Worked example | ✅ Worked example | [langchain-tracking.md](../aiconfig-ai-metrics/references/langchain-tracking.md) |
| LangGraph `create_react_agent` / `createReactAgent` (prebuilt) | ✅ Worked example | ✅ Worked example | [agent-mode-frameworks.md § LangGraph](references/agent-mode-frameworks.md) |
| LangGraph prebuilt agent (Python `langchain.agents.create_agent`, Node `createReactAgent`) | ✅ Worked example | ✅ Worked example | [agent-mode-frameworks.md § LangGraph](references/agent-mode-frameworks.md) |
| LangGraph custom `StateGraph` with run-scoped tracker (setup_run + call_model + finalize) | ✅ Deep worked example | ⚠️ Mentioned — translate from Python | [agent-mode-frameworks.md § Custom `StateGraph`](references/agent-mode-frameworks.md) |
| CrewAI `Agent` | ✅ Worked example | — (not a Node framework) | [agent-mode-frameworks.md § CrewAI](references/agent-mode-frameworks.md) |
| Strands `Agent` | ✅ Worked example | ⚠️ BedrockModel + OpenAIModel only (no Anthropic) | [agent-mode-frameworks.md § Strands](references/agent-mode-frameworks.md) |
@@ -93,8 +93,9 @@ Use [phase-1-analysis-checklist.md](references/phase-1-analysis-checklist.md) to
3. **Existing LaunchDarkly usage** — any pre-existing `LDClient` or `ldclient` initialization to reuse
4. **Hardcoded model configs** — model name string literals, temperature / max_tokens / top_p, system prompts, instruction strings
5. **Template placeholders in prompts** — `.format()` calls, f-strings in prompt constants, JS/TS template literals, `%(var)s`, hand-rolled `str.replace("__VAR__", ...)`. Flag each placeholder name and its runtime-value source; all get rewritten to Mustache `{{ variable }}` in Stage 2.
6. **Hardcoded app-scoped knobs** — search-result limits, retry budgets, tool-timeout overrides, feature toggles, any config-dataclass field that isn't a prompt or model parameter but still governs agent behavior. These belong in `model.custom` on the variation (not `model.parameters`, which is forwarded to the provider SDK and will crash on unknown kwargs).
7. **Mode decision** — completion mode (chat messages array) or agent mode (single instructions string). Completion mode is the default and the only mode that supports judges attached in the UI.
6. **Externalized prompt files** — scan YAML / JSON / TOML / Markdown / `.prompt` / `.j2` files **and** prompt-template registries (`langchain.hub.pull(...)`, LangSmith `client.pull_prompt(...)`) for prompts loaded at runtime. Common shapes: CrewAI `agents.yaml` / `tasks.yaml`, LangChain Promptfiles, k8s ConfigMap overlays, Pydantic Settings classes with `prompt_*` fields. The same Mustache rewrite (sub-step 5 of Stage 2) applies when a file's placeholder syntax is not already Mustache. See [phase-1-analysis-checklist.md § 4](references/phase-1-analysis-checklist.md).
7. **Hardcoded app-scoped knobs** — search-result limits, retry budgets, tool-timeout overrides, feature toggles, any config-dataclass field that isn't a prompt or model parameter but still governs agent behavior. These belong in `model.custom` on the variation (not `model.parameters`, which is forwarded to the provider SDK and will crash on unknown kwargs).
8. **Mode decision** — completion mode (chat messages array) or agent mode (single instructions string). Completion mode is the default and the only mode that supports judges attached in the UI.
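
As an illustration of the placeholder rewrite named in item 5, the sketch below converts three of the listed styles (`str.format` fields, `%(var)s`, and `__VAR__` sentinels) to Mustache `{{ variable }}`. The function name `to_mustache` and the regular expressions are assumptions made for this example, not part of the skill; format specs such as `{name!r}`, positional fields such as `{0}`, and f-string expressions are deliberately left untouched and would need manual review.

```python
import re

# str.format-style field: {name}, but not already-doubled braces like {{name}}.
_FORMAT_FIELD = re.compile(r"(?<!\{)\{([A-Za-z_][A-Za-z0-9_]*)\}(?!\})")
# printf-style named field: %(name)s
_PERCENT_FIELD = re.compile(r"%\(([A-Za-z_][A-Za-z0-9_]*)\)s")
# Hand-rolled sentinel: __NAME__ (upper-case only, so __init__ is not matched).
_SENTINEL_FIELD = re.compile(r"__([A-Z][A-Z0-9_]*?)__")


def to_mustache(prompt: str) -> str:
    """Rewrite simple named placeholders to Mustache `{{ name }}` syntax."""
    for pattern in (_FORMAT_FIELD, _PERCENT_FIELD, _SENTINEL_FIELD):
        prompt = pattern.sub(r"{{ \1 }}", prompt)
    return prompt
```

The variable name is carried over unchanged, so the audit's record of each placeholder name and its runtime-value source still lines up with the rewritten prompt.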

For each hardcoded target the audit finds, record:

@@ -118,10 +119,20 @@ Hardcoded targets:
- src/chat.py:42 model="gpt-4o"
- src/chat.py:43 temperature=0.7, max_tokens=2000
- src/chat.py:45 system="You are a helpful assistant..."
Externalized prompt files: none (or e.g. "prompts/agents.yaml — CrewAI role/goal/backstory")
Prompt-template registries: none (or e.g. langchain.hub.pull("rlm/rag-prompt") at app.py:14)
Coverage totals: 3 hardcoded code targets · 0 externalized prompt files · 0 registry pulls
Proposed plan: single AI Config key `chat-assistant`, mirror fallback, Stage 3 (tools) skipped (no function calling), Stage 4 (tracking) inline, Stage 5 (evals) attach built-in accuracy judge.
```

**STOP.** Present this summary and wait for the user to confirm before proceeding to Stage 2. **This is the most important checkpoint in the workflow** — if the audit is wrong, every stage after this will be wrong. The user should cross-check the hardcoded-targets list against what they know is in the code before giving the go-ahead.
**STOP.** Present this summary, state the coverage totals out loud (e.g. "I found **N** hardcoded code targets and **M** externalized prompt files — does that match what you expected?"), and wait for the user to reply with one of four explicit forms:

- **`confirm`** — proceed to Stage 2.
- **`add: <files or paths>`** — re-run the audit with the new locations and present an updated summary.
- **`fix: <correction>`** — update a target in the list (provider, mode, prompt content, etc.) and ask again.
- **`stop`** — pause the migration here.

Do not interpret any other word — including `skip`, `next`, `go`, `ok`, `proceed` — as confirmation; ask the user to pick one of the four forms. **This is the most important checkpoint in the workflow** — if the audit is wrong, every stage after this will be wrong. The user should cross-check the hardcoded-targets list against what they know is in the code before giving the go-ahead.
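
The four-form rule above can be sketched as a small strict parser. Everything here is hypothetical and only illustrates the rule: the `Reply` type, the function name, case-insensitive matching, and treating an empty `add:` / `fix:` payload as invalid are assumptions, not behavior the skill specifies.

```python
from typing import NamedTuple, Optional


class Reply(NamedTuple):
    action: str       # "confirm", "add", "fix", or "stop"
    detail: str = ""  # text after "add:" / "fix:", otherwise empty


def parse_checkpoint_reply(text: str) -> Optional[Reply]:
    """Return a Reply for one of the four explicit forms, else None (ask again)."""
    cleaned = text.strip()
    lowered = cleaned.lower()
    if lowered in ("confirm", "stop"):
        return Reply(lowered)
    for action in ("add", "fix"):
        prefix = action + ":"
        if lowered.startswith(prefix):
            detail = cleaned[len(prefix):].strip()
            # An empty payload gives nothing to act on, so treat it as invalid.
            return Reply(action, detail) if detail else None
    # "ok", "go", "proceed", "skip", "next", and anything else: not confirmation.
    return None
```

A `None` result means the reply matched none of the four forms, so the right move is to repeat the question rather than guess at intent.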

### Step 2: Wrap the call in the AI SDK (Stage 2)

@@ -284,11 +295,13 @@ This is the first stage that writes code. It has nine sub-steps.
params = config.model.parameters or {}

# Pass model_name + instructions into your framework's agent constructor.
# Example: LangGraph create_react_agent
# agent = create_react_agent(
# model=load_chat_model(model_name),
# tools=TOOLS, # Stage 3 will replace this with a config.tools loader
# prompt=instructions,
# Example: LangGraph prebuilt agent (Python — `from langchain.agents import create_agent`;
# this replaces `langgraph.prebuilt.create_react_agent`, deprecated in LangGraph 1.0
# and removed in 2.0. Same return shape; `prompt=` was renamed to `system_prompt=`.)
# agent = create_agent(
# create_langchain_model(config), # forwards every variation parameter
# TOOLS, # Stage 3 will replace this with a config.tools loader
# system_prompt=instructions,
# )
```

@@ -308,7 +321,8 @@ Skip this step if the audited app has no function calling / tools. Otherwise:

- `openai.chat.completions.create(tools=[...])` — OpenAI direct
- `anthropic.messages.create(tools=[...])` — Anthropic direct
- `create_react_agent(tools=[...])` — LangGraph prebuilt ReAct
- `create_agent(llm, tools=[...], system_prompt=...)` — LangGraph prebuilt (Python, `langchain.agents`; replaces deprecated `langgraph.prebuilt.create_react_agent`)
- `createReactAgent({ llm, tools: [...] })` — LangGraph.js prebuilt (Node, `@langchain/langgraph/prebuilt`)
- `Agent(tools=[...])` — CrewAI
- `Agent(tools=[...])` — Strands (Python `@tool`-decorated callables passed through the constructor; TS SDK uses Zod-schema tools)
- **Custom `StateGraph`** — module-level `TOOLS = [...]` list referenced in **both** `model.bind_tools(TOOLS)` and `ToolNode(TOOLS)`. This is the `langchain-ai/react-agent` template shape; the list is usually in a `tools.py` module. Grep for `bind_tools(` and `ToolNode(` together — they will point at the same list.
@@ -479,7 +493,7 @@ Delegate: **`aiconfig-online-evals`** (sub-step 3, optional — only for UI-atta
| App uses LangChain `ChatOpenAI(model=...)` | Replace the hand-rolled model construction with `create_langchain_model(config)` (Python) or `LangChainProvider.createLangChainModel(config)` (Node). Do not read `config.model.name` and pass it to `ChatOpenAI(model=...)` by hand — that pattern drops every variation parameter except the ones you explicitly name |
| Retry wrapper around the provider call | The tracker is minted once at the top of the user turn; the retry loop is inside that scope. Every retry attempt shares the same `runId`. Tracker calls (`track_duration` / `track_tokens` / `track_success` / `track_error`) live *outside* the retry body — one call at the end of the turn, on the success path or the final-failure path |
| App has no tools — Stage 3 skipped | Move directly from Stage 2 verification to Stage 4 (tracking) |
| Mode mismatch: user said agent, audit shows one-shot chat | Choose completion mode unless the app uses LangGraph `create_react_agent`, CrewAI `Agent`, Strands `Agent`, or a similar goal-driven framework |
| Mode mismatch: user said agent, audit shows one-shot chat | Choose completion mode unless the app uses a LangGraph prebuilt agent (`langchain.agents.create_agent` in Python or `createReactAgent` in Node), CrewAI `Agent`, Strands `Agent`, or a similar goal-driven framework |
| App uses Strands Agents (Python) | Agent mode. Build a `create_strands_model` dispatcher keyed on `agent_config.provider.name` that returns `AnthropicModel(model_id=..., max_tokens=...)` or `OpenAIModel(model_id=..., params=...)`. Drop `parameters.tools` before passing params to the model class — Strands receives tools via `Agent(tools=[...])`. Tracking is Tier 3: wrap `invoke_async` with `tracker.track_duration_of(...)` and record tokens from `result.metrics.accumulated_usage`. See [agent-mode-frameworks.md § Strands Agent](references/agent-mode-frameworks.md) and [strands-tracking.md](../aiconfig-ai-metrics/references/strands-tracking.md) |
| Strands app on TypeScript | TS SDK ships `BedrockModel` and `OpenAIModel` only — cannot serve Anthropic-backed variations. Use the Python SDK if multi-provider variations are required |
| TypeScript app using Anthropic SDK | No `trackAnthropicMetrics` helper exists. Use Tier 3: `trackMetricsOf` with a small custom extractor that reads `response.usage.input_tokens` / `response.usage.output_tokens` and returns `LDAIMetrics`. See [anthropic-tracking.md](../aiconfig-ai-metrics/references/anthropic-tracking.md) in the `aiconfig-ai-metrics` skill for the exact extractor |
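
The retry-wrapper row can be illustrated with a minimal sketch in which the retry loop runs inside one turn and the tracker is touched exactly once after the loop ends. `RecordingTracker` is a hypothetical stand-in that only logs which methods were called; the real tracker's method signatures (for example, the unit passed to `track_duration`) are assumptions here, and token tracking is omitted for brevity.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class RecordingTracker:
    """Hypothetical stand-in for the SDK tracker; records method names only."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def track_duration(self, duration_ms: int) -> None:
        self.events.append("duration")

    def track_success(self) -> None:
        self.events.append("success")

    def track_error(self) -> None:
        self.events.append("error")


def run_turn_with_retries(tracker, call: Callable[[], T], max_attempts: int = 3) -> T:
    """Retry `call` inside one turn; record metrics once, after the loop."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = time.monotonic()
    result: Optional[T] = None
    error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            result = call()
            error = None
            break  # success: leave the retry loop
        except Exception as exc:
            error = exc  # remember the failure and try again
    # Tracker calls live outside the retry body: one set per turn.
    tracker.track_duration(int((time.monotonic() - start) * 1000))
    if error is not None:
        tracker.track_error()
        raise error
    tracker.track_success()
    return result  # type: ignore[return-value]
```

With this shape, two transient failures followed by a success produce a single duration event and a single success event, not three of each.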