launchdarkly · sattensil · May 6, 2026 · May 1, 2026 · May 6, 2026
@@ -131,14 +131,20 @@ You will see examples in the wild that build the model by hand with `init_chat_m
 
 ## Tier 2 — LangGraph (agent workflows)
 
-LangGraph's `create_react_agent` takes a `model`, `tools`, and `prompt`. Build the model the same way as the single-LangChain case — `create_langchain_model` — and pass it in. The tracker wraps the whole agent invocation, and the extractor aggregates token usage across every message the agent produced.
+LangGraph's prebuilt agent takes a model, tools, and a system prompt. Build the model with `create_langchain_model` (Python) or `LangChainProvider.createLangChainModel` (Node) and pass it in. The tracker wraps the whole agent invocation; the extractor aggregates token usage across every message the agent produced, and tool-call telemetry is read off the result after the wrapped call returns.
 
-**Python** — agent mode with a `MemorySaver` checkpointer:
+> **API note (Python).** Use `from langchain.agents import create_agent`. The earlier `from langgraph.prebuilt import create_react_agent` is deprecated in LangGraph 1.0 and removed in 2.0 — same return shape; the only call-site rename is `prompt=` → `system_prompt=`. Node still uses `createReactAgent` from `@langchain/langgraph/prebuilt`.
+
+**Python** — agent mode with a `MemorySaver` checkpointer. The Python helper package ships `sum_token_usage_from_messages` (token aggregation across the agent's output messages) and `get_tool_calls_from_response` (tool-call name extraction per message). Use the first inside the `track_metrics_of_async` extractor and the second in the post-call loop over the result messages, instead of hand-rolling either:
 
 ```python
-from ldai.tracker import TokenUsage
-from ldai_langchain import create_langchain_model, get_ai_metrics_from_response
-from langgraph.prebuilt import create_react_agent
+from ldai.providers.types import LDAIMetrics
+from ldai_langchain import (
+    create_langchain_model,
+    get_tool_calls_from_response,
+    sum_token_usage_from_messages,
+)
+from langchain.agents import create_agent
 from langgraph.checkpoint.memory import MemorySaver
 
 agent_config = ai_client.agent_config("my-agent-key", context)
@@ -149,40 +155,34 @@ llm = create_langchain_model(agent_config)
 
 # MemorySaver gives the ReAct agent short-term memory per thread_id.
 checkpointer = MemorySaver()
-agent = create_react_agent(
+agent = create_agent(
     llm,
-    tools=[...],                     # application-owned tool handlers
-    prompt=agent_config.instructions,
+    [...],                                # application-owned tool handlers
+    system_prompt=agent_config.instructions,
     checkpointer=checkpointer,
 )
 
-async def track_langgraph_metrics(tracker, func):
-    """Aggregate token usage across every message the agent produced.
-    wraps track_duration_of + manual success/tokens/error tracking."""
-    try:
-        result = await tracker.track_duration_of(func)
-        tracker.track_success()
-        total_in = total_out = total = 0
-        for message in result.get("messages", []):
-            metrics = get_ai_metrics_from_response(message)
-            if metrics.usage:
-                total_in += metrics.usage.input
-                total_out += metrics.usage.output
-                total += metrics.usage.total
-        if total > 0:
-            tracker.track_tokens(TokenUsage(input=total_in, output=total_out, total=total))
-        return result
-    except Exception:
-        tracker.track_error()
-        raise
-
-result = await track_langgraph_metrics(
-    agent_config.create_tracker(),
-    lambda: agent.ainvoke(
-        {"messages": [{"role": "user", "content": user_prompt}]},
-        config={"configurable": {"thread_id": thread_id}},
-    ),
-)
+# track_metrics_of_async records duration + success/error itself; the
+# extractor only returns LDAIMetrics. The surrounding try/except is for
+# local logging, not for tracker bookkeeping.
+tracker = agent_config.create_tracker()
+try:
+    result = await tracker.track_metrics_of_async(
+        lambda: agent.ainvoke(
+            {"messages": [{"role": "user", "content": user_prompt}]},
+            config={"configurable": {"thread_id": thread_id}},
+        ),
+        lambda res: LDAIMetrics(
+            success=True,
+            usage=sum_token_usage_from_messages(res.get("messages", [])),
+        ),
+    )
+    for msg in result.get("messages", []):
+        for name in get_tool_calls_from_response(msg):
+            tracker.track_tool_call(name)
+except Exception:
+    # Already recorded by track_metrics_of_async — log locally if needed.
+    raise
 ```
 
 **Node** — same pattern with `trackMetricsOf` + a custom aggregator:
@@ -219,6 +219,8 @@ const langgraphMetrics = (result: any): LDAIMetrics => {
   return { success: true, usage: total > 0 ? { input, output, total } : undefined };
 };
 
+// trackMetricsOf records duration + success/error itself; do not call
+// trackError after this — it would be a redundant second event.
 const agentTracker = agentConfig.createTracker!();
 const result = await agentTracker.trackMetricsOf(
   langgraphMetrics,
@@ -227,6 +229,14 @@ const result = await agentTracker.trackMetricsOf(
     { configurable: { thread_id: threadId } },
   ),
 );
+
+// Tool-call telemetry: walk the result messages. Once the JS SDK ships
+// `LangChainProvider.getToolCallsFromResponse`, this collapses to one helper call.
+for (const msg of result.messages ?? []) {
+  for (const tc of (msg as any).tool_calls ?? []) {
+    agentTracker.trackToolCall(tc.name);
+  }
+}
 ```
 
 ### Why aggregate per message

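As a rough illustration of what per-message aggregation and tool-call extraction amount to, here is a minimal, dependency-free sketch. It is not the implementation of `sum_token_usage_from_messages` or `get_tool_calls_from_response`. The dict-shaped messages, the `UsageTotals` type, and the `sum_usage` / `tool_call_names` functions are hypothetical stand-ins, and the key names (`usage_metadata`, `input_tokens`, `output_tokens`, `total_tokens`, `tool_calls`) are assumed to mirror the fields LangChain messages expose.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Optional, Sequence


@dataclass
class UsageTotals:
    """Hypothetical stand-in for the SDK's token-usage type."""
    input: int = 0
    output: int = 0
    total: int = 0


def sum_usage(messages: Sequence[Mapping[str, Any]]) -> Optional[UsageTotals]:
    """Add up token counts from every message that reports usage.

    Messages without usage data (user turns, tool results) contribute zero.
    Returns None when no message reported any tokens.
    """
    totals = UsageTotals()
    for message in messages:
        usage = message.get("usage_metadata") or {}
        totals.input += usage.get("input_tokens", 0)
        totals.output += usage.get("output_tokens", 0)
        totals.total += usage.get("total_tokens", 0)
    return totals if totals.total > 0 else None


def tool_call_names(message: Mapping[str, Any]) -> List[str]:
    """Return the tool names requested by one message (empty if none)."""
    return [call["name"] for call in message.get("tool_calls") or []]


# Assumed shape of an agent result: one user turn, a model turn that calls
# a tool, the tool result, and a final model turn.
example_messages = [
    {"role": "user", "content": "What's the weather?"},
    {
        "role": "assistant",
        "tool_calls": [{"name": "get_weather", "args": {"city": "Oslo"}}],
        "usage_metadata": {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
    },
    {"role": "tool", "content": "4 C, overcast"},
    {
        "role": "assistant",
        "content": "It is 4 C and overcast.",
        "usage_metadata": {"input_tokens": 3, "output_tokens": 2, "total_tokens": 5},
    },
]
example_usage = sum_usage(example_messages)
example_tools = [n for m in example_messages for n in tool_call_names(m)]
```

Reading usage from only the final message would miss the tokens spent on the intermediate tool-calling turn, which is why the sum runs over the whole message list.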
@@ -33,7 +33,7 @@ The skill is optimized for Python and Node.js / TypeScript; other languages are
 | One-shot completion (direct OpenAI / Anthropic / Bedrock / Gemini call) | ✅ Worked example | ✅ Worked example | [before-after-examples.md](references/before-after-examples.md), per-provider docs in `aiconfig-ai-metrics/references/` |
 | Chat loop via managed runner (`ManagedModel` / `TrackedChat`) | ✅ Tier 1 pattern | ✅ Tier 1 pattern | [aiconfig-ai-metrics SKILL.md](../aiconfig-ai-metrics/SKILL.md) |
 | LangChain single-call | ✅ Worked example | ✅ Worked example | [langchain-tracking.md](../aiconfig-ai-metrics/references/langchain-tracking.md) |
-| LangGraph `create_react_agent` / `createReactAgent` (prebuilt) | ✅ Worked example | ✅ Worked example | [agent-mode-frameworks.md § LangGraph](references/agent-mode-frameworks.md) |
+| LangGraph prebuilt agent (Python `langchain.agents.create_agent`, Node `createReactAgent`) | ✅ Worked example | ✅ Worked example | [agent-mode-frameworks.md § LangGraph](references/agent-mode-frameworks.md) |
 | LangGraph custom `StateGraph` with run-scoped tracker (setup_run + call_model + finalize) | ✅ Deep worked example | ⚠️ Mentioned — translate from Python | [agent-mode-frameworks.md § Custom `StateGraph`](references/agent-mode-frameworks.md) |
 | CrewAI `Agent` | ✅ Worked example | — (not a Node framework) | [agent-mode-frameworks.md § CrewAI](references/agent-mode-frameworks.md) |
 | Strands `Agent` | ✅ Worked example | ⚠️ BedrockModel + OpenAIModel only (no Anthropic) | [agent-mode-frameworks.md § Strands](references/agent-mode-frameworks.md) |
@@ -93,8 +93,9 @@ Use [phase-1-analysis-checklist.md](references/phase-1-analysis-checklist.md) to
 3. **Existing LaunchDarkly usage** — any pre-existing `LDClient` or `ldclient` initialization to reuse
 4. **Hardcoded model configs** — model name string literals, temperature / max_tokens / top_p, system prompts, instruction strings
 5. **Template placeholders in prompts** — `.format()` calls, f-strings in prompt constants, JS/TS template literals, `%(var)s`, hand-rolled `str.replace("__VAR__", ...)`. Flag each placeholder name and its runtime-value source; all get rewritten to Mustache `{{ variable }}` in Stage 2.
-6. **Hardcoded app-scoped knobs** — search-result limits, retry budgets, tool-timeout overrides, feature toggles, any config-dataclass field that isn't a prompt or model parameter but still governs agent behavior. These belong in `model.custom` on the variation (not `model.parameters`, which is forwarded to the provider SDK and will crash on unknown kwargs).
-7. **Mode decision** — completion mode (chat messages array) or agent mode (single instructions string). Completion mode is the default and the only mode that supports judges attached in the UI.
+6. **Externalized prompt files** — scan YAML / JSON / TOML / Markdown / `.prompt` / `.j2` files **and** prompt-template registries (`langchain.hub.pull(...)`, LangSmith `client.pull_prompt(...)`) for prompts loaded at runtime. Common shapes: CrewAI `agents.yaml` / `tasks.yaml`, LangChain Promptfiles, k8s ConfigMap overlays, Pydantic Settings classes with `prompt_*` fields. The same Mustache rewrite (sub-step 5 of Stage 2) applies whenever a file's placeholder syntax is not already Mustache. See [phase-1-analysis-checklist.md § 4](references/phase-1-analysis-checklist.md).
+7. **Hardcoded app-scoped knobs** — search-result limits, retry budgets, tool-timeout overrides, feature toggles, any config-dataclass field that isn't a prompt or model parameter but still governs agent behavior. These belong in `model.custom` on the variation (not `model.parameters`, which is forwarded to the provider SDK and will crash on unknown kwargs).
+8. **Mode decision** — completion mode (chat messages array) or agent mode (single instructions string). Completion mode is the default and the only mode that supports judges attached in the UI.
 
 For each hardcoded target the audit finds, record:
 
@@ -118,10 +119,20 @@ Hardcoded targets:
   - src/chat.py:42   model="gpt-4o"
   - src/chat.py:43   temperature=0.7, max_tokens=2000
   - src/chat.py:45   system="You are a helpful assistant..."
+Externalized prompt files: none (or e.g. "prompts/agents.yaml — CrewAI role/goal/backstory")
+Prompt-template registries: none (or e.g. langchain.hub.pull("rlm/rag-prompt") at app.py:14)
+Coverage totals: 3 hardcoded code targets · 0 externalized prompt files · 0 registry pulls
 Proposed plan: single AI Config key `chat-assistant`, mirror fallback, Stage 3 (tools) skipped (no function calling), Stage 4 (tracking) inline, Stage 5 (evals) attach built-in accuracy judge.
 ```
 
-**STOP.** Present this summary and wait for the user to confirm before proceeding to Stage 2. **This is the most important checkpoint in the workflow** — if the audit is wrong, every stage after this will be wrong. The user should cross-check the hardcoded-targets list against what they know is in the code before giving the go-ahead.
+**STOP.** Present this summary, state the coverage totals out loud (e.g. "I found **N** hardcoded code targets and **M** externalized prompt files — does that match what you expected?"), and wait for the user to reply with one of four explicit forms:
+
+- **`confirm`** — proceed to Stage 2.
+- **`add: <files or paths>`** — re-run the audit with the new locations and present an updated summary.
+- **`fix: <correction>`** — update a target in the list (provider, mode, prompt content, etc.) and ask again.
+- **`stop`** — pause the migration here.
+
+Do not interpret any other word — including `skip`, `next`, `go`, `ok`, `proceed` — as confirmation; ask the user to pick one of the four forms. **This is the most important checkpoint in the workflow** — if the audit is wrong, every stage after this will be wrong. The user should cross-check the hardcoded-targets list against what they know is in the code before giving the go-ahead.
 
 ### Step 2: Wrap the call in the AI SDK (Stage 2)
 
@@ -284,11 +295,13 @@ This is the first stage that writes code. It has nine sub-steps.
    params = config.model.parameters or {}
 
    # Pass model_name + instructions into your framework's agent constructor.
-   # Example: LangGraph create_react_agent
-   # agent = create_react_agent(
-   #     model=load_chat_model(model_name),
-   #     tools=TOOLS,               # Stage 3 will replace this with a config.tools loader
-   #     prompt=instructions,
+   # Example: LangGraph prebuilt agent (Python — `from langchain.agents import create_agent`;
+   # this replaces `langgraph.prebuilt.create_react_agent`, deprecated in LangGraph 1.0
+   # and removed in 2.0. Same return shape; `prompt=` was renamed to `system_prompt=`.)
+   # agent = create_agent(
+   #     create_langchain_model(config),  # forwards every variation parameter
+   #     TOOLS,                            # Stage 3 will replace this with a config.tools loader
+   #     system_prompt=instructions,
    # )
    ```
 
@@ -308,7 +321,8 @@ Skip this step if the audited app has no function calling / tools. Otherwise:
 
    - `openai.chat.completions.create(tools=[...])` — OpenAI direct
    - `anthropic.messages.create(tools=[...])` — Anthropic direct
-   - `create_react_agent(tools=[...])` — LangGraph prebuilt ReAct
+   - `create_agent(llm, tools=[...], system_prompt=...)` — LangGraph prebuilt (Python, `langchain.agents`; replaces deprecated `langgraph.prebuilt.create_react_agent`)
+   - `createReactAgent({ llm, tools: [...] })` — LangGraph.js prebuilt (Node, `@langchain/langgraph/prebuilt`)
    - `Agent(tools=[...])` — CrewAI
    - `Agent(tools=[...])` — Strands (Python `@tool`-decorated callables passed through the constructor; TS SDK uses Zod-schema tools)
    - **Custom `StateGraph`** — module-level `TOOLS = [...]` list referenced in **both** `model.bind_tools(TOOLS)` and `ToolNode(TOOLS)`. This is the `langchain-ai/react-agent` template shape; the list is usually in a `tools.py` module. Grep for `bind_tools(` and `ToolNode(` together — they will point at the same list.
@@ -479,7 +493,7 @@ Delegate: **`aiconfig-online-evals`** (sub-step 3, optional — only for UI-atta
 | App uses LangChain `ChatOpenAI(model=...)` | Replace the hand-rolled model construction with `create_langchain_model(config)` (Python) or `LangChainProvider.createLangChainModel(config)` (Node). Do not read `config.model.name` and pass it to `ChatOpenAI(model=...)` by hand — that pattern drops every variation parameter except the ones you explicitly name |
 | Retry wrapper around the provider call | The tracker is minted once at the top of the user turn; the retry loop is inside that scope. Every retry attempt shares the same `runId`. Tracker calls (`track_duration` / `track_tokens` / `track_success` / `track_error`) live *outside* the retry body — one call at the end of the turn, on the success path or the final-failure path |
 | App has no tools — Stage 3 skipped | Move directly from Stage 2 verification to Stage 4 (tracking) |
-| Mode mismatch: user said agent, audit shows one-shot chat | Choose completion mode unless the app uses LangGraph `create_react_agent`, CrewAI `Agent`, Strands `Agent`, or a similar goal-driven framework |
+| Mode mismatch: user said agent, audit shows one-shot chat | Choose completion mode unless the app uses a LangGraph prebuilt agent (`langchain.agents.create_agent` in Python or `createReactAgent` in Node), CrewAI `Agent`, Strands `Agent`, or a similar goal-driven framework |
 | App uses Strands Agents (Python) | Agent mode. Build a `create_strands_model` dispatcher keyed on `agent_config.provider.name` that returns `AnthropicModel(model_id=..., max_tokens=...)` or `OpenAIModel(model_id=..., params=...)`. Drop `parameters.tools` before passing params to the model class — Strands receives tools via `Agent(tools=[...])`. Tracking is Tier 3: wrap `invoke_async` with `tracker.track_duration_of(...)` and record tokens from `result.metrics.accumulated_usage`. See [agent-mode-frameworks.md § Strands Agent](references/agent-mode-frameworks.md) and [strands-tracking.md](../aiconfig-ai-metrics/references/strands-tracking.md) |
 | Strands app on TypeScript | TS SDK ships `BedrockModel` and `OpenAIModel` only — cannot serve Anthropic-backed variations. Use the Python SDK if multi-provider variations are required |
 | TypeScript app using Anthropic SDK | No `trackAnthropicMetrics` helper exists. Use Tier 3: `trackMetricsOf` with a small custom extractor that reads `response.usage.input_tokens` / `response.usage.output_tokens` and returns `LDAIMetrics`. See [anthropic-tracking.md](../aiconfig-ai-metrics/references/anthropic-tracking.md) in the `aiconfig-ai-metrics` skill for the exact extractor |
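
The retry-wrapper row in the table above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's own code: `RecordingTracker` is a hypothetical stand-in that only records which tracker methods were called, and `run_turn_with_retries` is an illustrative name. The point it shows is that the retry loop sits inside the tracker's scope while the tracker calls happen once, after the loop has settled.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class RecordingTracker:
    """Hypothetical stand-in for the SDK tracker; records calls in order."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def track_duration(self, duration_ms: int) -> None:
        self.events.append("duration")

    def track_success(self) -> None:
        self.events.append("success")

    def track_error(self) -> None:
        self.events.append("error")


def run_turn_with_retries(
    tracker: RecordingTracker,
    call: Callable[[], T],
    max_attempts: int = 3,
) -> T:
    """Retry `call` up to `max_attempts` times, tracking the turn once."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    start = time.monotonic()
    last_error: Optional[Exception] = None

    for _ in range(max_attempts):
        try:
            result = call()
        except Exception as exc:
            # No tracker calls in the retry body: a failed attempt is not
            # yet a failed turn.
            last_error = exc
            continue
        # Success path: one duration event and one success event per turn.
        tracker.track_duration(int((time.monotonic() - start) * 1000))
        tracker.track_success()
        return result

    # Final-failure path: every attempt raised.
    tracker.track_duration(int((time.monotonic() - start) * 1000))
    tracker.track_error()
    assert last_error is not None
    raise last_error
```

With a real tracker, token tracking would sit next to `track_success` on the success path, so a turn that needed three attempts still produces a single set of events.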