4 changes: 2 additions & 2 deletions skills.json
@@ -14,7 +14,7 @@
"path": "skills/ai-configs/aiconfig-ai-metrics",
"version": "1.0.0-experimental",
"license": "Apache-2.0",
-"compatibility": "Requires the LaunchDarkly server-side AI SDK (`launchdarkly-server-sdk-ai>=0.18.0` for Python or `@launchdarkly/server-sdk-ai>=0.17.0` for Node) and an existing AI Config."
+"compatibility": "Requires the LaunchDarkly server-side AI SDK (`launchdarkly-server-sdk-ai>=0.20.0` for Python or `@launchdarkly/server-sdk-ai>=0.20.0` for Node) and an existing AI Config."
},
{
"name": "aiconfig-create",
@@ -45,7 +45,7 @@
"description": "Attach judges to AI Config variations for automatic LLM-as-a-judge evaluation. Create custom judges, configure sampling rates, and monitor quality scores.",
"path": "skills/ai-configs/aiconfig-online-evals",
"version": "0.1.0",
-"compatibility": "Requires LaunchDarkly API access token with ai-configs:write permission. SDK versions Python v0.18.0+ or Node.js v0.17.0+ for automatic metric recording and the consolidated `track_judge_result` / `trackJudgeResult` API."
+"compatibility": "Requires LaunchDarkly API access token with ai-configs:write permission. SDK versions Python v0.20.0+ or Node.js v0.20.0+ for automatic metric recording and the consolidated `track_judge_result` / `trackJudgeResult` API."
},
{
"name": "aiconfig-projects",
22 changes: 12 additions & 10 deletions skills/ai-configs/aiconfig-ai-metrics/SKILL.md
@@ -2,7 +2,7 @@
name: aiconfig-ai-metrics
description: "Instrument an existing codebase with LaunchDarkly AI Config tracking. Walks the four-tier ladder (managed runner → provider package → custom extractor + trackMetricsOf → raw manual) and picks the lowest-ceremony option that still captures duration, tokens, and success/error."
license: Apache-2.0
-compatibility: Requires the LaunchDarkly server-side AI SDK (`launchdarkly-server-sdk-ai>=0.18.0` for Python or `@launchdarkly/server-sdk-ai>=0.17.0` for Node) and an existing AI Config.
+compatibility: Requires the LaunchDarkly server-side AI SDK (`launchdarkly-server-sdk-ai>=0.20.0` for Python or `@launchdarkly/server-sdk-ai>=0.20.0` for Node) and an existing AI Config.
metadata:
author: launchdarkly
version: "1.0.0-experimental"
@@ -20,12 +20,12 @@ This is the order the official SDK READMEs (Python core, Node core, and every pr

| Tier | Pattern | Use when | Tracks automatically |
|------|---------|----------|----------------------|
-| **1 — Managed runner** | Python: `ai_client.create_model(...)` returning a `ManagedModel`, then `await model.invoke(...)`. <br>Node: `aiClient.initChat(...)` / `aiClient.createChat(...)` returning a `TrackedChat`, then `await chat.invoke(...)`. | The call is conversational (chat history, turn-based). This is what the provider READMEs lead with. | Duration, tokens, success/error — **all of it, zero tracker calls**. |
+| **1 — Managed runner** | Python: `ai_client.create_model(...)` returning a `ManagedModel`, then `await model.run(...)`. <br>Node: `aiClient.createModel(...)` returning a `ManagedModel`, then `await model.run(...)`. | The call is conversational (chat history, turn-based). This is what the provider READMEs lead with. | Duration, tokens, success/error — **all of it, zero tracker calls**. |
| **2 — Provider package + `trackMetricsOf`** | `tracker.trackMetricsOf(Provider.getAIMetricsFromResponse, () => providerCall())`. Provider packages today: `@launchdarkly/server-sdk-ai-openai`, `-langchain`, `-vercel` (Node) and `launchdarkly-server-sdk-ai-openai`, `-langchain` (Python). | The shape isn't a chat loop (one-shot completion, structured output, agent step) but the framework or provider has a package. | Duration + success/error from the wrapper; tokens from the package's built-in `getAIMetricsFromResponse` extractor. |
| **3 — Custom extractor + `trackMetricsOf`** | Same `trackMetricsOf` wrapper, but you write a small function that maps the provider response to `LDAIMetrics` (tokens + success). | No provider package exists (Anthropic direct, Gemini, Cohere, custom HTTP). | Duration + success/error from the wrapper; tokens from your extractor. |
| **4 — Raw manual** | Separate calls to `trackDuration`, `trackTokens`, `trackSuccess` / `trackError`, plus `trackTimeToFirstToken` for streams. | Streaming with TTFT, unusual response shapes, partial tracking, anything Tier 2–3 can't cleanly wrap. | Only what you explicitly call — it's on you to not miss one. |

-A call to `track_openai_metrics` / `trackOpenAIMetrics` / `track_bedrock_converse_metrics` / `trackBedrockConverseMetrics` / `trackVercelAISDKGenerateTextMetrics` is **Tier-2 legacy shorthand**. These helpers still exist in the SDK source but none of the current provider READMEs use them — they've been superseded by `trackMetricsOf` + `Provider.getAIMetricsFromResponse`. Do not recommend them for new code; if you see them in an existing codebase, leave them alone unless the user is already on a cleanup pass.
+Every provider — OpenAI, LangChain, Vercel, Bedrock, Anthropic, Gemini, custom HTTP — uses the same generic shape: `tracker.trackMetricsOf(getAIMetricsFromResponse, () => providerCall())` in Node, `tracker.track_metrics_of(get_ai_metrics_from_response, provider_call)` in Python. The extractor is the only thing that changes per provider: import `getAIMetricsFromResponse` from the matching `@launchdarkly/server-sdk-ai-<provider>` (or `ldai_<provider>`) package, or write a small custom function that returns `LDAIMetrics`. There are no provider-specific tracker methods.
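
The generic wrapper shape described above can be illustrated with a stand-in. The sketch below uses a hypothetical `SketchTracker`, not the SDK's tracker: it mimics the behavior this skill attributes to `track_metrics_of` (duration and success/error from the wrapper, tokens from the extractor, exceptions tracked and re-raised). The dataclasses, the event list, and the fake provider call are all assumptions made so the example runs without the SDK installed.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class TokenUsage:
    """Stand-in for the SDK's token-usage type (assumed shape)."""
    total: int
    input: int
    output: int


@dataclass
class LDAIMetrics:
    """Stand-in for the SDK's metrics type (assumed shape)."""
    success: bool
    tokens: Optional[TokenUsage] = None


@dataclass
class SketchTracker:
    """Hypothetical tracker that records events in a list instead of sending them."""
    events: List[Tuple[str, Any]] = field(default_factory=list)

    def track_metrics_of(
        self,
        extractor: Callable[[Any], LDAIMetrics],
        call: Callable[[], Any],
    ) -> Any:
        start = time.perf_counter()
        try:
            response = call()
        except Exception:
            # The wrapper records the failure itself, then re-raises.
            self.events.append(("error", None))
            raise
        finally:
            # Duration comes from the wrapper, not from the extractor.
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.events.append(("duration_ms", elapsed_ms))
        metrics = extractor(response)
        if metrics.tokens is not None:
            self.events.append(("tokens", metrics.tokens))
        self.events.append(("success" if metrics.success else "error", None))
        return response


def fake_provider_call() -> dict:
    """Hypothetical provider response carrying a usage block."""
    return {"text": "hello", "usage": {"input": 3, "output": 5}}


def fake_extractor(response: dict) -> LDAIMetrics:
    """The only per-provider piece: map the response to LDAIMetrics."""
    usage = response["usage"]
    return LDAIMetrics(
        success=True,
        tokens=TokenUsage(
            total=usage["input"] + usage["output"],
            input=usage["input"],
            output=usage["output"],
        ),
    )


tracker = SketchTracker()
response = tracker.track_metrics_of(fake_extractor, fake_provider_call)
```

Swapping providers in this sketch means swapping only `fake_extractor`; the wrapper call stays the same.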

## Workflow

@@ -38,7 +38,12 @@ Before picking a tier, find the provider call and answer these questions:
- [ ] **Provider?** OpenAI, Anthropic, Bedrock, Gemini, Azure, custom HTTP? → cross-reference with the package availability matrix below.
- [ ] **Streaming?** If yes, you'll need TTFT tracking, which means Tier 4 for the TTFT part even if the rest is Tier 2.
- [ ] **Language?** Python or Node? Provider-package coverage differs between them.
-- [ ] **Already using an AI Config?** If not, route to `aiconfig-create` first — tracking requires a tracker, which is obtained by calling `create_tracker()` / `createTracker()` on the config object returned by `completion_config()` / `completionConfig()` / `initChat()`.
+- [ ] **Already using an AI Config?** If not, route to `aiconfig-create` first — tracking requires a tracker, which is obtained by calling `create_tracker()` / `createTracker()` on the config object returned by `completion_config()` / `completionConfig()` / `createModel()`.
+- [ ] **On the current SDK API?** If the call site uses `aiclient.config(...)` / `aiClient.config(...)` or constructs an `AIConfig(...)` / `LDAIConfig` default, it's on the pre-0.20 surface. Migrate it as part of this work before adding tracking:
+  - `aiclient.config(...)` → `aiclient.completion_config(...)` for one-shot/chat or `aiclient.agent_config(...)` for agent mode (mirror the call signature). Node is the same with camelCase.
+  - `AIConfig(...)` default → `AICompletionConfigDefault(...)` or `AIAgentConfigDefault(...)` (Node: `LDAICompletionConfigDefault` / `LDAIAgentConfigDefault`). `AIConfig` is the base class the SDK returns; it isn't a valid default-value constructor — the typed `*Default` variants are.
+  - If the result was being tuple-unpacked (`config, tracker = aiclient.config(...)`), drop the unpack — the new methods return a single config object. Obtain the tracker via `config.create_tracker()` / `aiConfig.createTracker()`.
+  - For deeper rewrites (call sites with hardcoded model/prompt as well), hand off to `aiconfig-migrate` instead of doing the full migration here.

### 2. Look up your Tier-2 option

@@ -78,7 +83,7 @@ Confirm the Monitoring tab fills in:

## Quick reference: tracker methods

-Obtain a tracker via the factory on the config object: `tracker = config.create_tracker()` (Python v0.18.0+) or `const tracker = aiConfig.createTracker!()` (Node v0.17.0+). Call the factory once per execution and reuse the returned `tracker` for every call — each factory invocation mints a new `runId` that tags every tracking event emitted by that tracker so events from a single execution can be correlated together (via exported events / downstream systems). The Monitoring tab aggregates events rather than grouping them by run today — the `runId` is useful when events are exported or queried outside the UI, and is the identifier the SDK's at-most-once guards are keyed on. The methods below are the raw API surface — most of the time you should not call them individually; use `trackMetricsOf` or a Tier-1 managed runner. The list is here so you can recognize the methods in existing code and reach for the right one when you genuinely need Tier 4.
+Obtain a tracker via the factory on the config object: `tracker = config.create_tracker()` (Python) or `const tracker = aiConfig.createTracker()` (Node). Call the factory once per execution and reuse the returned `tracker` for every call — each factory invocation mints a new `runId` that tags every tracking event emitted by that tracker so events from a single execution can be correlated together (via exported events / downstream systems). The Monitoring tab aggregates events rather than grouping them by run today — the `runId` is useful when events are exported or queried outside the UI, and is the identifier the SDK's at-most-once guards are keyed on. The methods below are the raw API surface — most of the time you should not call them individually; use `trackMetricsOf` or a Tier-1 managed runner. The list is here so you can recognize the methods in existing code and reach for the right one when you genuinely need Tier 4.
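
The "one tracker per execution" rule can be sketched with stand-ins. `SketchConfig` and `SketchTracker` below are hypothetical classes, not SDK types; the UUID-based run ID and the shared `exported` list are assumptions chosen only to show how events from one execution end up sharing an identifier.

```python
import uuid
from typing import Dict, List


class SketchTracker:
    """Hypothetical tracker: every event carries the run ID minted at creation."""

    def __init__(self, sink: List[Dict[str, str]]) -> None:
        self.run_id = str(uuid.uuid4())
        self._sink = sink

    def track_success(self) -> None:
        self._sink.append({"run_id": self.run_id, "event": "success"})

    def track_tool_call(self, name: str) -> None:
        self._sink.append({"run_id": self.run_id, "event": f"tool:{name}"})


class SketchConfig:
    """Hypothetical config object exposing the tracker factory."""

    def __init__(self) -> None:
        self.exported: List[Dict[str, str]] = []

    def create_tracker(self) -> SketchTracker:
        # Each factory call mints a fresh run ID.
        return SketchTracker(self.exported)


def handle_request(config: SketchConfig) -> str:
    tracker = config.create_tracker()  # once per execution
    tracker.track_tool_call("search")  # reuse the same tracker...
    tracker.track_success()            # ...so both events share one run ID
    return tracker.run_id


config = SketchConfig()
first_run = handle_request(config)
second_run = handle_request(config)
```

Calling `create_tracker()` a second time inside `handle_request` would split one execution's events across two run IDs, which is the situation the paragraph above warns against.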

| Method (Python ↔ Node) | Tier | What it does |
|---|---|---|
@@ -92,12 +97,9 @@ Obtain a tracker via the factory on the config object: `tracker = config.create_
| `track_success()` / `trackSuccess()` | 4 | Mark the generation as successful. Required for the Monitoring tab to count it. |
| `track_error()` / `trackError()` | 4 | Mark the generation as failed. Do not also call `trackSuccess()` in the same request. |
| `track_feedback({kind})` / `trackFeedback({kind})` | any | Record thumbs-up / thumbs-down from a feedback UI. Independent of the success/error path. |
-| `track_tool_call(name)` / `trackToolCall(name)` | any | Record a single tool invocation by name. Available on both SDKs as of Python v0.18.0 / Node v0.17.0. |
+| `track_tool_call(name)` / `trackToolCall(name)` | any | Record a single tool invocation by name. Available on both SDKs. |
| `track_tool_calls([names])` / `trackToolCalls([names])` | any | Batch variant — record a list of tool invocations in one call. |
-| `track_judge_result(result)` / `trackJudgeResult(result)` | any | Record a programmatic judge evaluation (consolidates the earlier `track_eval_scores` + `track_judge_response` pair). `result.sampled` indicates whether evaluation ran. |
-| `track_openai_metrics(fn)` / `trackOpenAIMetrics(fn)` | **legacy** | Predates provider packages. Still works; do not use in new code. Replace with `trackMetricsOf(OpenAIProvider.getAIMetricsFromResponse, fn)`. |
-| `track_bedrock_converse_metrics(res)` / `trackBedrockConverseMetrics(res)` | **legacy** | Same story. Do not use in new code. |
-| `trackVercelAISDKGenerateTextMetrics(fn)` (Node) | **legacy** | Same story. Use `trackMetricsOf` with the Vercel provider package's extractor. |
+| `track_judge_result(result)` / `trackJudgeResult(result)` | any | Record a programmatic judge evaluation. `result.sampled` indicates whether evaluation ran. |
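
A Tier 4 call site has to issue each of these calls itself. The sketch below shows one way to keep the success and error paths mutually exclusive, using a hypothetical recording tracker rather than the SDK's. `track_duration` and `track_tokens` are assumed snake_case spellings of `trackDuration` / `trackTokens`, and their argument shapes here (milliseconds as an int, a plain dict of token counts) are assumptions as well.

```python
import time
from typing import Any, Callable, List, Tuple


class SketchTracker:
    """Hypothetical tracker that records calls instead of sending events."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def track_duration(self, duration_ms: int) -> None:
        self.calls.append(("duration", duration_ms))

    def track_tokens(self, tokens: dict) -> None:
        self.calls.append(("tokens", tokens))

    def track_success(self) -> None:
        self.calls.append(("success", None))

    def track_error(self) -> None:
        self.calls.append(("error", None))


def call_with_manual_tracking(
    tracker: SketchTracker,
    provider_call: Callable[[], dict],
) -> dict:
    start = time.perf_counter()
    try:
        response = provider_call()
    except Exception:
        tracker.track_duration(int((time.perf_counter() - start) * 1000))
        tracker.track_error()  # error path only: never paired with track_success
        raise
    tracker.track_duration(int((time.perf_counter() - start) * 1000))
    tracker.track_tokens(response["usage"])
    tracker.track_success()  # success path only
    return response


def fake_provider_call() -> dict:
    """Hypothetical provider response."""
    return {"text": "ok", "usage": {"total": 8, "input": 3, "output": 5}}


tracker = SketchTracker()
result = call_with_manual_tracking(tracker, fake_provider_call)
```

Every line the wrapper would have handled is now explicit, which is why the tier table treats this as the last resort.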

## Related skills

@@ -4,13 +4,13 @@

Three viable paths, in order of preference:

-1. **Route Anthropic through LangChain.** If the app already uses LangChain (or can adopt it cheaply), install the LangChain provider package and use it as Tier 2. LangChain's `ChatAnthropic` wrapper exposes the standardized `usage_metadata` that `LangChainProvider.getAIMetricsFromResponse` reads.
+1. **Route Anthropic through LangChain.** If the app already uses LangChain (or can adopt it cheaply), install the LangChain provider package and use it as Tier 2. LangChain's `ChatAnthropic` wrapper exposes the standardized `usage_metadata` that `getAIMetricsFromResponse` reads.
2. **Route Anthropic through Bedrock Converse.** If the app can switch to Bedrock Converse (Claude is available on Bedrock), you inherit Bedrock's Converse response shape and a custom-extractor pattern that's slightly cleaner. See [bedrock-tracking.md](bedrock-tracking.md).
3. **Custom extractor on the direct SDK** (this file's primary pattern).

## Tier 1 is not available

-`ManagedModel` / `TrackedChat` do not currently ship an Anthropic provider. If you need Tier 1 for a chat app, use option 1 or 2 above — the LangChain provider package lets `ManagedModel` wrap a `ChatAnthropic` under the hood, which restores the zero-tracker-call experience.
+`ManagedModel` does not currently ship an Anthropic provider. If you need Tier 1 for a chat app, use option 1 or 2 above — the LangChain provider package lets `ManagedModel` wrap a `ChatAnthropic` under the hood, which restores the zero-tracker-call experience.

## Tier 3 — Custom extractor + `trackMetricsOf` (primary)

@@ -25,7 +25,7 @@ client = anthropic.Anthropic()
def anthropic_extractor(response) -> LDAIMetrics:
    return LDAIMetrics(
        success=True,
-        usage=TokenUsage(
+        tokens=TokenUsage(
            total=response.usage.input_tokens + response.usage.output_tokens,
            input=response.usage.input_tokens,
            output=response.usage.output_tokens,
@@ -52,7 +52,7 @@ def call_with_tracking(ai_config, user_prompt: str) -> str | None:
    # except: tracker.track_error() on top — it's a noop that trips the
    # at-most-once guard. Wrap in your own try/except only if you need
    # local handling (logging, fallback, alert); the error is already tracked.
-    response = tracker.track_metrics_of(call_anthropic, anthropic_extractor)
+    response = tracker.track_metrics_of(anthropic_extractor, call_anthropic)
    return response.content[0].text
```

@@ -66,7 +66,7 @@ const client = new Anthropic();

const anthropicExtractor = (response: Anthropic.Message): LDAIMetrics => ({
  success: true,
-  usage: {
+  tokens: {
    total: response.usage.input_tokens + response.usage.output_tokens,
    input: response.usage.input_tokens,
    output: response.usage.output_tokens,
@@ -81,7 +81,7 @@ async function callWithTracking(

  const systemContent = aiConfig.messages?.[0]?.content ?? '';

-  const tracker = aiConfig.createTracker!();
+  const tracker = aiConfig.createTracker();
  // Exceptions are tracked automatically: trackMetricsOf catches exceptions,
  // records tracker.trackError(), and re-throws. Do NOT add
  // catch (err) { tracker.trackError(); throw err } on top — it's a noop
@@ -108,18 +108,18 @@ Notes on the extractor shape:

## Tier 2 option — route via LangChain

-If the app can adopt LangChain, the LangChain provider package handles Anthropic (via `@langchain/anthropic`) through the same `trackMetricsOf(LangChainProvider.getAIMetricsFromResponse, ...)` pattern used for any other LangChain model. This is often the cleanest answer if the app already uses or is open to LangChain, because the extractor is built in and shared with every other LangChain-wrapped model.
+If the app can adopt LangChain, the LangChain provider package handles Anthropic (via `@langchain/anthropic`) through the same `trackMetricsOf(getAIMetricsFromResponse, ...)` pattern used for any other LangChain model. This is often the cleanest answer if the app already uses or is open to LangChain, because the extractor is built in and shared with every other LangChain-wrapped model.

```python
-from ldai_langchain import LangChainProvider
+from ldai_langchain import create_langchain_model, get_ai_metrics_from_response

ai_config = ai_client.completion_config("my-config-key", context, default_config)
-llm = await LangChainProvider.create_langchain_model(ai_config) # ChatAnthropic under the hood
+llm = create_langchain_model(ai_config) # ChatAnthropic under the hood

tracker = ai_config.create_tracker()
response = tracker.track_metrics_of(
+    get_ai_metrics_from_response,
    lambda: llm.invoke(messages),
-    LangChainProvider.get_ai_metrics_from_response,
)
```
