fix(eval overview): hide non-output metrics for evaluator steps #3897
Filter evaluator overview metrics by output namespaces so annotation infra metrics (duration, cost, tokens, errors) are not displayed as evaluator metrics.
🟡 normalizeMetricPath produces paths for outputs. prefix that isEvaluatorOutputMetric will always reject
When normalizeMetricPath receives a path starting with outputs. (e.g., "outputs.score"), it produces "attributes.ag.outputs.score". The new isEvaluatorOutputMetric filter then checks this against EVALUATOR_OUTPUT_PATH_PREFIXES, but none of them match "attributes.ag.outputs." — note the missing data. segment.
Root Cause
At evaluatorMetrics.ts:161, normalizeMetricPath maps outputs.X → attributes.ag.outputs.X:
if (trimmed.startsWith("outputs.")) return `attributes.ag.${trimmed}`
But EVALUATOR_OUTPUT_PATH_PREFIXES at evaluatorMetrics.ts:30-35 does not include "attributes.ag.outputs."; it only includes "attributes.ag.data.outputs." (with the data. segment). So isEvaluatorOutputMetric("attributes.ag.outputs.score") returns false, and the metric is silently dropped at line 179.
Compare with the data. prefix handling at line 160: normalizeMetricPath("data.outputs.score") → "attributes.ag.data.outputs.score" which correctly passes the filter.
This inconsistency means any evaluator definition whose metric path starts with outputs. (e.g., "outputs.score") will have that metric silently excluded from fallback metrics.
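The mismatch can be reproduced in isolation. The sketch below is a hypothetical reconstruction, not the code in evaluatorMetrics.ts: only the outputs. branch, the data. result, the default-branch result, and the single confirmed prefix come from this review. The remaining entries of EVALUATOR_OUTPUT_PATH_PREFIXES and the exact body of isEvaluatorOutputMetric are assumptions.

```typescript
// Hypothetical reconstruction based on the behavior quoted in this review.
// The real list at evaluatorMetrics.ts:30-35 has more entries; only this one is confirmed.
const EVALUATOR_OUTPUT_PATH_PREFIXES: readonly string[] = [
    "attributes.ag.data.outputs.",
]

function normalizeMetricPath(path: string): string {
    const trimmed = path.trim()
    // Assumed shape of line 160: "data.outputs.score" -> "attributes.ag.data.outputs.score"
    if (trimmed.startsWith("data.")) return `attributes.ag.${trimmed}`
    // Line 161 as quoted: "outputs.score" -> "attributes.ag.outputs.score" (no data. segment)
    if (trimmed.startsWith("outputs.")) return `attributes.ag.${trimmed}`
    // Default branch: bare keys such as "score" get the full output prefix
    return `attributes.ag.data.outputs.${trimmed}`
}

// Assumed implementation: a plain prefix check against the allow-list.
function isEvaluatorOutputMetric(path: string): boolean {
    return EVALUATOR_OUTPUT_PATH_PREFIXES.some((prefix) => path.startsWith(prefix))
}

for (const raw of ["score", "data.outputs.score", "outputs.score"]) {
    const normalized = normalizeMetricPath(raw)
    console.log(raw, "->", normalized, isEvaluatorOutputMetric(normalized))
}
// score -> attributes.ag.data.outputs.score true
// data.outputs.score -> attributes.ag.data.outputs.score true
// outputs.score -> attributes.ag.outputs.score false
```

Only the third input is rejected, which matches the silent drop described above.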
Impact: In practice, the standard extractMetrics flow (evaluators.ts:87-100) produces bare key names like "score", which hit the default branch of normalizeMetricPath and correctly get the "attributes.ag.data.outputs." prefix. So this bug would only manifest if an evaluator definition provides a metric path explicitly prefixed with outputs., which is a supported but apparently uncommon code path in normalizeMetricPath.
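One possible way to close the gap, sketched under assumptions: widen the allow-list so the path produced by the outputs. branch is accepted. The candidate prefix and the infra metric path below are hypothetical. Whether to widen the list or instead change the outputs. branch to emit the data. segment depends on which key the metrics are actually stored under, which this review does not establish.

```typescript
const CONFIRMED_PREFIX = "attributes.ag.data.outputs." // present today, per the review
const CANDIDATE_PREFIX = "attributes.ag.outputs." // hypothetical addition

// Generic prefix check so both allow-lists can be compared side by side.
const hasAllowedPrefix = (path: string, prefixes: readonly string[]): boolean =>
    prefixes.some((prefix) => path.startsWith(prefix))

const currentPrefixes: readonly string[] = [CONFIRMED_PREFIX]
const widenedPrefixes: readonly string[] = [CONFIRMED_PREFIX, CANDIDATE_PREFIX]

// Path produced today by normalizeMetricPath("outputs.score")
const outputsPath = "attributes.ag.outputs.score"
// Hypothetical infra metric path; it should stay hidden under either list
const infraPath = "attributes.ag.duration.cumulative"

console.log(hasAllowedPrefix(outputsPath, currentPrefixes)) // false
console.log(hasAllowedPrefix(outputsPath, widenedPrefixes)) // true
console.log(hasAllowedPrefix(infraPath, widenedPrefixes)) // false
```

Either option would benefit from a unit test that round-trips each supported input form through normalizeMetricPath and the filter.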
(Refers to line 161)
Summary
attributes.ag.data.outputs.* and normalized equivalents)
Testing