fix(eval overview): hide non-output metrics for evaluator steps#3897

Open
mmabrouk wants to merge 3 commits into main from fix/evaluator-overview-output-metrics

Conversation


@mmabrouk mmabrouk commented Mar 3, 2026

Summary

  • Restrict evaluator metrics in Overview to evaluator output namespaces (attributes.ag.data.outputs.* and normalized equivalents)
  • Filter both live run metrics and fallback evaluator metric definitions using the same namespace check
  • Prevent annotation infra metrics (duration, cost, tokens, errors) from showing as evaluator metrics in the Overview section
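The namespace check the summary describes can be sketched as follows. The names `isEvaluatorOutputMetric` and `EVALUATOR_OUTPUT_PATH_PREFIXES` come from `evaluatorMetrics.ts` as discussed in the review comments, but the exact prefix list and call sites here are assumptions, not the actual implementation:

```typescript
// Sketch of the Overview filter: only metric paths under the evaluator
// output namespaces pass; annotation infra metrics do not.
// The prefix list is an assumption based on the PR description.
const EVALUATOR_OUTPUT_PATH_PREFIXES: string[] = [
  "attributes.ag.data.outputs.", // normalized evaluator output namespace
];

function isEvaluatorOutputMetric(path: string): boolean {
  return EVALUATOR_OUTPUT_PATH_PREFIXES.some((prefix) => path.startsWith(prefix));
}

// The same check would be applied to live run metrics and to fallback
// evaluator metric definitions. Hypothetical metric paths:
const shown = [
  "attributes.ag.data.outputs.score",  // evaluator output: kept
  "attributes.ag.metrics.duration",    // infra metric: filtered out
  "attributes.ag.metrics.costs.total", // infra metric: filtered out
].filter(isEvaluatorOutputMetric);

console.log(shown); // ["attributes.ag.data.outputs.score"]
```

Applying one predicate to both the live and fallback code paths keeps the two views of evaluator metrics consistent.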

Testing

  • Not run (frontend-only filtering change)

Filter evaluator overview metrics by output namespaces so annotation infra metrics (duration, cost, tokens, errors) are not displayed as evaluator metrics.

vercel bot commented Mar 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: agenta-documentation
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Mar 5, 2026 0:09am

dosubot bot added the size:S label (This PR changes 10-29 lines, ignoring generated files) on Mar 3, 2026

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.



🟡 normalizeMetricPath produces paths for the outputs. prefix that isEvaluatorOutputMetric always rejects

When normalizeMetricPath receives a path starting with outputs. (e.g., "outputs.score"), it produces "attributes.ag.outputs.score". The new isEvaluatorOutputMetric filter then checks this against EVALUATOR_OUTPUT_PATH_PREFIXES, but none of them match "attributes.ag.outputs." — note the missing data. segment.

Root Cause

At evaluatorMetrics.ts:161, normalizeMetricPath maps outputs.X → attributes.ag.outputs.X:

if (trimmed.startsWith("outputs.")) return `attributes.ag.${trimmed}`

But the EVALUATOR_OUTPUT_PATH_PREFIXES at evaluatorMetrics.ts:30-35 does not include "attributes.ag.outputs." — it only includes "attributes.ag.data.outputs." (with the data. segment). So isEvaluatorOutputMetric("attributes.ag.outputs.score") returns false, and the metric is silently dropped at line 179.

Compare with the data. prefix handling at line 160: normalizeMetricPath("data.outputs.score") → "attributes.ag.data.outputs.score", which correctly passes the filter.

This inconsistency means any evaluator definition whose metric path starts with outputs. (e.g., "outputs.score") will have that metric silently excluded from fallback metrics.

Impact: In practice, the standard extractMetrics flow (evaluators.ts:87-100) produces bare key names like "score" which hit the default branch of normalizeMetricPath and correctly get prefixed with attributes.ag.data.outputs.. So this bug would only manifest if an evaluator definition provides a metric path explicitly prefixed with outputs., which is a supported but apparently uncommon code path in normalizeMetricPath.

(Refers to line 161)
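The reported mismatch can be reproduced in isolation. This sketch assumes the normalization rules quoted in the finding above (the real normalizeMetricPath may have more branches), and the suggested fix at the end is one plausible resolution, not the one the PR adopts:

```typescript
// Assumed prefix list, per the finding: only the "data." variant is present.
const EVALUATOR_OUTPUT_PATH_PREFIXES: string[] = ["attributes.ag.data.outputs."];

const isEvaluatorOutputMetric = (path: string): boolean =>
  EVALUATOR_OUTPUT_PATH_PREFIXES.some((p) => path.startsWith(p));

// Simplified reconstruction of normalizeMetricPath's branches.
function normalizeMetricPath(path: string): string {
  const trimmed = path.trim();
  if (trimmed.startsWith("attributes.")) return trimmed;
  if (trimmed.startsWith("data.")) return `attributes.ag.${trimmed}`;    // ~line 160: ok
  if (trimmed.startsWith("outputs.")) return `attributes.ag.${trimmed}`; // ~line 161: bug
  return `attributes.ag.data.outputs.${trimmed}`; // bare keys like "score"
}

// "outputs.score" → "attributes.ag.outputs.score": no "data." segment,
// so the filter rejects it and the metric is silently dropped.
console.log(isEvaluatorOutputMetric(normalizeMetricPath("outputs.score"))); // false
// Bare keys and "data."-prefixed paths normalize correctly and pass:
console.log(isEvaluatorOutputMetric(normalizeMetricPath("score")));         // true

// One possible fix: route "outputs.*" into the data namespace so both
// helpers agree:
//   if (trimmed.startsWith("outputs.")) return `attributes.ag.data.${trimmed}`
```

With that one-line change, normalizeMetricPath("outputs.score") would yield "attributes.ag.data.outputs.score" and survive the filter like the other branches.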




github-actions bot commented Mar 3, 2026

Railway Preview Environment

Preview URL: https://gateway-production-9925.up.railway.app/w
Project: agenta-oss-pr-3897
Image tag: pr-3897-35873c3
Status: Deployed
Updated at: 2026-03-05T12:16:18.448Z

@mmabrouk mmabrouk requested review from ardaerzin March 3, 2026 22:46

Labels

Evaluation, Frontend, size:S (This PR changes 10-29 lines, ignoring generated files)
