Expose trace interface in scorers#99

Open
Qard wants to merge 3 commits into main from trace-in-scorer

Conversation


@Qard Qard commented Feb 10, 2026

The only possibly controversial thing in here is exposing the btql API publicly while it only supports AST/object mode; there's no builder yet.

Refs braintrustdata/braintrust-sdk#1060
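For readers unfamiliar with the distinction: "AST/object mode" means a caller passes the query as a plain data structure rather than composing it through a fluent builder. A hypothetical illustration (the exact query shape is assumed, not taken from the SDK):

```ruby
# Hypothetical illustration of AST/object mode: the query is a plain hash
# (no builder DSL yet), handed directly to the btql API as data.
query = {
  select: ["span_id", "name", "duration"],
  from: "spans",
  filter: { op: "eq", field: "trace_id", value: "abc123" },
  limit: 100
}

# A builder, if one were added later, might instead read something like:
#   Btql.from("spans").select(:span_id).where(trace_id: "abc123")
```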

@Qard Qard requested review from clutchski and delner February 10, 2026 18:59
@Qard Qard self-assigned this Feb 10, 2026
@Qard Qard added the enhancement New feature or request label Feb 10, 2026
@Qard Qard force-pushed the trace-in-scorer branch 3 times, most recently from dd33e0e to 29d05d5 on February 10, 2026 19:24
```ruby
# Reconstruct message thread from LLM spans.
# Deduplicates input messages by content hash, always includes output messages.
# @return [Array<Hash>] Array of message hashes
def get_thread
```
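A minimal sketch of the behavior the doc comment describes, with hypothetical helper and data shapes (spans assumed to carry `:input` and `:output` message arrays); this is an editorial illustration, not the PR's implementation:

```ruby
require "digest"
require "json"

# Hypothetical sketch: rebuild a message thread from LLM span payloads,
# deduplicating input messages by a content hash while always keeping outputs.
def build_thread(spans)
  seen = {}
  thread = []
  spans.each do |span|
    Array(span[:input]).each do |msg|
      key = Digest::SHA256.hexdigest(JSON.generate(msg))
      next if seen[key]  # skip input messages already present by content
      seen[key] = true
      thread << msg
    end
    # Output messages are always included, even if their content repeats.
    thread.concat(Array(span[:output]))
  end
  thread
end
```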


I don't think we want to do it like this anymore. The way these are generated in the main SDK and in the data plane is by calling invoke on a preprocessor:

I don't know if we have support for these invoke calls in this or other non-main repo SDKs so we might want to skip this method for now?

@delner delner (Collaborator) left a comment


This adds significant complexity in that it:

  1. Adds caching to global state
  2. Adds multi-threading/synchronization behavior between workers
  3. Couples tracing to Evals and reaches deeply into internals in the process

Many of these have the potential for significant impact on performance and stability, as well as on our ability to scale the software without harming those two. Before we merge, I would like to review the requirements, design goals, and patterns more thoroughly. Let's talk.

```ruby
# Create a TraceContext for scorers to access span data
# @param eval_span [OpenTelemetry::Trace::Span] The eval span
# @return [TraceContext]
def create_trace_context(eval_span)
```
Collaborator


I don't think this belongs here: Eval::Runner's concern is running evals, not building trace contexts.
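One possible shape for the refactor the reviewer is asking for, as a hedged sketch: move trace-context construction into its own builder so the runner only orchestrates eval execution. All names here (`Trace::ContextBuilder`, the `TraceContext` struct) are hypothetical; the span is assumed to expose an OTel-style `#context` with `#hex_trace_id` and `#hex_span_id`:

```ruby
# Hypothetical sketch: a TraceContext value object plus a builder that owns
# the construction logic, keeping it out of Eval::Runner.
TraceContext = Struct.new(:trace_id, :root_span_id, keyword_init: true)

module Trace
  class ContextBuilder
    # `eval_span` is assumed to respond to #context (OTel span API),
    # which exposes #hex_trace_id and #hex_span_id.
    def self.from_span(eval_span)
      TraceContext.new(
        trace_id: eval_span.context.hex_trace_id,
        root_span_id: eval_span.context.hex_span_id
      )
    end
  end
end
```

The runner would then call `Trace::ContextBuilder.from_span(eval_span)` instead of owning the construction itself.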

```diff
 # @return [Float, Hash, Array] Score value(s)
-def call(input, expected, output, metadata = {})
+def call(input, expected, output, metadata = {}, trace = nil)
   @wrapped_callable.call(input, expected, output, metadata)
```
Collaborator


Why does a scorer take a trace context?
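Going by the PR title, the intent appears to be letting scorers inspect span data from the eval's trace. A hypothetical example of what the extra argument enables (the scorer name and the `get_spans` span shape are assumptions, not the SDK's API):

```ruby
# Hypothetical sketch: a scorer that uses the optional trace context to
# score on span data (here, total span duration) rather than only on output.
def latency_scorer(input, expected, output, metadata = {}, trace = nil)
  return 0.0 if trace.nil?             # older call sites pass no trace
  spans = trace.get_spans              # assumed TraceContext method
  total_ms = spans.sum { |s| s[:duration_ms].to_f }
  total_ms < 1000 ? 1.0 : 0.0          # pass if the whole trace ran under 1s
end
```

Since `trace` defaults to `nil`, existing four-argument scorers keep working unchanged.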

@delner delner (Collaborator) left a comment


Per our conversation yesterday with @CLowbrow, we should not use a span cache to pipe trace content between tracing and evals locally, because:

  1. Spans may be mutated once they reach Braintrust (become stale locally)
  2. Distributed tracing means the set of spans will be incomplete (need to query the API anyways)
  3. Eval for the trace may not run locally at all (wasted memory)
  4. Performance impact of caching spans is potentially high
  5. This creates a hard coupling between tracing and evals (which are meant to be severable)
  6. The overall complexity of the feature makes it difficult to reason about and refactor, and may affect the stability of the SDK.

Per my analysis, we should:

  1. Remove write_to_cache from SpanProcessor entirely. The tracer generates and submits spans via OTel only (no local buffering for Evals)
  2. Remove SpanRegistry — it only exists to bridge Trace → Eval.
  3. TraceContext.get_spans should always go through BTQL. It can cache the BTQL response locally, but that cache is populated by API results, not by the tracer.
  4. Eval::Context should not own a SpanCache that the tracer writes to. If caching is needed, it should be scoped to TraceContext and populated from BTQL only.
  5. eval.rb should not import anything from trace/.
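Item 3 above could be sketched roughly as follows, assuming a hypothetical BTQL client that responds to `#query(ast)`; this is an illustration of the caching direction being proposed, not the SDK's implementation:

```ruby
# Hypothetical sketch: a TraceContext whose span cache is populated only
# from BTQL query results, never written to by the tracer.
class TraceContext
  def initialize(trace_id:, btql_client:)
    @trace_id = trace_id
    @btql_client = btql_client  # assumed to respond to #query(ast)
    @cached_spans = nil
  end

  def get_spans
    # The cache is filled from the API response on first access;
    # local span processors never write into it.
    @cached_spans ||= @btql_client.query(
      { select: :spans, filter: { trace_id: @trace_id } }
    )
  end
end
```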


```ruby
# Run scorers
# Create TraceContext for scorers (if scorers exist)
trace = scorers.empty? ? nil : create_trace_context(eval_span)
```
Collaborator


When would scorers not exist? (I would think that if you run an Eval you would want to score it?)

@Qard (Author)


Might not be using local code scorers though, right? Could just be getting scored in our backend.

