
Significant latency regression between TypeScript SDKs v0.4.10 and v1.0.0 #1394

@CodingCanuck

Description


I've just upgraded our codebase from braintrust v0.4.10 to braintrust v1.0.0, which made our eval runs significantly slower (from ~1.5 min before to ~5 min after for a few hundred tasks).

Most of the time seems to be spent uploading logs to a /logs3 API endpoint in a sequential upload loop here, introduced in this commit (and later reworked in this commit).

The goal of these commits appears to have been reducing memory usage, but they seem to have inadvertently changed blocking behavior: I believe logging previously wasn't blocked on / awaited, whereas now the code awaits these small sequential RPCs after each task. (I don't know Braintrust's code, but @CLowbrow is likely familiar with the tradeoffs that were optimized for here.)
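To illustrate the behavioral difference I suspect, here's a minimal self-contained sketch; `uploadLogs`, `sequentialAwaited`, and `backgroundFlush` are hypothetical stand-ins, not Braintrust's actual API:

```typescript
// Stand-in for the /logs3 RPC: a small, fixed-latency network call.
const uploadLogs = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 20)); // ~20ms per RPC

// The regression as I understand it: each task awaits its own upload,
// so wall time grows linearly with task count * RPC latency.
async function sequentialAwaited(tasks: number): Promise<number> {
  const start = Date.now();
  for (let i = 0; i < tasks; i++) {
    await uploadLogs();
  }
  return Date.now() - start;
}

// The pre-v1.0.0 behavior as I understand it: uploads are fired without
// blocking the task loop and drained once at the end of the run.
async function backgroundFlush(tasks: number): Promise<number> {
  const start = Date.now();
  const pending: Promise<void>[] = [];
  for (let i = 0; i < tasks; i++) {
    pending.push(uploadLogs());
  }
  await Promise.all(pending); // single flush barrier at the end
  return Date.now() - start;
}

async function main(): Promise<void> {
  console.log("awaited per task:", await sequentialAwaited(20), "ms");
  console.log("background flush:", await backgroundFlush(20), "ms");
}
main();
```

With 20 tasks at ~20ms per RPC, the awaited loop takes on the order of the sum of the RPC latencies, while the background variant takes roughly one RPC's worth of wall time.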

With Codex's help, I've created a little benchmark script that reproduces the regression without a dependency on our internal repo, along with an explanation of possible root causes: https://github.com/CodingCanuck/braintrust-sdk/blob/latency-regression-in-1.0/js/repro/latency-regression/BUG_REPORT.md

The main symptom we're noticing is a lot of sequential upload calls, whose wall time corresponds to the increase in wall time of our eval runs. Unfortunately, setting noSendLogs to true doesn't change latency, but my local eval time drops from 5 min to 77 s if I patch the logging/flush functions to be no-ops:

    // Patch the SDK's internal logger so every flush/upload path is a no-op.
    const state = _internalGetGlobalState();
    const httpLogger = state.httpLogger() as unknown as {
      flush?: () => Promise<void>;
      flushOnce?: (args?: unknown) => Promise<void>;
      submitLogsRequest?: (...args: unknown[]) => Promise<unknown>;
      triggerActiveFlush?: () => void;
    };
    if (httpLogger) {
      if (typeof httpLogger.submitLogsRequest === 'function') {
        httpLogger.submitLogsRequest = async () => undefined;
      }
      if (typeof httpLogger.flushOnce === 'function') {
        httpLogger.flushOnce = async () => undefined;
      }
      if (typeof httpLogger.flush === 'function') {
        httpLogger.flush = async () => undefined;
      }
      if (typeof httpLogger.triggerActiveFlush === 'function') {
        httpLogger.triggerActiveFlush = () => undefined;
      }
    }

I see the same slowdown in all releases after v0.4.10 (including the latest v3.0.0 release).

For our use case, we're not at all concerned with memory usage, but we're very concerned with end-to-end latency for fast iteration when experimenting with prompt changes. Could we fix this latency regression without breaking the intended memory optimization? We'd happily run with a --fast-but-ram-hungry flag or similar.
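Purely as an illustration of what we're asking for, an opt-in could gate the awaited flush behind a flag; the flag name `BRAINTRUST_FAST_FLUSH` and the `afterTask`/`drain` hooks below are hypothetical, not the SDK's real API:

```typescript
type Flush = () => Promise<void>;

// Hypothetical per-task hook: in fast mode, queue the flush and keep the
// task loop moving; otherwise await it (current behavior: bounded memory,
// higher latency).
async function afterTask(
  flushOnce: Flush,
  pending: Promise<void>[],
  fastMode: boolean = process.env.BRAINTRUST_FAST_FLUSH === "1",
): Promise<void> {
  if (fastMode) {
    pending.push(flushOnce()); // fast-but-RAM-hungry: don't block the task
  } else {
    await flushOnce(); // memory-friendly: block until the upload completes
  }
}

// At the end of the run, a single barrier drains any queued flushes so no
// logs are lost in fast mode.
async function drain(pending: Promise<void>[]): Promise<void> {
  await Promise.all(pending);
}
```

The point is just that the memory optimization and the latency fix don't have to be mutually exclusive: the default can stay memory-bounded while an explicit opt-in restores the non-blocking behavior.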

Note: we're directly executing Eval instances and using Braintrust's TypeScript SDK; we're not using Braintrust's CLI (we're not running npx braintrust).
