
Significant latency regression between TypeScript SDKs v0.4.10 and v1.0.0 #1394

@CodingCanuck

Description


I've just upgraded our codebase from braintrust v0.4.10 to braintrust v1.0.0, which made our eval runs significantly slower (from ~1.5 min before to ~5 min after for a few hundred tasks).

Most of the time seems to be spent uploading logs to a /logs3 API endpoint in a sequential upload loop here, introduced in this commit (and later reworked in this commit).

The goal of these commits appears to have been reducing memory usage, but they seem to have inadvertently changed blocking behavior: I believe logging previously wasn't blocked on / awaited, whereas now the code awaits these small sequential RPCs after each task. (I don't know Braintrust's code, but @CLowbrow is likely familiar with the tradeoffs that were optimized for here.)
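To illustrate the behavioral difference I suspect, here's a minimal self-contained sketch; `uploadLogs`, `sequentialAwaited`, and `backgroundFlush` are hypothetical stand-ins, not Braintrust's actual API:

```typescript
// Stand-in for the /logs3 RPC: a small, fixed-latency network call.
const uploadLogs = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 20)); // ~20ms per RPC

// The regression as I understand it: each task awaits its own upload,
// so wall time grows linearly with task count * RPC latency.
async function sequentialAwaited(tasks: number): Promise<number> {
  const start = Date.now();
  for (let i = 0; i < tasks; i++) {
    await uploadLogs();
  }
  return Date.now() - start;
}

// The pre-v1.0.0 behavior as I understand it: uploads are fired without
// blocking the task loop and drained once at the end of the run.
async function backgroundFlush(tasks: number): Promise<number> {
  const start = Date.now();
  const pending: Promise<void>[] = [];
  for (let i = 0; i < tasks; i++) {
    pending.push(uploadLogs());
  }
  await Promise.all(pending); // single flush barrier at the end
  return Date.now() - start;
}

async function main(): Promise<void> {
  console.log("awaited per task:", await sequentialAwaited(20), "ms");
  console.log("background flush:", await backgroundFlush(20), "ms");
}
main();
```

With 20 tasks at ~20ms per RPC, the awaited loop takes on the order of the sum of the RPC latencies, while the background variant takes roughly one RPC's worth of wall time.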

With Codex's help, I've created a little benchmark script that reproduces the regression without a dependency on our internal repo, along with an explanation of possible root causes: https://github.com/CodingCanuck/braintrust-sdk/blob/latency-regression-in-1.0/js/repro/latency-regression/BUG_REPORT.md

The main symptom we're noticing is a lot of sequential upload calls, whose wall time corresponds to the increase in wall time of our eval runs. Unfortunately, setting noSendLogs to true doesn't change latency, but my local eval time drops from 5 min to 77 s if I patch the logging/flush functions to be no-ops:

    // Patch the SDK's internal logger so every flush/upload path is a no-op.
    const state = _internalGetGlobalState();
    const httpLogger = state.httpLogger() as unknown as {
      flush?: () => Promise<void>;
      flushOnce?: (args?: unknown) => Promise<void>;
      submitLogsRequest?: (...args: unknown[]) => Promise<unknown>;
      triggerActiveFlush?: () => void;
    };
    if (httpLogger) {
      if (typeof httpLogger.submitLogsRequest === 'function') {
        httpLogger.submitLogsRequest = async () => undefined;
      }
      if (typeof httpLogger.flushOnce === 'function') {
        httpLogger.flushOnce = async () => undefined;
      }
      if (typeof httpLogger.flush === 'function') {
        httpLogger.flush = async () => undefined;
      }
      if (typeof httpLogger.triggerActiveFlush === 'function') {
        httpLogger.triggerActiveFlush = () => undefined;
      }
    }

I see the same slowdown in all releases after v0.4.10 (including the latest v3.0.0 release).

For our use case, we're not at all concerned with memory usage, but we're very concerned with end-to-end latency for fast iteration when experimenting with prompt changes. Could we fix this latency regression without breaking the intended memory optimization? We'd happily run with a --fast-but-ram-hungry flag or similar.
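Purely as an illustration of what we're asking for, an opt-in could gate the awaited flush behind a flag; the flag name `BRAINTRUST_FAST_FLUSH` and the `afterTask`/`drain` hooks below are hypothetical, not the SDK's real API:

```typescript
type Flush = () => Promise<void>;

// Hypothetical per-task hook: in fast mode, queue the flush and keep the
// task loop moving; otherwise await it (current behavior: bounded memory,
// higher latency).
async function afterTask(
  flushOnce: Flush,
  pending: Promise<void>[],
  fastMode: boolean = process.env.BRAINTRUST_FAST_FLUSH === "1",
): Promise<void> {
  if (fastMode) {
    pending.push(flushOnce()); // fast-but-RAM-hungry: don't block the task
  } else {
    await flushOnce(); // memory-friendly: block until the upload completes
  }
}

// At the end of the run, a single barrier drains any queued flushes so no
// logs are lost in fast mode.
async function drain(pending: Promise<void>[]): Promise<void> {
  await Promise.all(pending);
}
```

The point is just that the memory optimization and the latency fix don't have to be mutually exclusive: the default can stay memory-bounded while an explicit opt-in restores the non-blocking behavior.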

Note: we're directly executing Eval instances and using Braintrust's TypeScript SDK; we're not using Braintrust's CLI (we're not running npx braintrust).
