Regular 500 responses from `/logs3`

(Apologies if this isn't the right forum to report ~production issues)

Our eval pipeline is regularly seeing 500 errors reported from the `/logs3` endpoint when running evals:
```
log request failed. Elapsed time: 4.979 seconds. Payload size: 979.
Error: 500 (Internal Server Error): {"Code":"InternalServerError","InternalTraceId":"69a146240000000016ef77ef2c5f16ed","Path":"/logs3","Service":"api"}
Sleeping for 1s
log request failed. Elapsed time: 3.2 seconds. Payload size: 339.
Sleeping for 1s
Error: 500 (Internal Server Error): {"Code":"InternalServerError","InternalTraceId":"69a146270000000006d6dba09f562bc5","Path":"/logs3","Service":"api"}
log request failed. Elapsed time: 2.165 seconds. Payload size: 339.
Sleeping for 1s
Error: TypeError: fetch failed
log request failed. Elapsed time: 2.163 seconds. Payload size: 1322.
Sleeping for 1s
Error: TypeError: fetch failed
log request failed. Elapsed time: 2.16 seconds. Payload size: 339.
```

It looks like this is causing retries and adding latency. This seems to happen much more often when our evals are very fast (I'm experimenting with adding a caching layer to our LLM calls, which increases the rate of these errors). Could we be overloading a logs server?
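To put a number on how often this happens, output like the excerpt above could be tallied with a small script. This is only a sketch: `tallyLogFailures` and `FailureTally` are hypothetical names, not part of the SDK, and the patterns assume the exact message format shown in the log above.

```typescript
// Hypothetical helper (not part of the SDK): counts failure kinds in log text
// that follows the message format shown in the excerpt above.
interface FailureTally {
  failedRequests: number; // "log request failed." lines
  http500: number; // "Error: 500 ..." lines
  fetchFailed: number; // "Error: TypeError: fetch failed" lines
  totalElapsedSeconds: number; // sum of the reported elapsed times
}

function tallyLogFailures(log: string): FailureTally {
  const tally: FailureTally = {
    failedRequests: 0,
    http500: 0,
    fetchFailed: 0,
    totalElapsedSeconds: 0,
  };
  for (const rawLine of log.split("\n")) {
    const line = rawLine.trim();
    const failed = line.match(/^log request failed\. Elapsed time: ([\d.]+) seconds/);
    if (failed) {
      tally.failedRequests += 1;
      tally.totalElapsedSeconds += Number(failed[1]);
    } else if (line.startsWith("Error: 500 ")) {
      tally.http500 += 1;
    } else if (line.startsWith("Error: TypeError: fetch failed")) {
      tally.fetchFailed += 1;
    }
  }
  return tally;
}
```

Comparing the tallies from a cached (fast) run and an uncached run would show whether the error rate really scales with request rate.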

Note: We're also experimenting with parallelizing logging, since we found that the switch to serialized logging in evals created a large bottleneck (making our evals take 300% longer). Details are in https://github.com/braintrustdata/braintrust-sdk/issues/1394, which @CLowbrow is looking at.
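For context, parallel logging with a cap on in-flight requests might look roughly like the sketch below. It is illustrative only: `sendWithLimit` and the `send` callback are hypothetical names and do not reflect the SDK's actual API or internals. The cap is what would keep a fast eval from flooding the endpoint.

```typescript
// Hypothetical sketch (not the SDK's implementation): run `send` over all
// batches with at most `limit` requests in flight at any time.
async function sendWithLimit<T>(
  batches: T[],
  limit: number,
  send: (batch: T) => Promise<void>,
): Promise<void> {
  let next = 0; // index of the next batch to claim; shared by all workers

  // Each worker claims the next unsent batch until none remain.
  async function worker(): Promise<void> {
    while (next < batches.length) {
      const batch = batches[next++] as T;
      await send(batch);
    }
  }

  // Start up to `limit` workers (at least one) and wait for all to finish.
  const workerCount = Math.max(1, Math.min(limit, batches.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}
```

With `limit` set to 1 this degrades to fully serialized sending, so the same helper can be used to compare the two modes.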

Regular 500 responses from `/logs3` #1414
