[FEAT] Add replay from trace strategy #620
VincentG1234 wants to merge 5 commits into vllm-project:main
Conversation
Force-pushed 008633f to a66034b
This pull request has merge conflicts that must be resolved before it can be merged.
Add trace replay capability to GuideLLM for reproducing real-world request patterns from trace files. This enables time-based request rate replay and synthetic prompt generation matching trace token counts.

- Add `TraceReplayStrategy` for scheduling requests at precise timestamps
- Add `ReplayProfile` for configuring trace-based benchmarking
- Add `TraceSyntheticDatasetDeserializer` for generating prompts from traces
- Support `max_requests` truncation to limit trace length

This is a minimal implementation to address issue #597. Full Mooncake format support, E2E tests, and documentation will follow in subsequent PRs.

Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
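As a rough illustration of what timestamp-based replay means here, the sketch below dispatches requests at offsets relative to the first trace timestamp. This is not GuideLLM's actual `TraceReplayStrategy`; all names in it are made up for this example.

```python
# Minimal sketch (illustrative, not GuideLLM's real code): replay requests
# at the relative offsets recorded in a trace.
import asyncio
import time


def relative_offsets(timestamps: list[float]) -> list[float]:
    """Convert absolute trace timestamps to offsets from the first request."""
    ordered = sorted(timestamps)
    start = ordered[0]
    return [t - start for t in ordered]


async def replay(timestamps: list[float], send) -> None:
    """Sleep until each request's offset relative to replay start, then fire it."""
    start = time.monotonic()
    for i, offset in enumerate(relative_offsets(timestamps)):
        delay = offset - (time.monotonic() - start)
        if delay > 0:
            await asyncio.sleep(delay)
        await send(i)


async def main():
    fired = []

    async def send(i):
        fired.append(i)

    # Three requests 50 ms apart in the original trace.
    await replay([1000.0, 1000.05, 1000.10], send)
    return fired


print(asyncio.run(main()))  # [0, 1, 2]
```

The key property is that inter-arrival gaps from the trace are preserved, rather than being derived from a synthetic rate.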
Force-pushed 7f893fb to 780be20
It would be great to get an example of "How to get the JSONL", because I can't find solutions in LiteLLM, for example.
Yeah, that’s true, most frameworks won’t produce this exact JSONL directly. That’s kind of intentional. The idea here is to define a minimal, framework-agnostic canonical replay format, not something tied to a specific tracing stack. In practice, the required fields already exist almost everywhere (timestamp, input token count, output token count), just under slightly different names, so a small mapping step is usually enough.

I agree it’s not the best UX on its own, but it felt like the right minimal base for the feature. Then we can iterate on top of it with helpers / converters for common sources like LiteLLM or Langfuse, and we can extend it later (e.g. optional prompt field, multiple timestamp formats, richer metadata) without breaking the core idea. But happy to adjust the direction if maintainers prefer something more opinionated or integrated from the start.
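To make that mapping step concrete, here is a hedged sketch. The source field names (`created_at`, `prompt_tokens`, `completion_tokens`) are hypothetical stand-ins for whatever your tracing stack emits; `timestamp`, `input_length`, and `output_length` are the fields the trace reader in this PR expects.

```python
# Sketch: convert a generic request log into the trace JSONL format
# (timestamp, input_length, output_length). The source-side field names
# below are illustrative, not from any specific framework.
import json


def to_trace_row(record: dict) -> dict:
    return {
        "timestamp": record["created_at"],
        "input_length": record["prompt_tokens"],
        "output_length": record["completion_tokens"],
    }


source = [
    {"created_at": 1700000000.0, "prompt_tokens": 128, "completion_tokens": 64},
    {"created_at": 1700000000.4, "prompt_tokens": 256, "completion_tokens": 32},
]

# One JSON object per line, i.e. a .jsonl trace file.
lines = [json.dumps(to_trace_row(r)) for r in source]
print(lines[0])
# {"timestamp": 1700000000.0, "input_length": 128, "output_length": 64}
```

Writing the `lines` list to a file with one entry per line yields a trace file the replay feature can consume.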
sjmonson left a comment
Sorry for the silence on this. There are a few things with this PR that break other use-cases. I am still working on a more complete review, but here are a few low-hanging problems.
Since this file is used by data, scheduler, and benchmark, move it to utils.
```python
    SynchronousStrategy,
    ThroughputStrategy,
    TraceReplayStrategy,
    load_relative_timestamps,
```
Not a method of this submodule.

```diff
-    load_relative_timestamps,
```
```python
    "UnserializableConstraintInitializer",
    "WorkerProcess",
    "WorkerProcessGroup",
    "load_relative_timestamps",
```
Not a method of this submodule.

```diff
-    "load_relative_timestamps",
```
```python
    "SynchronousStrategy",
    "ThroughputStrategy",
    "TraceReplayStrategy",
    "load_relative_timestamps",
```
Not a method of this submodule.

```diff
-    "load_relative_timestamps",
```
```python
from pydantic import Field, NonNegativeFloat, NonNegativeInt, PositiveInt, PrivateAttr

from guidellm.data.trace_io import load_relative_timestamps
```
See comment on trace_io.py.

```diff
-from guidellm.data.trace_io import load_relative_timestamps
+from guidellm.utils.trace_io import load_relative_timestamps
```
```python
# When max_requests is set, limit the first data source to that many rows at load
if max_requests is not None and data:
    if max_requests < 1:
        raise ValueError(
            "max_requests must be >= 1 when set for data truncation, "
            f"got {max_requests}"
        )
    data_args = list(data_args) if data_args else [{} for _ in data]
    if len(data_args) >= 1:
        data_args[0] = {**data_args[0], "max_rows": max_requests}
```
Drop this; `max_requests` is a constraint on the number of requests that are allowed to complete. To limit the data source, use `--data-samples`.

```diff
-# When max_requests is set, limit the first data source to that many rows at load
-if max_requests is not None and data:
-    if max_requests < 1:
-        raise ValueError(
-            "max_requests must be >= 1 when set for data truncation, "
-            f"got {max_requests}"
-        )
-    data_args = list(data_args) if data_args else [{} for _ in data]
-    if len(data_args) >= 1:
-        data_args[0] = {**data_args[0], "max_rows": max_requests}
```
```python
# For replay profile: resolve profile first to apply max_seconds filtering,
# then use the filtered count for the data loader. This ensures the data
# loader and scheduler both work with the same filtered request count.
if args.profile == "replay":
```
Unless I am missing something, this conditional should be unnecessary. There is no reason to do loader then profile other than it's the way things were done before.
```python
effective_max_requests = (
    profile.constraints.get("max_requests")
    if profile.constraints
    else args.max_requests
)
```
Not a huge fan of this, and I also think it's unnecessary. The profile can trigger a benchmark end based on the number of requests. It's fine if the request loader reads too many requests ahead.
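The pattern the reviewer describes can be sketched as follows (illustrative only, not GuideLLM's API; the function name is made up): the request loader may be an unbounded iterator that reads ahead, and a completion-count constraint is what actually ends the run.

```python
# Sketch: the loader may over-read; a completion-count constraint stops the run.
from itertools import count


def run_until_max_requests(requests, max_requests: int) -> list:
    """Consume requests from a possibly unbounded loader; stop once
    max_requests have completed, regardless of how much data remains."""
    completed = []
    for request in requests:
        completed.append(request)
        if len(completed) >= max_requests:
            break
    return completed


# The loader here is infinite, but the constraint ends the benchmark.
print(run_until_max_requests(count(), 3))  # [0, 1, 2]
```

With this shape, no special-casing of the data loader is needed: truncation is a scheduling concern, not a loading concern.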
```python
__all__ = ["load_relative_timestamps", "load_trace_rows"]


def load_trace_rows(
```
Replace this function with a call to `datasets.load_dataset`. Can basically be a copy of JSONFileDatasetDeserializer, i.e.:

```python
return load_dataset("json", data_files=str(path), **data_kwargs)
```

```python
        path,
        required_columns=[timestamp_column],
    )
    timestamps = sorted([float(row[timestamp_column]) for row in raw])
```
If using datasets, you can do `raw.sort(timestamp_column)`.
Thanks a lot for the detailed review, I really appreciate your time. I’m fully aligned with your feedback, especially on the replay handling in the entrypoint, which is a key part of the PR. I agree that introducing a special case here is not ideal and should be avoided. I’ll refactor this to make it cleaner and better aligned with the existing design.
Summary

- A `replay` benchmarking strategy that reproduces real-world request patterns from trace log files (`.jsonl`)
- `max_requests` and `max_seconds` CLI options to limit the number of requests processed from a trace

Motivation
This change addresses issue #597 by enabling users to benchmark their vLLM servers using real production traces. Instead of synthetic load patterns, users can now replay exact request arrival times and token distributions from their actual workloads for more realistic performance testing.
Changes

- `TraceReplayStrategy` scheduler strategy for timestamp-based request dispatching
- `ReplayProfile` class for configuring trace-based benchmarking parameters
- `TraceSyntheticDatasetDeserializer` to generate prompts matching trace input/output lengths
- `TraceReader` utility for reading `.jsonl` trace files with `timestamp`, `input_length`, `output_length` fields
- Entrypoint updated to handle replay profile and dataset configuration
- `max_requests` and `max_seconds` truncation support to limit trace replay length

Testing
- `pytest tests/unit/scheduler/test_trace_replay.py` (pass)
- `pytest tests/unit/benchmark/test_replay_profile.py` (pass)
- `pytest tests/unit/data/deserializers/test_trace_synthetic.py` (pass)
- Added tests: scheduling accuracy, boundary conditions, malformed trace handling, empty trace cases, `max_requests` truncation
- Quick practical test with a Colab notebook
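For intuition on the prompt-generation piece, here is a minimal sketch. It counts whitespace-separated words only; the real deserializer would target token counts under the model's actual tokenizer, and the names and vocabulary below are illustrative.

```python
# Sketch: generate synthetic prompts sized to each trace row's input_length.
# Whitespace "tokens" stand in for real tokenizer tokens.
import random


def synthetic_prompt(n_tokens: int, rng: random.Random) -> str:
    """Build a prompt of exactly n_tokens whitespace-separated words."""
    vocab = ["alpha", "beta", "gamma", "delta", "epsilon"]
    return " ".join(rng.choice(vocab) for _ in range(n_tokens))


# One synthetic prompt per trace row, sized to the row's input_length field.
rows = [{"input_length": 4}, {"input_length": 2}]
rng = random.Random(0)
prompts = [synthetic_prompt(row["input_length"], rng) for row in rows]
print([len(p.split()) for p in prompts])  # [4, 2]
```

The point is that replay does not need the original prompt text, only prompts that exercise the server with matching input and output lengths.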
Next Steps (this PR)
Out of Scope (future PRs or not)