feat: add vLLM prefix cache and preemption metrics #2843
/review-pr --deep
Branch: feat/vllm_prefix_counters (vs main)
Findings (scored >= 80):
Filtered (scored < 80):
LGTM: deep review found no actionable issues.
e3a6909 to d86e4e3
Track prefix_cache_queries, prefix_cache_hits, prefix_cache_hit_rate, and num_preemptions as windowed deltas from vLLM cumulative counters. Add optional per-engine scalar aggregates (mean/max/p95) and a toggle to disable heavy timeline image plots in wandb logging.
Signed-off-by: Puneesh Khanna <puneesh.khanna@tii.ae>
d86e4e3 to fb0293b
What does this PR do?
Tracks prefix_cache_queries, prefix_cache_hits, prefix_cache_hit_rate, and num_preemptions as windowed deltas from vLLM cumulative counters. Also adds optional per-engine scalar aggregates (mean/max/p95) and a toggle to disable the heavy timeline image plots in wandb logging.
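For reviewers unfamiliar with the windowed-delta idea, here is a minimal sketch of how cumulative counters can be turned into per-window values. It is not the code in this PR: `CounterSnapshot` and `windowed_delta` are hypothetical names, and clamping at zero and reporting a 0.0 hit rate for an empty window are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CounterSnapshot:
    """Cumulative counter values read from one engine (hypothetical container)."""

    prefix_cache_queries: int
    prefix_cache_hits: int
    num_preemptions: int


def windowed_delta(prev: CounterSnapshot, curr: CounterSnapshot) -> dict:
    """Turn two cumulative snapshots into metrics for the window between them."""
    # Assumption: clamp at zero so a counter reset does not produce negative deltas.
    queries = max(curr.prefix_cache_queries - prev.prefix_cache_queries, 0)
    hits = max(curr.prefix_cache_hits - prev.prefix_cache_hits, 0)
    preemptions = max(curr.num_preemptions - prev.num_preemptions, 0)
    # Hit rate is computed over the window only; assume 0.0 when nothing was queried.
    hit_rate = hits / queries if queries > 0 else 0.0
    return {
        "prefix_cache_queries": queries,
        "prefix_cache_hits": hits,
        "prefix_cache_hit_rate": hit_rate,
        "num_preemptions": preemptions,
    }


# Example: counters sampled at the start and end of one window.
start = CounterSnapshot(prefix_cache_queries=100, prefix_cache_hits=40, num_preemptions=2)
end = CounterSnapshot(prefix_cache_queries=300, prefix_cache_hits=190, num_preemptions=5)
window = windowed_delta(start, end)
```

Computing the hit rate from the two deltas, rather than from the cumulative totals, keeps the value tied to the current window instead of the whole lifetime of the engine.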
Issues
NA
Usage
Add the config parameters below to your YAML:
This gives you prefix cache hit rate, preemption counts, generation tokens, KV cache usage, etc. as lightweight scalar line charts in W&B, without the heavy image plots.
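As a rough illustration of what a scalar aggregate means here, the sketch below reduces a list of samples for one metric to mean/max/p95 values that could be logged as plain scalars. The function names and the `name/mean` style keys are hypothetical, and the nearest-rank percentile is an assumption; the PR may compute p95 differently.

```python
import math
from statistics import mean


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list (assumed method, for illustration)."""
    ordered = sorted(values)
    # Nearest-rank: smallest value with at least q percent of samples at or below it.
    rank = max(math.ceil(q / 100 * len(ordered)), 1)
    return ordered[rank - 1]


def scalar_aggregates(name, samples):
    """Reduce a list of samples for one metric to mean/max/p95 scalars."""
    return {
        f"{name}/mean": mean(samples),
        f"{name}/max": max(samples),
        f"{name}/p95": percentile(samples, 95),
    }


# Example: four hit-rate samples collected for one engine.
aggregates = scalar_aggregates("prefix_cache_hit_rate", [0.5, 0.75, 1.0, 0.25])
```

A dictionary of this shape holds only a few floats per metric, which is why scalar charts are much lighter than rendering a timeline image.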
Before your PR is "Ready for review"
Pre checks:
Additional Information
I think that no new tests are required to cover this.

Wandb plots over a single step on a small model Qwen3-1.7B: