feat(llama-cpp-localai-paged): paged KV cache llama.cpp backend + cross-request prefix sharing + GB10 decode optimization [WIP] #10462

Open

localai-bot wants to merge 321 commits into

worktree-feat+paged-attention

This pull request is big! We're only showing the most recent 250 commits

Commits on Jun 23, 2026

feat(llama-cpp): per-model max_prefill_tokens option (chunked-prefill QoS budget)
mudler committed
docs(paged): GB10 head-to-head server sweep (llama-server vs vLLM)
mudler committed
docs(paged): scope durable grouped FP4-MMA MoE GEMM port for GB10
mudler committed
feat(paged): mirror patch 0014 - expert-aware MoE token-tile cap
mudler committed
feat(paged): mirror MoE token-tile density-aware auto-select (patch 0015)
mudler committed
docs(paged): Qwen3.6 NVFP4 h2h bench doc - MoE llama.cpp table
mudler committed
docs(paged): Qwen3.6 NVFP4 apples-to-apples scorecard (llama vs vLLM, dense + MoE)
mudler committed
docs(paged): dense NVFP4 fair re-run with max_prefill_tokens budget sweep
mudler committed
docs(paged): MoE 35B-A3B NVFP4 fair re-run with max_prefill_tokens budget
mudler committed
docs(paged): fair re-run verdict - synthesize NVFP4 llama vs vLLM scorecard
mudler committed
docs(paged): scope token-granular continuous-batch scheduler for llama-server
mudler committed
docs(paged): adversarial review of the continuous-batch scheduler scope
mudler committed

Commits on Jun 24, 2026

Commits on Jun 25, 2026

Commits on Jun 26, 2026

Commits on Jun 27, 2026

Commits on Jun 28, 2026

Commits on Jun 29, 2026

Commits on Jun 30, 2026

Commits on Jul 1, 2026

Commits on Jul 2, 2026