Sync master with upstream release b9326 by jan-service-account · Pull Request #531 · janhq/llama.cpp

jan-service-account · 2026-05-26T01:09:43Z

Updates dev branch with latest release (b9326) from ggml-org/llama.cpp

* ci : remove tag from build-self-hosted.yml
* ci : slim -> self-hosted
* ci : prevent heavy CPU jobs from running on fast runners
* ci : prevent cmake pkg to run on dedicated fast runners
* ci : try to bump 3.11 -> 3.13
* ci : move lint back to 3.11
* ci : back to 3.11
* ci : add comment about UI jobs
* ci : move python requirements check to CPU runners
  this job is a bit slow for a dedicated "fast" runner
* ci : add self-hosted ui workflow
* ci : fix UI naming
* tmp to check if arm64 fast is compatible with all jobs
* revert last commit

* common : add common_chat_split_by_role
* cont : fix spans to reach end of message
* server: fix checkpoints creation
  - extract message_spans from chat templates
  - find the prompt token position before the latest user message
  - split prompt batching at that position
  - create a context checkpoint before the latest user input
  - avoid periodic mid-prompt checkpoints when that position is known
  - handle multimodal prompts when mapping text/template positions to server prompt tokens
  - add --checkpoint-min-step to control minimum spacing between checkpoints
* cont : clean-up
* Support autoparser detection for message barriers
* server: fix message span delimiter and update docs

---------

Co-authored-by: Alde Rojas <hello@alde.dev>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
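The checkpoint placement described in the commit above can be illustrated with a minimal sketch. This is not the server's code: `MessageSpan`, `latest_user_boundary`, and `split_batches` are hypothetical names, and the sketch assumes the message spans have already been mapped to token indices (the real change also handles multimodal prompts and `--checkpoint-min-step`, which are omitted here).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MessageSpan:
    """Hypothetical span of one chat message, in prompt token indices."""
    role: str
    begin: int  # index of the first token of the message (inclusive)
    end: int    # index one past the last token of the message (exclusive)


def latest_user_boundary(spans: List[MessageSpan]) -> Optional[int]:
    """Return the token position where the latest user message starts,
    or None when the prompt contains no user message."""
    for span in reversed(spans):
        if span.role == "user":
            return span.begin
    return None


def split_batches(n_tokens: int, n_batch: int,
                  boundary: Optional[int]) -> List[Tuple[int, int]]:
    """Split [0, n_tokens) into batches of at most n_batch tokens.

    When a boundary is known, no batch crosses it, so a checkpoint can be
    taken exactly at that position once the preceding batch is processed.
    """
    batches: List[Tuple[int, int]] = []
    pos = 0
    while pos < n_tokens:
        end = min(pos + n_batch, n_tokens)
        if boundary is not None and pos < boundary < end:
            end = boundary  # cut the batch short at the boundary
        batches.append((pos, end))
        pos = end
    return batches
```

With spans for system, user, assistant, and user messages, the boundary is the start of the second user message, and the batch that would straddle it is cut short there.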

…#23642)

* fix(action): update SpacemiT toolchain URL and version

  Change-Id: If4cc1c738a855274103f8c3ad52daa33528acd0c
* fix(action): add -L flag to curl command for URL redirection

  Change-Id: I9b6c37390f0c7a733a36308c8fb53d22d234ab06

…#22341)

* ggml: implement `gguf_init_from_buffer`
* test: `gguf_init_from_buffer`
* fix: memory breakdown for a model loaded with `no_alloc` from a file is consistent with being loaded from a buffer
* fix: use `GGML_UNUSED`

  Co-authored-by: Copilot <copilot@github.com>
* fix: remove `total_size` from `gguf_reader`
* fix: file offset calculation, rename `offset` to `data_offset`

  Co-authored-by: Copilot <copilot@github.com>
* refactor: extract model loader bug fixes to another PR
* feat: add `gguf_init_from_callback`
* fix: always require a max expected size
* fix: change `gguf_reader_callback_t`'s `output` type to `void *`, change `max_expected_size` and offsets to `uint64_t`
* fix: harden against offset overflow in buffer read
* fix: remove seek behavior from the callback
* feat: `max_chunk_read == 0` means `SIZE_MAX`
* fix: seeking in a gguf file with no tensors

---------

Co-authored-by: Copilot <copilot@github.com>
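Several of the fixes above (a required maximum expected size, an overflow-safe offset check, no seeking in the callback, and `max_chunk_read == 0` meaning no chunk limit) can be illustrated with a small sketch. It does not mirror the real `gguf_init_from_callback` or `gguf_reader_callback_t` signatures, which are not shown in this PR page. `BoundedReader` and its callback shape (`read_cb(n) -> bytes`, strictly sequential) are assumptions made for illustration.

```python
from typing import Callable


class BoundedReader:
    """Hypothetical sequential reader over a callback; not the ggml API."""

    def __init__(self, read_cb: Callable[[int], bytes],
                 max_expected_size: int, max_chunk_read: int = 0) -> None:
        self.read_cb = read_cb                  # returns the next n bytes, no seeking
        self.max_expected_size = max_expected_size
        self.max_chunk_read = max_chunk_read    # 0 means "no per-call limit"
        self.offset = 0                         # bytes consumed so far

    def read(self, n: int) -> bytes:
        # Compare against the remaining budget instead of computing
        # offset + n, which is the form that cannot overflow with
        # fixed-width unsigned integers.
        if n < 0 or n > self.max_expected_size - self.offset:
            raise ValueError("read exceeds max expected size")
        chunk_limit = n if self.max_chunk_read == 0 else self.max_chunk_read
        out = bytearray()
        while len(out) < n:
            want = min(chunk_limit, n - len(out))
            data = self.read_cb(want)
            if len(data) != want:
                raise IOError("short read from callback")
            out += data
        self.offset += n
        return bytes(out)
```

Requiring the size bound up front means a malformed length field can be rejected before any read is attempted.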

CISC and others added 21 commits May 24, 2026 09:51

convert : minor fixes for numpy 2.x (ggml-org#23571)

5d246a7

ci : update build-self-hosted.yml (ggml-org#23616)

549b9d8

perplexity : fix even more integer overflows (ggml-org#23623)

6d57c26

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

vendor : update cpp-httplib to 0.45.1 (ggml-org#23639)

9627d0f

ui: media attachments before text (ggml-org#23467)

b964876

* ui: media attachments before text
* fix prettier formatting

ggml : Parallelize quant LUT init (ggml-org#23595)

826539c

- Use OpenMP to parallelize iq2xs_init_impl and iq3xs_init_impl.
- Move the OpenMP detection from ggml-cpu to ggml-base.
- Update OpenMP dependencies in ggml-config.cmake.in.

ci : install host compiler on android-ndk build (ggml-org#23630)

d55fb97

llama : document that only one on-device state can be saved per sequence (ggml-org#23520)

314e729

ci : fix pre-tokenizer-hashes check (ggml-org#23651)

062d311

server: MTP layer kv-cache should respect draft type ctk (ggml-org#23646)

6c4cbdc

TP: fix ggml context size calculation (ggml-org#22616)

ae251b5

* TP: fix ggml context size calculation, memory leak
* move split state cache back into the context
* revert to constant ggml context size for cgraphs
* increase headroom for statically allocated tensors
* remove obsolete include

ggml-alloc: fix out-of-bounds read in ggml_dyn_tallocr_remove_block (ggml/1492)

fa97041

ggml.h: correct ggml_silu_back arg docstring (a=dy, b=x) (ggml/1500)

b251f74

ggml : bump version to 0.12.1 (ggml/1508)

ce5890b

sync : ggml

22307b3

ggml : bump version to 0.13.0 (ggml/1510)

45158f4

sync : ggml

d161ea7

Minh141120 closed this May 27, 2026

jan-service-account wants to merge 21 commits into dev from update-dev-from-master-2026-05-26-01-09

17 participants
