Fix XGrammar bitmask initialization and add null check for gen_config in generate method #4349
Pull request overview
This pull request fixes a bug where grammar constraints from structured output requests incorrectly persist and apply to subsequent non-structured output requests when ModelRequest instances are reused. The fix introduces a clearGrammar() method to explicitly reset the grammar state after each inference session completes.
Changes:
- Added a `clearGrammar()` C++ method to the `ModelRequest` class for resetting grammar state
- Added a Python binding `clear_grammar()` to expose the cleanup functionality
- Called `clear_grammar()` in the `finally` block of `async_stream_infer()` to ensure cleanup
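The try/finally cleanup pattern described above can be sketched as follows. This is a minimal Python stand-in, not the actual TurboMind implementation: the real `ModelRequest` is a C++ class exposed through pybind11, and every name here except `clear_grammar` is an assumption for illustration.

```python
class ModelRequest:
    """Stand-in for the TurboMind ModelRequest binding (illustrative only)."""

    def __init__(self):
        self.grammar = None  # mirrors the grammar_ shared_ptr in the C++ class

    def set_grammar(self, grammar):
        self.grammar = grammar

    def clear_grammar(self):
        # Reset grammar state so a reused request does not carry constraints
        # over into the next, possibly unconstrained, inference session.
        self.grammar = None


def run_inference(request, grammar=None):
    """Sketch of the async_stream_infer() cleanup: clear in a finally block."""
    if grammar is not None:
        request.set_grammar(grammar)
    try:
        return "generated text"  # placeholder for the actual decode loop
    finally:
        request.clear_grammar()  # runs even if decoding raises or is cancelled
```

Putting the reset in `finally` rather than after the decode loop is what makes the fix robust: a cancelled or failed session still leaves the request in a clean state.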
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/turbomind/engine/model_request.h` | Declares the new `clearGrammar()` method |
| `src/turbomind/engine/model_request.cc` | Implements `clearGrammar()` to reset the `grammar_` shared_ptr |
| `src/turbomind/python/bind.cpp` | Adds Python binding for `clear_grammar` with appropriate GIL handling |
| `lmdeploy/turbomind/turbomind.py` | Calls `clear_grammar()` in the `finally` block after inference completes |
Motivation
This PR addresses two critical issues in LMDeploy's guided generation and batch inference functionality:
XGrammar Bitmask Initialization Bug: The guided decoding mechanism initializes token bitmasks to control which tokens are allowed during constrained generation. Previously, the bitmask was initialized with zeros, which incorrectly blocked all tokens instead of allowing all tokens. This caused generation failures when certain sequences in a batch didn't have grammar constraints applied.
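To see why the initial fill value matters, here is a toy model of the token bitmask (not XGrammar's actual API): bit *i* of the mask marks token *i* as allowed, packed 32 tokens per `int32` word. A zero-filled mask therefore blocks every token, while `-1` (all bits set in two's complement) allows every token by default:

```python
def make_bitmask(vocab_size, fill):
    """One 32-bit word per 32 tokens; `fill` is the per-word init value."""
    n_words = (vocab_size + 31) // 32
    return [fill & 0xFFFFFFFF for _ in range(n_words)]


def is_allowed(mask, token_id):
    """A token is allowed iff its bit is set in the mask."""
    return (mask[token_id // 32] >> (token_id % 32)) & 1 == 1


vocab = 100
blocked = make_bitmask(vocab, 0)   # buggy init: every bit is 0, all tokens blocked
allowed = make_bitmask(vocab, -1)  # fixed init: -1 sets all 32 bits per word

assert not any(is_allowed(blocked, t) for t in range(vocab))
assert all(is_allowed(allowed, t) for t in range(vocab))
```

With the `-1` default, sequences in a batch that have no grammar attached sample from the full vocabulary, and only sequences with an active grammar get their masks narrowed.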
Silent Failure on None gen_config: The `batch_infer` method accepts a list of `GenerationConfig` objects, but when `None` was passed for individual items, the code would silently fail or hang because of an unguarded attribute access on `gen_config.max_new_tokens`.

Modifications
Guided Decoding Bitmask Fix (`src/turbomind/generation/guided_decoding.cc`): Changed the bitmask initialization value from `0` to `-1` (all bits set to 1 in two's complement representation for `int32_t`), so all tokens are allowed by default before grammar constraints are applied.

Null GenerationConfig Guard (`lmdeploy/serve/core/async_engine.py`): Fall back to a default `GenerationConfig()` when `gen_config` is `None`, so per-item `None` entries in a batch no longer cause silent failures.

Regression Test (`tests/test_lmdeploy/test_grammar.py`): Added `test_mix_guided_matrix` to verify that batch inference works correctly when mixing guided and unguided generation in the same batch.