
Fix XGrammar bitmask initialization and add null check for gen_config in generate method #4349

Merged
lvhan028 merged 4 commits into InternLM:main from windreamer:fix_guided_decoding_reuse on Feb 26, 2026

Conversation


@windreamer (Collaborator) commented on Feb 11, 2026

Motivation

This PR addresses two critical issues in LMDeploy's guided generation and batch inference functionality:

  1. XGrammar Bitmask Initialization Bug: The guided decoding mechanism initializes token bitmasks to control which tokens are allowed during constrained generation. Previously, the bitmask was initialized with zeros, which incorrectly blocked all tokens instead of allowing all tokens. This caused generation failures when certain sequences in a batch didn't have grammar constraints applied.

  2. Silent Failure on None gen_config: The batch_infer method accepts a list of GenerationConfig objects, but when None was passed for individual items, accessing gen_config.max_new_tokens on the None value raised an unhandled error (an AttributeError in Python), causing the engine to fail or hang silently.

Modifications

  1. Guided Decoding Bitmask Fix (src/turbomind/generation/guided_decoding.cc):

    • Changed the bitmask initialization value from 0 to -1 (all bits set to 1 in two's complement representation for int32_t).
    • This ensures that when a sequence doesn't have an active grammar matcher, all tokens are permitted by default rather than none.
  2. Null GenerationConfig Guard (lmdeploy/serve/core/async_engine.py):

    • Added an explicit check to instantiate a default GenerationConfig() when gen_config is None.
    • This prevents downstream code from attempting to access attributes on a null object, which was causing the engine to hang silently.
  3. Regression Test (tests/test_lmdeploy/test_grammar.py):

    • Added test_mix_guided_matrix to verify that batch inference works correctly when mixing guided and unguided generation in the same batch.
    • The test ensures unguided sequences produce arbitrary text (not necessarily conforming to schema) while guided sequences strictly follow the JSON schema constraints.
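
To illustrate the bitmask fix, here is a minimal sketch (not LMDeploy's actual code) of how a packed token bitmask behaves. It assumes the common layout where bit i of word i // 32 gates vocabulary token i, with a set bit meaning "allowed" — which is why zero-initialization blocks everything and -1 (all 32 bits set in two's complement) permits everything:

```python
# Minimal sketch (not LMDeploy code): a token bitmask packs one bit per
# vocabulary token into 32-bit words; a set bit means the token is allowed.
VOCAB_SIZE = 100
NUM_WORDS = (VOCAB_SIZE + 31) // 32

def is_allowed(bitmask, token_id):
    """Return True if token_id's bit is set in the packed bitmask."""
    word = bitmask[token_id // 32]
    return (word >> (token_id % 32)) & 1 == 1

# Old behavior: words initialized to 0 -> every token blocked.
blocked = [0] * NUM_WORDS
# Fixed behavior: words initialized to -1, i.e. all 32 bits set in
# two's complement -> every token allowed by default.
allowed = [-1 & 0xFFFFFFFF] * NUM_WORDS

print(any(is_allowed(blocked, t) for t in range(VOCAB_SIZE)))  # False
print(all(is_allowed(allowed, t) for t in range(VOCAB_SIZE)))  # True
```

Sequences without an active grammar matcher keep this default-allow mask, so they decode freely while constrained sequences in the same batch get their masks overwritten by the matcher.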
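
The gen_config guard follows a standard normalize-then-use pattern. The sketch below uses a stand-in dataclass rather than LMDeploy's actual GenerationConfig class; only the attribute name max_new_tokens is taken from the PR description:

```python
# Sketch of the None guard (stand-in types, not LMDeploy's real classes).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class GenerationConfig:
    max_new_tokens: int = 512

def normalize_configs(gen_configs: List[Optional[GenerationConfig]]) -> List[GenerationConfig]:
    # Replace None entries with a default config so downstream code can
    # safely access attributes such as max_new_tokens.
    return [cfg if cfg is not None else GenerationConfig() for cfg in gen_configs]

configs = normalize_configs([GenerationConfig(max_new_tokens=64), None])
print([c.max_new_tokens for c in configs])  # [64, 512]
```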

Copilot AI (Contributor) left a comment

Pull request overview

This pull request fixes a bug where grammar constraints from structured output requests incorrectly persist and apply to subsequent non-structured output requests when ModelRequest instances are reused. The fix introduces a clearGrammar() method to explicitly reset the grammar state after each inference session completes.

Changes:

  • Added clearGrammar() C++ method to ModelRequest class for resetting grammar state
  • Added Python binding clear_grammar() to expose the cleanup functionality
  • Called clear_grammar() in the finally block of async_stream_infer() to ensure cleanup
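
The cleanup pattern described above can be sketched as follows. All names here (ModelRequest, set_grammar, clear_grammar, run_inference) are illustrative stand-ins, not LMDeploy's real API; the point is that a finally block guarantees the reused request object starts the next session without stale grammar state:

```python
# Illustrative sketch (hypothetical names): reset per-request grammar
# state in a finally block so a reused request does not carry
# constraints over from a previous structured-output session.
class ModelRequest:
    def __init__(self):
        self.grammar = None

    def set_grammar(self, grammar):
        self.grammar = grammar

    def clear_grammar(self):
        self.grammar = None

def run_inference(request, grammar=None):
    try:
        if grammar is not None:
            request.set_grammar(grammar)
        return "output"
    finally:
        # Always runs, even if inference raises: the request ends clean.
        request.clear_grammar()

req = ModelRequest()
run_inference(req, grammar="json-schema")
print(req.grammar)  # None
```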

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File | Description
src/turbomind/engine/model_request.h | Declares the new clearGrammar() method
src/turbomind/engine/model_request.cc | Implements clearGrammar() to reset the grammar_ shared_ptr
src/turbomind/python/bind.cpp | Adds Python binding for clear_grammar with appropriate GIL handling
lmdeploy/turbomind/turbomind.py | Calls clear_grammar() in finally block after inference completes


Copilot AI left a comment: reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

@windreamer force-pushed the fix_guided_decoding_reuse branch 2 times, most recently from f606bb6 to f81c224, on February 12, 2026 13:33
@windreamer requested a review from Copilot on February 12, 2026 13:34
Copilot AI left a comment: reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Copilot AI left a comment: reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

@windreamer force-pushed the fix_guided_decoding_reuse branch from de1e35e to f74ca9b on February 13, 2026 09:41
@windreamer changed the title from "fix: add clear_grammar to remove grammar from reused model_request" to "Fix XGrammar bitmask initialization and add null check for gen_config in generate method" on Feb 13, 2026
Copilot AI left a comment: reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

@windreamer force-pushed the fix_guided_decoding_reuse branch 3 times, most recently from 2e518d2 to e8c7c32, on February 24, 2026 09:43
@windreamer requested a review from Copilot on February 24, 2026 09:43
Copilot AI left a comment: reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Copilot AI left a comment: reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

@windreamer force-pushed the fix_guided_decoding_reuse branch 2 times, most recently from 21e8300 to 5bc9970, on February 24, 2026 10:13
@windreamer requested a review from Copilot on February 24, 2026 10:14
Copilot AI left a comment: reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Copilot AI left a comment: reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@windreamer force-pushed the fix_guided_decoding_reuse branch from 762bcb2 to 03aadff on February 24, 2026 11:15
windreamer and others added 2 commits on February 24, 2026 20:16
@windreamer force-pushed the fix_guided_decoding_reuse branch 2 times, most recently from 973a080 to 73b4103, on February 25, 2026 03:26
@lvhan028 merged commit 3cd80ae into InternLM:main on Feb 26, 2026
9 checks passed