Add Support for Guided Decoding to On Device Sampling #624
Merged
quic-hemagnih merged 46 commits into quic:main from quic-xiyushi:guided_decoding_simple on Dec 18, 2025
Commits
409da24
Extend on-device sampling support for dual QPC VLMs
quic-xiyushi e06e175
Fix random_numbers shape
quic-xiyushi 3e242ce
Update example with new random sampling logic
quic-xiyushi 1a01d57
Update to align with recent VLM CB changes
quic-xiyushi 30d6061
Update tests with new random sampling logic
78ef180
Add code to perform guided decoding
1fafcdb
Add bitmask to example inputs and dynamic axes
18ab856
Rename bitmask to token_bitmasks
b1c049c
Fix typo
e16e846
Merge branch 'main' into guided_decoding_simple
1515497
Add flag to enable guided decoding
d02d04d
Merge remote-tracking branch 'origin/main' into HEAD
quic-xiyushi 97e4baf
Add flag to enable guided decoding
7b7677b
Update test_sampler_transform for guided decoding
7cf106e
Refactor
quic-xiyushi 45aed11
Add unit tests
quic-xiyushi 6273ab5
Clean up
quic-xiyushi ef9ae14
Merge remote-tracking branch 'origin/main' into HEAD
quic-xiyushi 60312b3
Add test for guided decoding
3789d5a
Update test_sampler.py
quic-xiyushi 251099f
Merge branch 'on-device-sampling-vlm' into guided_decoding_simple
a24a55d
Enable guided decoding in vlm generation
55e76e9
Fix bug
f9355d4
Fix bug
5e2afb7
Fix hash for VLM's language decoder to include qaic_config
quic-xiyushi e672701
Merge branch 'on-device-sampling-vlm' into guided_decoding_simple
eee5314
Enable guided decoding test for vlms
60cf5ec
Use different config for each vlm
a71ee65
Update type
df06617
Merge remote-tracking branch 'origin/main' into HEAD
quic-xiyushi 10990a9
Fix bug in getting vocab_size and missing ccl in forward
quic-xiyushi b47b633
Merge branch 'on-device-sampling-vlm' into guided_decoding_simple
b5a7b99
Merge branch 'main' into guided_decoding_simple
quic-mamta 3fcd9eb
Merge branch 'main' into guided_decoding_simple
quic-mamta 98cfadf
Merge branch 'main' into on-device-sampling-vlm
quic-mamta a60e7ce
Merge branch 'main' into on-device-sampling-vlm
quic-xiyushi b22af54
Support prefix-caching with on-device sampling
quic-xiyushi 2533262
Modify tests to use internvl 1b for quicker CI
quic-xiyushi 5457075
Merge remote-tracking branch 'origin/on-device-sampling-vlm' into HEAD
quic-xiyushi 8698651
Merge branch 'main' into on-device-sampling-vlm
quic-xiyushi 86aaad2
Fix compilation error on Llama3.1 8B due to changes in presence penalty
quic-xiyushi a2d4fb4
Update tests
quic-xiyushi eaf21c0
Merge remote-tracking branch 'origin/on-device-sampling-vlm' into HEAD
quic-xiyushi feeaa37
Extend on-device sampling support to llava, garnite, gemma, and llama4
quic-xiyushi 5f716ef
Merge remote-tracking branch 'origin/main' into HEAD
quic-xiyushi 96e13a8
Merge branch 'main' into guided_decoding_simple
quic-hemagnih