Fix: quantization issue for multimodal models #519
Fixes #516
## Problem
When using `IntWrapper` with multimodal models (e.g., `Gemma3_4B`), quantization fails.

**Root cause:** `IntWrapper` was applying quantization to ALL modules, including the vision encoder, but `peft.quantize()` only quantizes text-model parameters. This mismatch caused vision-encoder modules to expect quantized parameters that didn't exist.
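A minimal repro sketch of the failure mode; the loading and quantization calls (`gm.ckpts.load_params`, `peft.quantize`, the checkpoint and `QuantizationMethod` names) are assumptions based on the library's usual quantization flow, not taken from this PR:

```python
from gemma import gm
from gemma import peft

# Wrap the multimodal model: before this fix, IntWrapper rewrote ALL
# sub-modules to expect int params, vision encoder included.
model = gm.nn.IntWrapper(model=gm.nn.Gemma3_4B())

# peft.quantize() only produces quantized params for the text model,
# so the vision-encoder modules look up quantized params that don't exist.
params = gm.ckpts.load_params(gm.ckpts.CheckpointPath.GEMMA3_4B_IT)  # checkpoint name is an assumption
params = peft.quantize(params, method=peft.QuantizationMethod.INT8)  # method name is an assumption
```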
## Solution

Added a scope-based exclusion mechanism to `IntWrapper` via a new `exclude_scopes` parameter (default: `('vision_encoder',)`).
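Conceptually, the exclusion works like the hypothetical helper below; this is an illustration of the scope check, not the actual code in `_replace_by_int`:

```python
# Hypothetical helper illustrating scope-based exclusion (not the real
# implementation in _replace_by_int).
def _should_quantize(
    module_path: tuple[str, ...],
    exclude_scopes: tuple[str, ...] = ('vision_encoder',),
) -> bool:
  """Returns False when the module lives under an excluded scope."""
  return not any(scope in module_path for scope in exclude_scopes)


# A module under the vision encoder is skipped, while a text-model
# module is still replaced by its int counterpart.
assert not _should_quantize(('vision_encoder', 'block_0', 'attn'))
assert _should_quantize(('layer_0', 'attn'))
```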
## Changes

`gemma/gm/nn/_quantization.py`

`IntWrapper` class:
- Added an `exclude_scopes: tuple[str, ...] = ('vision_encoder',)` parameter
- Updated `__call__` to pass the exclusions to `_replace_by_int`

`_replace_by_int` function:
- Added an `exclude_scopes` parameter

`gemma/gm/nn/_quantization_multimodal_test.py`
- Added comprehensive unit tests (sketched below)
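A hypothetical sketch of the kind of check the new test file covers; the real test names and assertions are not shown in this PR description:

```python
# Hypothetical test sketch; the actual tests live in
# gemma/gm/nn/_quantization_multimodal_test.py.
from gemma import gm


def test_vision_encoder_excluded_by_default():
  wrapper = gm.nn.IntWrapper(model=gm.nn.Gemma3_4B())
  # The default exclusion list keeps the vision encoder unquantized.
  assert wrapper.exclude_scopes == ('vision_encoder',)
```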
## Usage
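A usage sketch based on the parameter described above; the `gm.nn` aliases are assumed from the library's public API:

```python
from gemma import gm

# Default: the vision encoder is excluded from quantization automatically.
model = gm.nn.IntWrapper(model=gm.nn.Gemma3_4B())

# Custom exclusions can be passed explicitly via the new parameter
# ('some_other_scope' is a placeholder name).
model = gm.nn.IntWrapper(
    model=gm.nn.Gemma3_4B(),
    exclude_scopes=('vision_encoder', 'some_other_scope'),
)
```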
This change is fully backward compatible; existing code works without changes.