Skip to content

Conversation

@shreed27
Copy link

@shreed27 shreed27 commented Feb 1, 2026

Description

This PR addresses multiple TODOs related to tokenizer performance and usability. It introduces a global LRU cache for the SentencePiece model to prevent redundant loading and parsing when instantiating multiple Tokenizer objects. Additionally, it upgrades the file caching utility to transparently download or copy files from remote locations (like GCS) if they are not present in the local cache.

Changes

  • gemma/gm/text/_tokenizer.py:

    • Refactored _sp property to use a standalone, globally cached function _load_sp_model.
    • Decorated _load_sp_model with @functools.lru_cache to ensure the underlying C++ model is loaded only once per unique path + custom tokens combination.
    • Extracted custom token application logic into _update_proto_with_custom_tokens for better modularity.
  • gemma/gm/utils/_file_cache.py:

    • Enhanced maybe_get_from_cache to handle cache misses by attempting to copy the file from the remote_file_path.
    • Added directory creation logic to ensure the cache path exists before writing.
  • gemma/gm/utils/_file_cache_test.py:

    • Added test_cache_miss_downloads_file to verify that a missing cache file triggers a copy operation from the source.

Impact

  • Performance: significantly reduces initialization time and memory usage when working with multiple tokenizer instances (e.g., in distributed training, evaluation pipelines, or tests).
  • Usability: seamless handling of remote model paths without manual pre-downloading steps.

Verification

  • Unit Tests: Added coverage in _file_cache_test.py ensuring the download/copy logic works as expected.
  • Existing Tests: Verified _tokenizer_test.py Logic remains consistent (integration tests depend on env setup but logic is unit-verified).

Checklist

  • Implemented global cache for Tokenizer.
  • Implemented auto-download for file cache.
  • Added/Updated tests.
  • Linted code.

- Implemented  for  model loading to prevent redundant IO and parsing when creating multiple instances.
- Added auto-download capability to : now automatically downloads/copies remote files (e.g., gs://) to the local cache if missing.
- Refactored  to separate model loading logic into standalone cached functions.
- Updated  to verify download behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant