-
Notifications
You must be signed in to change notification settings - Fork 1k
fix: added type hints in .py files #1932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds return type hints to Python functions across test files, scripts, and examples. While the PR aims to address missing type hints (issue #1927), it only partially completes the type annotation effort by adding return types without including parameter type hints.
Changes:
- Added return type hints to fixture functions and utility functions in test files
- Added return type hints to converter methods and utility functions in scripts
- Added typing imports where needed for type annotations
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 41 comments.
Show a summary per file
| File | Description |
|---|---|
| bindings/python/tests/utils.py | Added return type hints to download function and pytest fixtures (roberta_files, bert_files, openai_files, train_files, albert_base, doc_wiki_tokenizer, doc_pipeline_bert_tokenizer) |
| bindings/python/tests/bindings/test_trainers.py | Added return type hint to split method in GoodCustomPretok test class |
| bindings/python/tests/bindings/test_tokenizer.py | Added return type hint to format helper function |
| bindings/python/tests/bindings/test_processors.py | Added return type hints to get_bert, get_roberta, and get_t5_squad helper methods |
| bindings/python/tests/bindings/test_pre_tokenizers.py | Added return type hints to split and pre_tokenize methods in custom pre-tokenizer test classes |
| bindings/python/scripts/spm_parity_check.py | Added return type hints to check_diff, check_details, and check_encode functions |
| bindings/python/scripts/convert.py | Added return type hints to vocab, normalizer, post_processor, unk_id methods across multiple converter classes and check function |
| bindings/python/examples/example.py | Added typing import and return type hints to tokenize_r and tokenize_p functions |
| bindings/python/benches/test_tiktoken.py | Added return type hint to test function |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| @pytest.fixture(scope="session") | ||
| def bert_files(data_dir): | ||
| def bert_files(data_dir) -> dict[str, str]: |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type hint has been added, but the parameter type hint for data_dir is missing. Consider adding the appropriate type hint for the parameter to make the function signature complete.
| ] | ||
|
|
||
| def normalizer(self, proto): | ||
| def normalizer(self, proto) -> Sequence: |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type hint has been added, but the parameter type hint for proto is missing. Consider adding an appropriate type hint for the parameter to make the function signature complete.
| return Sequence(normalizers) | ||
|
|
||
| def post_processor(self, tokenizer): | ||
| def post_processor(self, tokenizer) -> TemplateProcessing: |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type hint has been added, but the parameter type hint for tokenizer is missing. Consider adding an appropriate type hint for the parameter to make the function signature complete.
| def post_processor(self, tokenizer) -> TemplateProcessing: | |
| def post_processor(self, tokenizer: Tokenizer) -> TemplateProcessing: |
| return 3 | ||
|
|
||
| def post_processor(self, tokenizer): | ||
| def post_processor(self, tokenizer) -> TemplateProcessing: |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type hint has been added, but the parameter type hint for tokenizer is missing. Consider adding an appropriate type hint for the parameter to make the function signature complete.
|
|
||
| class XLMRobertaConverter(SpmConverter): | ||
| def vocab(self, proto): | ||
| def vocab(self, proto) -> list[tuple[str, float]]: |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type hint has been added, but the parameter type hint for proto is missing. Consider adding an appropriate type hint for the parameter to make the function signature complete.
| offset = 103 | ||
|
|
||
| def vocab(self, proto): | ||
| def vocab(self, proto) -> list[tuple[str, float]]: |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type hint has been added, but the parameter type hint for proto is missing. Consider adding an appropriate type hint for the parameter to make the function signature complete.
|
|
||
|
|
||
| def download(url, with_filename=None): | ||
| def download(url, with_filename=None) -> str: |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type hint has been added, but the parameter type hints are missing. Consider adding type hints for url (str) and with_filename (Optional[str] or str | None) to make the function signature complete.
| return "rest" | ||
|
|
||
| def split(self, n, normalized): | ||
| def split(self, n, normalized) -> list: |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type hint has been added, but the parameter type hints for n and normalized are missing. Consider adding appropriate type hints for the parameters to make the function signature complete.
|
|
||
|
|
||
| def check_details(line, spm_ids, tok_ids, sp, tok): | ||
| def check_details(line, spm_ids, tok_ids, sp, tok) -> bool: |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type hint has been added, but the parameter type hints for line, spm_ids, tok_ids, sp, and tok are missing. Consider adding appropriate type hints for the parameters to make the function signature complete.
|
|
||
|
|
||
| def check(pretrained, filename): | ||
| def check(pretrained, filename) -> tuple[str, datetime.timedelta]: |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type hint has been added, but the parameter type hints for pretrained and filename are missing. Consider adding appropriate type hints for the parameters to make the function signature complete.
added all the missing type hints for the funciton in python files solution for issue no #1927