Add Youtu-LLM model #43166
Conversation
May I ask if it is possible to concentrate the test only on Youtu-LLM (the new model)? The summary here seems to report errors raised by other models. (junru)
run-slow: youtu_llm

This comment contains models: ["models/youtu_llm"]
CI Results: ✅ No failing test specific to this PR 🎉!
molbap
left a comment
Seems clean, good modular file with simply Llama + MLA, beautiful. Asked a few questions, let me know and I'll re-review!
is the official name YoutuLLM or Youtu as in the prefixes here?
We chose to use Youtu as the prefix of the modules, as it is more suitable for extension (e.g., we plan to introduce YoutuVL in the near future). Youtu-LLM is rather a brand name.
then everything that has to do with the model name (youtu) should be named as such, like the model directory
model_sdpa = YoutuForCausalLM.from_pretrained(
    "tencent/Youtu-LLM-2B-Base",
    dtype=torch.float16,
let's make sdpa explicit here
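For reference, a minimal sketch of what making it explicit could look like; `attn_implementation="sdpa"` is the standard transformers argument, and the rest mirrors the snippet above:

```python
import torch
from transformers import YoutuForCausalLM  # class added in this PR

# Request SDPA explicitly instead of relying on the default attention dispatch
model_sdpa = YoutuForCausalLM.from_pretrained(
    "tencent/Youtu-LLM-2B-Base",
    dtype=torch.float16,
    attn_implementation="sdpa",
)
```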
class YoutuModel(LlamaModel):
    _keys_to_ignore_on_load_unexpected = [""]
is this to remove the Llama attribute? if so, ok
For the current version of the model (Youtu-LLM-2B family), this line of code could be removed.
@require_torch_accelerator
@pytest.mark.torch_compile_test
@require_read_token
def test_compile_static_cache(self):
Thanks for adding an integration test! However, naming-wise it seems to measure dynamic and static cache, no? By the way, could we have a simple no-compile integration test that works in the simplest setting, just to avoid regressions?
We have provided inference tests below based on a no-compile dynamic cache and a no-compile static cache. Basically, I implemented this test function by referencing the test function of DeepSeek V3.
sure, can we update the name though to make it clearer, and separate it into two tests? that way if it breaks at some point it's easier to debug
Sure, are there any official examples that I can follow?
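To illustrate the kind of split being asked for, here is a rough sketch; the test names, prompt, and assertion are hypothetical, only the checkpoint and decorators are taken from the PR:

```python
import unittest

import pytest
import torch
from transformers import AutoTokenizer, YoutuForCausalLM
from transformers.testing_utils import require_torch_accelerator, slow, torch_device


class YoutuIntegrationSketch(unittest.TestCase):
    checkpoint = "tencent/Youtu-LLM-2B-Base"

    @slow
    @require_torch_accelerator
    def test_model_generation(self):
        # simplest no-compile setting, default dynamic cache
        tokenizer = AutoTokenizer.from_pretrained(self.checkpoint)
        model = YoutuForCausalLM.from_pretrained(self.checkpoint, dtype=torch.float16).to(torch_device)
        inputs = tokenizer("Hello, my name is", return_tensors="pt").to(torch_device)
        output = model.generate(**inputs, max_new_tokens=16, do_sample=False)
        self.assertGreater(output.shape[1], inputs["input_ids"].shape[1])

    @slow
    @require_torch_accelerator
    @pytest.mark.torch_compile_test
    def test_model_generation_compile_static_cache(self):
        # same generation, but with a compiled forward and a static cache
        tokenizer = AutoTokenizer.from_pretrained(self.checkpoint)
        model = YoutuForCausalLM.from_pretrained(self.checkpoint, dtype=torch.float16).to(torch_device)
        model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
        inputs = tokenizer("Hello, my name is", return_tensors="pt").to(torch_device)
        output = model.generate(**inputs, max_new_tokens=16, do_sample=False, cache_implementation="static")
        self.assertGreater(output.shape[1], inputs["input_ids"].shape[1])
```

Splitting this way means a failure in the compile path no longer masks a plain-generation regression, and vice versa.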
@parameterized.expand([("random",), ("same",)])
@unittest.skip("Youtu-LLM is not compatible with assisted decoding")
def test_assisted_decoding_matches_greedy_search(self, assistant_type):
    pass

@unittest.skip("Youtu-LLM is not compatible with assisted decoding")
def test_prompt_lookup_decoding_matches_greedy_search(self, assistant_type):
    pass

@unittest.skip("Youtu-LLM is not compatible with assisted decoding")
def test_assisted_decoding_sample(self):
    pass

@unittest.skip("Youtu-LLM uses MLA so it is not compatible with the standard cache format")
def test_beam_search_generate_dict_outputs_use_cache(self):
    pass

@unittest.skip("Youtu-LLM uses MLA so it is not compatible with the standard cache format")
def test_greedy_generate_dict_outputs_use_cache(self):
    pass

@unittest.skip(reason="SDPA can't dispatch on flash due to unsupported head dims")
def test_sdpa_can_dispatch_on_flash(self):
    pass

@unittest.skip(reason="Youtu-LLM is not suitable for testing with extreme small vocabulary")
def test_resize_tokens_embeddings(self):
    pass
are all these tests indeed not working?
Yes, exactly.
Let's check if we can fix the majority by moving the tests under the CausalLM wrapper classes
Hi @molbap, I've updated a new version of the code according to the discussion above. Can you help start a new solo test run of Youtu-LLM (run-slow: youtu_llm)?
vasqu
left a comment
Pushing some quick fixes, so some comments are outdated, but it will still need another look at the tests, especially in regard to the causal LM tester (I did something quickly but shapes are not working as expected for some tests, so we either need to fix the init properly or overwrite some tests that check shapes).
I hope that clears up some confusion.
Youtu_PRETRAINED_CONFIG_ARCHIVE_MAP = {}


class YoutuConfig(PreTrainedConfig):
You could remove unrelated attributes after the super with self.attr
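If it helps, a sketch of the pattern this usually refers to in modular config files; the parent class and the two deleted attributes here are illustrative, not the PR's actual code:

```python
from transformers.models.deepseek_v3.configuration_deepseek_v3 import DeepseekV3Config


class YoutuConfig(DeepseekV3Config):
    def __init__(self, **kwargs):
        # let the parent set everything up first
        super().__init__(**kwargs)
        # then drop inherited attributes that do not apply to this (dense) model
        del self.n_routed_experts
        del self.moe_intermediate_size
```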
initializer_range: float | None = None,
embedding_initializer_range: float | None = None,
We could at least avoid the else branch, but yes, fair, I overlooked that the embedding was also dependent there.
# for backward compatibility
if num_key_value_heads is None:
    num_key_value_heads = num_attention_heads
Shouldn't be needed; we should have this directly as an arg if we already have the power to change this.
Now YoutuConfig inherits from DeepseekV3Config, so most of this code has been removed.
class YoutuDecoderLayer(LlamaDecoderLayer):
    def __init__(self, config: YoutuConfig, layer_idx: int):
        nn.Module.__init__(self)
        self.hidden_size = config.hidden_size

        self.self_attn = YoutuAttention(config=config, layer_idx=layer_idx)

        self.mlp = YoutuMLP(config)

        self.input_layernorm = YoutuRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
        self.post_attention_layernorm = YoutuRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
Suggested change (drop the duplicated __init__ above and rely on plain inheritance):

class YoutuDecoderLayer(LlamaDecoderLayer):
    pass
Should now be the same as Llama, no? The init should overlap now too
sure
if isinstance(module, nn.Linear):
    init.normal_(module.weight, mean=0.0, std=std)
    if module.bias is not None:
        init.zeros_(module.bias)
But you use initializer_range to init linear layers, so we handle this already, see
transformers/src/transformers/modeling_utils.py, lines 2286 to 2287 in 0189e04:

if hasattr(self.config, "initializer_range"):
    std = self.config.initializer_range or 0.02

and lines 2296 to 2300 in 0189e04:

if isinstance(module, (nn.Linear, nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.ConvTranspose1d, nn.ConvTranspose2d)):
    if getattr(module, "weight", None) is not None:
        init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            init.zeros_(module.bias)
Hence, you can leave that part out and only update the embedding part
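As an illustration, a sketch of keeping only the embedding-specific part, assuming the config exposes embedding_initializer_range as in this PR; the Linear branch is dropped since the generic init quoted above already covers it, and this is not the PR's actual code:

```python
import torch.nn as nn
from transformers import PreTrainedModel


class YoutuPreTrainedModel(PreTrainedModel):
    def _init_weights(self, module):
        # generic init (Linear, Conv, etc.) is handled by the base class
        super()._init_weights(module)
        # only the embedding uses the dedicated embedding_initializer_range
        if isinstance(module, nn.Embedding):
            std = self.config.embedding_initializer_range or 0.02
            nn.init.normal_(module.weight, mean=0.0, std=std)
            if module.padding_idx is not None:
                module.weight.data[module.padding_idx].zero_()
```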
| "parakeet": "ParakeetCTCConfig", | ||
| "lasr": "LasrCTCConfig", | ||
| "wav2vec2-with-lm": "Wav2Vec2Config", | ||
| "youtu": "YoutuConfig", |
Why is this needed? I would like to leave this out if possible.
removed
utils/check_docstrings.py
| "YolosImageProcessor", | ||
| "YolosModel", | ||
| "YosoConfig", | ||
| "YoutuConfig", |
Please run make fix-repo (without having this added to any exceptions) --> you will see it will update a few strings like I showed above, namely initializer_range and embedding_initializer_range. You might need to left-tab them to have the same spacing again as before.
if model.config.tie_word_embeddings:
    # Youtu-LLM-2B-Base contains extra repeated weights for the tied embeddings, we can tie weights here according to its config
    model.tie_weights()
Yes gotcha
c8251c7 to c3e02fe (Compare)
Hi @vasqu, I have fixed most of the issues mentioned above, please check again. There is one specific issue related to … I noticed many …
I fixed a few things in regards to the config and tests; it seems that resolves your issue as well. Lmk if not!
Just a few last nits, but approving since it's nothing major. Quick-checking with our slow CI in a second (might need to adjust values because of GPU differences).
logger = logging.get_logger(__name__)


class YoutuConfig(DeepseekV3Config):
Sorry, that was confusing on my side --> I meant to say to add this to the modular file then. You can see that it will unfold the inherited attributes in the config file (which also solves the consistency issues), but better double check I haven't missed something.
ok, got it
def convert_rope_params_to_dict(self, ignore_keys_at_rope_validation: set | None = None, **kwargs):
    raise AttributeError("Not overwritten for the Youtu model!")
Fyi, this way you can disable inheriting a function from other models.
class YoutuModelTester(CausalLMModelTester):
    if is_torch_available():
        base_model_class = YoutuModel

    def __init__(
        self,
        parent,
        kv_lora_rank=16,
        q_lora_rank=32,
        qk_rope_head_dim=32,
        qk_nope_head_dim=32,
        v_head_dim=32,
    ):
        super().__init__(parent=parent)
        self.kv_lora_rank = kv_lora_rank
        self.q_lora_rank = q_lora_rank
        self.qk_nope_head_dim = qk_nope_head_dim
        self.qk_rope_head_dim = qk_rope_head_dim
        self.v_head_dim = v_head_dim


@require_torch
class YoutuModelTest(CausalLMModelTest, unittest.TestCase):
    model_tester_class = YoutuModelTester

    def _check_past_key_values_for_generate(self, batch_size, past_key_values, seq_length, config):
        """Needs to be overridden as youtu-llm has special MLA cache format (though we don't really use the MLA)"""
        self.assertIsInstance(past_key_values, Cache)

        # (batch, head, seq_length, head_features)
        expected_common_shape = (
            batch_size,
            getattr(config, "num_key_value_heads", config.num_attention_heads),
            seq_length,
        )
        expected_key_shape = expected_common_shape + (config.qk_nope_head_dim + config.qk_rope_head_dim,)
        expected_value_shape = expected_common_shape + (config.v_head_dim,)

        for layer in past_key_values.layers:
            self.assertEqual(layer.keys.shape, expected_key_shape)
            self.assertEqual(layer.values.shape, expected_value_shape)
Refactored this to use our causal LM class - it makes it easier for us to refactor tests in the future.
def tearDown(self):
    cleanup(torch_device, gc_collect=False)
Let's also clean up on setup, e.g.
transformers/tests/models/llama/test_modeling_llama.py, lines 65 to 73 in a1f63d5:

def setup(self):
    cleanup(torch_device, gc_collect=True)

def tearDown(self):
    # TODO (joao): automatic compilation, i.e. compilation when `cache_implementation="static"` is used, leaves
    # some memory allocated in the cache, which means some object is not being released properly. This causes some
    # unoptimal memory usage, e.g. after certain tests a 7B model in FP16 no longer fits in a 24GB GPU.
    # Investigate the root cause.
    cleanup(torch_device, gc_collect=True)
(no need to copy the comment)
It does seem we need to fix at least: …
run-slow: youtu

This comment contains models: ["models/youtu"]
CI Results: Model CI Report: ❌ Failed tests
alright, I guess this was the error that led me to add the conditions of …
run-slow: youtu

[For maintainers] Suggested jobs to run (before merge): run-slow: auto, youtu

This comment contains models: ["models/youtu"]
CI Results: ✅ No failing test specific to this PR 🎉!
@LuJunru I just updated a few last nits, everything passes (the other tests are unrelated) so I merged. Thanks for iterating, and congrats on the model addition 🤗


What does this PR do?
This PR adds the implementation for the released Youtu-LLM model. The model has the following features:
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@ArthurZucker @Cyrilvallez