[Feature] Mamba Hybrid Models #520

Open
clairesonglee wants to merge 25 commits into main from clairlee/dev/hybrid


@clairesonglee
Contributor

No description provided.

@clairesonglee clairesonglee marked this pull request as ready for review February 4, 2026 01:12
clairesonglee and others added 2 commits February 4, 2026 20:28
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
@clairesonglee clairesonglee changed the title from "Mamba Hybrid Models" to "[Feature] Mamba Hybrid Models" Feb 6, 2026
HuangWei-95 and others added 3 commits February 19, 2026 23:00
)

Override Megatron build_tokenizer to support custom tokenizer types with
HuggingFace Hub IDs

- Fixes Llama2Tokenizer failing with Hub IDs in new architecture

- All custom types now work consistently in legacy and new architectures

---------

Co-authored-by: HuangWei-95 <weihuan@amd.com>
Co-authored-by: Xiaoming-AMD <Xiaoming.Peng@amd.com>
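
The `build_tokenizer` override described in the commit message above can be pictured as a thin wrapper around Megatron's builder. The following is only a minimal sketch, not the code from this PR: `make_build_tokenizer`, `CUSTOM_TOKENIZER_TYPES`, and `hub_loader` are hypothetical names, and reading the Hub ID from `args.tokenizer_model` is an assumption. Only `Llama2Tokenizer` is named in the commit message, so it is the only custom type listed.

```python
from types import SimpleNamespace

# Hypothetical set of custom tokenizer types; only Llama2Tokenizer is
# named in the commit message, the real list lives in the PR diff.
CUSTOM_TOKENIZER_TYPES = {"Llama2Tokenizer"}


def make_build_tokenizer(megatron_build_tokenizer, hub_loader):
    """Wrap a Megatron-style builder so custom types load from a Hub ID.

    megatron_build_tokenizer: callable taking the parsed args object.
    hub_loader: callable taking a HuggingFace Hub ID or local path.
    Both are injected so this sketch runs without Megatron installed.
    """

    def build_tokenizer(args):
        # Assumption: the Hub ID is carried in args.tokenizer_model.
        if args.tokenizer_type in CUSTOM_TOKENIZER_TYPES:
            return hub_loader(args.tokenizer_model)
        # Any other tokenizer type falls through to the original builder.
        return megatron_build_tokenizer(args)

    return build_tokenizer


# Usage with stand-in callables in place of the real builder and loader.
build = make_build_tokenizer(
    megatron_build_tokenizer=lambda args: ("megatron", args.tokenizer_type),
    hub_loader=lambda hub_id: ("hub", hub_id),
)
custom = build(SimpleNamespace(tokenizer_type="Llama2Tokenizer",
                               tokenizer_model="meta-llama/Llama-2-7b-hf"))
builtin = build(SimpleNamespace(tokenizer_type="GPT2BPETokenizer",
                                tokenizer_model=None))
```

In a real environment `hub_loader` could be something like `transformers.AutoTokenizer.from_pretrained`. Because the same wrapper can be installed on either code path, custom types would behave the same in the legacy and new architectures.
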
- Update MegatronPretrainTrainer.run_train() to detect model_type from backend_args
- Conditionally import pretrain_mamba or pretrain_gpt based on model_type
- Pass model_type to get_model_provider() to use correct builder (mamba_builder vs gpt_builder)
- Restore core runtime support for megatron as intended by commit cfe8cc0
- Fixes 'specialize for HybridStack' error when using core runtime with hybrid models
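
The dispatch described in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `resolve_pretrain_entrypoint` and the lookup table are hypothetical, the module and builder names are taken from the commit message as plain strings, and falling back to `gpt` when `model_type` is unset is an assumption.

```python
from types import SimpleNamespace

# Hypothetical lookup table: model_type -> (pretrain module, builder name).
# The names come from the commit message; real import paths may differ.
_PRETRAIN_ENTRYPOINTS = {
    "gpt": ("pretrain_gpt", "gpt_builder"),
    "mamba": ("pretrain_mamba", "mamba_builder"),
}


def resolve_pretrain_entrypoint(backend_args, default="gpt"):
    """Return (model_type, module_name, builder_name) for backend_args.

    Assumption: a missing or empty model_type falls back to `default`.
    """
    model_type = getattr(backend_args, "model_type", None) or default
    if model_type not in _PRETRAIN_ENTRYPOINTS:
        raise ValueError(
            f"unsupported model_type {model_type!r}; "
            f"expected one of {sorted(_PRETRAIN_ENTRYPOINTS)}"
        )
    module_name, builder_name = _PRETRAIN_ENTRYPOINTS[model_type]
    return model_type, module_name, builder_name


# A trainer would then import only the selected module, for example with
# importlib.import_module(module_name), and pass model_type on to
# get_model_provider() so the matching builder is used.
hybrid = resolve_pretrain_entrypoint(SimpleNamespace(model_type="mamba"))
dense = resolve_pretrain_entrypoint(SimpleNamespace())
```

Importing only the selected module means a GPT-only run never has to load the Mamba code path, and an unknown `model_type` is rejected up front rather than surfacing later as a builder mismatch.
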
add env for TestMegatronTrainerDeterministic ci test

Co-authored-by: HuangWei-95 <weihuan@amd.com>
WangLingxun and others added 5 commits February 19, 2026 23:16
Update all references to the Primus Docker base image across
documentation, configuration files, CI/CD workflows, and example scripts
to use the latest v26.1 release.

4 participants