[Primus] Merge Hybrid Models branch to Primus release v26.2 branch #559

Closed
clairesonglee wants to merge 27 commits into release/v26.2 from hybrid/release/v26.2

@clairesonglee (Contributor)

No description provided.

clairesonglee and others added 27 commits January 28, 2026 16:33
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
…32B Configs for MI300X & MI355X (#556)

YF: Only SFT-related config and doc changes; bypassing unit CI tests

## Summary

This PR introduces post-training documentation and updates Qwen3 32B
model configuration files to support AMD MI300X and MI355X accelerators.

---

## Changes

### 📘 Documentation

- **Added `posttraining.md`**
  - New comprehensive guide for post-training workflows
  - Covers setup instructions, configuration details, and usage examples

- **Updated `docs/README.md`**
  - Added a new section referencing post-training documentation
  - Improved documentation organization and navigation

---

### ⚙️ Configuration Updates

- **Updated Qwen3_32B model YAML configs**
  - Added/modified configurations optimized for:
    - MI300X
    - MI355X
  - Adjusted parameters for compatibility and stable execution

---

## Validation

- Verified updated configs load and execute successfully on MI300X and
MI355X environments
- Confirmed documentation links and structure render correctly

---

## Checklist

- [x] Added `posttraining.md`
- [x] Updated `docs/README.md`
- [x] Modified Qwen3_32B YAML configs
- [x] Verified changes locally

Override Megatron `build_tokenizer` to support custom tokenizer types with
HuggingFace Hub IDs

- Fixes `Llama2Tokenizer` failing when given a Hub ID in the new architecture

- All custom types now work consistently in both the legacy and new architectures
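
The override described above can be pictured as a small dispatch in front of the upstream builder. The following is only a minimal sketch under stated assumptions, not the code from this PR: `CUSTOM_TOKENIZER_TYPES`, `upstream_build_tokenizer`, and `load_from_hub` are hypothetical stand-ins, and the `tokenizer_type` / `tokenizer_model` argument names are assumed for illustration.

```python
from types import SimpleNamespace

# Hypothetical registry: the PR names Llama2Tokenizer but does not list every custom type.
CUSTOM_TOKENIZER_TYPES = {"Llama2Tokenizer"}


def upstream_build_tokenizer(args):
    """Stand-in for the upstream builder; this is NOT the real Megatron function."""
    return ("upstream", args.tokenizer_type, args.tokenizer_model)


def load_from_hub(hub_id):
    """Stand-in loader; a real one might call transformers.AutoTokenizer.from_pretrained."""
    return ("hub", hub_id)


def build_tokenizer(args, fallback=upstream_build_tokenizer, hub_loader=load_from_hub):
    """Route custom tokenizer types to the Hub loader and everything else upstream."""
    if args.tokenizer_type in CUSTOM_TOKENIZER_TYPES:
        return hub_loader(args.tokenizer_model)
    return fallback(args)


# Illustrative inputs only; the values are assumptions, not taken from the PR.
custom = SimpleNamespace(tokenizer_type="Llama2Tokenizer",
                         tokenizer_model="meta-llama/Llama-2-7b-hf")
builtin = SimpleNamespace(tokenizer_type="GPT2BPETokenizer", tokenizer_model=None)

print(build_tokenizer(custom))
print(build_tokenizer(builtin))
```

Keeping the fallback as an injected callable means the same wrapper could sit in front of either architecture's builder, which is one plausible way to get the consistent behavior the commit message mentions.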

---------

Co-authored-by: HuangWei-95 <weihuan@amd.com>
Co-authored-by: Xiaoming-AMD <Xiaoming.Peng@amd.com>

- Update `MegatronPretrainTrainer.run_train()` to detect `model_type` from `backend_args`
- Conditionally import `pretrain_mamba` or `pretrain_gpt` based on `model_type`
- Pass `model_type` to `get_model_provider()` so it uses the correct builder (`mamba_builder` vs. `gpt_builder`)
- Restore core runtime support for Megatron as intended by commit cfe8cc0
- Fixes the 'specialize for HybridStack' error when using the core runtime with hybrid models
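
The selection logic in the bullets above might look roughly like the sketch below. It is an illustration under assumptions rather than the PR's implementation: the `ENTRYPOINTS` table and `select_entrypoint` helper are hypothetical, and defaulting to `"gpt"` when `model_type` is absent is an assumption the PR does not state. The module and builder names are the ones quoted in the commit message; the sketch returns them as strings instead of importing them, so it runs without Megatron installed.

```python
from types import SimpleNamespace

# Hypothetical lookup table: model_type -> (pretrain module name, builder name).
ENTRYPOINTS = {
    "mamba": ("pretrain_mamba", "mamba_builder"),
    "gpt": ("pretrain_gpt", "gpt_builder"),
}


def select_entrypoint(backend_args, default="gpt"):
    """Pick the pretrain module and builder names for the configured model_type."""
    # Assumption: a missing model_type falls back to the GPT path.
    model_type = getattr(backend_args, "model_type", default)
    if model_type not in ENTRYPOINTS:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    module_name, builder_name = ENTRYPOINTS[model_type]
    return model_type, module_name, builder_name


hybrid_args = SimpleNamespace(model_type="mamba")
dense_args = SimpleNamespace()  # no model_type set

print(select_entrypoint(hybrid_args))
print(select_entrypoint(dense_args))
```

In a real trainer the returned module name could be passed to `importlib.import_module` at call time, so that only the entry point matching the configured model is imported.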

Add env for TestMegatronTrainerDeterministic CI test

Co-authored-by: HuangWei-95 <weihuan@amd.com>

Update all references to the Primus Docker base image across
documentation, configuration files, CI/CD workflows, and example scripts
to use the latest v26.1 release.

clairesonglee marked this pull request as ready for review February 20, 2026 20:34
kailashg26 deleted the branch release/v26.2 February 21, 2026 01:00
kailashg26 closed this Feb 21, 2026
5 participants