Dfridman/deepseek v3 #445

Open
denys-fridman wants to merge 17 commits into mlcommons:master from denys-fridman:dfridman/deepseek-v3

Conversation

@denys-fridman

No description provided.

@denys-fridman denys-fridman requested review from a team as code owners February 6, 2026 13:16
@github-actions

github-actions bot commented Feb 6, 2026

MLCommons CLA bot:
Thank you very much for your submission, we really appreciate it. Before we can accept your contribution, we ask that you sign the MLCommons CLA (Apache 2). Please use this [Google form](https://forms.gle/Ew1KkBVpyeJDuRw67) to initiate authorization. If you are from an MLCommons member organization, we will request that you be added to the CLA. If you are not from a member organization, we will email you a CLA to sign. For any questions, please contact support@mlcommons.org.
0 out of 1 committers have signed the MLCommons CLA.
@denys-fridman
You can retrigger this bot by commenting recheck in this Pull Request

- Add rcps_deepseek_v3_671b.json stub with BS 16384/18432/20480,
  learning rates, warmup steps, and gradient accumulation steps
- Register deepseek_v3_671b in benchmark_meta.py (result file counts
  and allowed benchmarks for 6.0)
- Add deepseek_v3_671b to submission_runs and eval_accuracy parsing
  in rcp_checker.py
- Add deepseek_v3_671b entry to result_summarizer config.yaml
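For context, here is a minimal sketch of what a single record in the new rcps_deepseek_v3_671b.json stub might contain, expressed as a Python dict. The field names and hyperparameter values are illustrative placeholders modeled loosely on existing MLPerf RCP files, not the actual reference values:

```python
# Hypothetical shape of one record in rcps_deepseek_v3_671b.json,
# loosely modeled on existing MLPerf RCP files. All values below are
# illustrative placeholders, not the real reference hyperparameters.
rcp_record = {
    "Benchmark": "deepseek_v3_671b",
    "BS": 16384,  # one of the stubbed batch sizes: 16384/18432/20480
    "Hyperparams": {
        "opt_base_learning_rate": 2.0e-4,       # placeholder
        "opt_learning_rate_warmup_steps": 200,  # placeholder
        "gradient_accumulation_steps": 16,      # placeholder
    },
    "Epochs to converge": [],  # stub: to be filled from real reference runs
}

# Sanity-check that the batch size is one of the three stubbed values.
assert rcp_record["BS"] in (16384, 18432, 20480)
```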
@denys-fridman
Author

recheck

@ShriyaRishab
Contributor

@denys-fridman - can you please complete the CLA?

Also, can you create a PR to training_rules that adds GB300 to the list of acceptable reference hardware (https://github.com/mlcommons/training_policies/blob/master/CONTRIBUTING.md#general)?

REQ: EXACTLY_ONE
CHECK: " v['value'] == 'adamw' "

- KEY:
Contributor


LR and warmup need to be fixed, right? The values should be checked to make sure they follow the fixed formula.

Author


done
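To pin the learning rate and warmup rather than leaving them free, the checker's CHECK fields can compare the logged value against a constant. CHECK expressions are Python snippets evaluated against the logged value v; the sketch below uses illustrative constants, since the real DeepSeek-V3 reference values are not shown in this thread:

```python
# Sketch of fixed-value checks in the style of the compliance checker's
# CHECK expressions, which are Python snippets evaluated against the
# logged value v. The constants are illustrative, not the reference values.
REF_BASE_LR = 2.0e-4
REF_WARMUP_STEPS = 200

def check_opt_base_learning_rate(v):
    # Equivalent in spirit to: CHECK: " v['value'] == 2.0e-4 "
    return v["value"] == REF_BASE_LR

def check_opt_learning_rate_warmup_steps(v):
    # Equivalent in spirit to: CHECK: " v['value'] == 200 "
    return v["value"] == REF_WARMUP_STEPS
```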

NAME: opt_learning_rate_warmup_steps
REQ: EXACTLY_ONE

- KEY:
Contributor


Decay steps should also be fixed, with a check that the value matches what the reference expects.

Author


done

'flux1': 10,
'llama31_405b': 3,
'llama31_8b': 10,
'deepseek_v3_671b': 10,
Contributor


Do we indeed expect 10 submission runs?
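For illustration, a hedged sketch of how a per-benchmark run-count table like the one above could be consumed when validating a submission. The helper name and exact semantics are hypothetical; the real logic lives in rcp_checker.py:

```python
# Hypothetical consumer of the per-benchmark submission-run counts.
# The dict mirrors the snippet above; the helper is illustrative only.
submission_runs = {
    'flux1': 10,
    'llama31_405b': 3,
    'llama31_8b': 10,
    'deepseek_v3_671b': 10,
}

def has_required_runs(benchmark, result_files):
    """Return True if the submission has at least the required run count."""
    return len(result_files) >= submission_runs[benchmark]
```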

@denys-fridman
Author

recheck


3 participants