Conversation
MLCommons CLA bot:
mlperf_logging/compliance_checker/training_6.0.0/closed_common.yaml
mlperf_logging/compliance_checker/training_6.0.0/closed_deepseekv3_671b.yaml
039c2fa to f3a3dc7
- Add rcps_deepseek_v3_671b.json stub with BS 16384/18432/20480, learning rates, warmup steps, and gradient accumulation steps
- Register deepseek_v3_671b in benchmark_meta.py (result file counts and allowed benchmarks for 6.0)
- Add deepseek_v3_671b to submission_runs and eval_accuracy parsing in rcp_checker.py
- Add deepseek_v3_671b entry to result_summarizer config.yaml
f3a3dc7 to c528ecb
recheck
@denys-fridman - can you please complete the CLA? Also, can you create a PR to training_rules that adds GB300 to the list of acceptable reference hardware (https://github.com/mlcommons/training_policies/blob/master/CONTRIBUTING.md#general)?
    REQ: EXACTLY_ONE
    CHECK: " v['value'] == 'adamw' "

- KEY:
LR and warmup need to be fixed, right? The value should be checked to make sure it follows the fixed formula.
    NAME: opt_learning_rate_warmup_steps
    REQ: EXACTLY_ONE

- KEY:
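The reviewer's request could be expressed as a pinned CHECK on the logged value, in the same style as the adamw check above. This is only a sketch: the 2048 below is a hypothetical placeholder, not the value mandated by the reference.

```yaml
- KEY:
    NAME: opt_learning_rate_warmup_steps
    REQ: EXACTLY_ONE
    CHECK: " v['value'] == 2048 "  # hypothetical pinned value, for illustration only
```

A similarly pinned CHECK would apply to the decay-steps key discussed in the next comment, with the expected value derived from the reference implementation.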
Decay steps should be fixed as well, and the checker should verify that the value matches what the reference expects.
    'flux1': 10,
    'llama31_405b': 3,
    'llama31_8b': 10,
    'deepseek_v3_671b': 10,
Do we indeed expect 10 submission runs?
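For context, the mapping under review drives a simple count check: a submission must contain exactly the expected number of result files per benchmark. The sketch below illustrates that check; the function name and structure are assumptions for illustration, not the actual rcp_checker code.

```python
# Hypothetical sketch of the per-benchmark run-count check.
# The counts mirror the mapping in the diff above; whether 10 is
# correct for deepseek_v3_671b is exactly the open question.
EXPECTED_RUNS = {
    'flux1': 10,
    'llama31_405b': 3,
    'llama31_8b': 10,
    'deepseek_v3_671b': 10,
}

def check_run_count(benchmark, result_files):
    """Return True if the submission has the expected number of result files."""
    expected = EXPECTED_RUNS.get(benchmark)
    if expected is None:
        raise ValueError(f"unknown benchmark: {benchmark}")
    return len(result_files) == expected
```

For example, `check_run_count('deepseek_v3_671b', files)` passes only when `files` contains exactly 10 entries.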
recheck