[trainer] update time format & fix resume from checkpoint train_speed by Jintao-Huang · Pull Request #8007 · modelscope/ms-swift

Jintao-Huang · 2026-02-07T13:13:46Z

No description provided.

gemini-code-assist · 2026-02-07T13:14:01Z

Summary of Changes

Hello @Jintao-Huang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on improving the accuracy and precision of training progress metrics within the trainer module. It resolves an issue where training speed and estimated remaining time were incorrectly reported when training was resumed from a checkpoint. Additionally, the changes enhance the granularity of time displays in logs, providing a more detailed and accurate representation of training duration.

Highlights

Accurate Training Speed and Remaining Time Calculation: The calculation for train_speed and remaining_time has been updated to correctly account for training resumed from a checkpoint. This is achieved by introducing a start_step parameter, ensuring that the speed is calculated based on the iterations completed since the start of the current training run, not the global step count from the very beginning.
Change in Training Speed Metric: The train_speed metric has been re-defined from 'iterations per second' (iter/s) to 'seconds per iteration' (s/it). This provides a different perspective on training efficiency, indicating the average time taken for each training step.
Enhanced Time Formatting Precision: The format_time utility function now displays the seconds component with two decimal places, offering more precise reporting of elapsed and remaining times in the logs.
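To see why the resume case matters, here is a minimal sketch comparing the two metrics. The pre-fix formula (`global_step / elapsed`) is an assumption inferred from the summary above, not taken from the diff, and the numbers are hypothetical.

```python
def speed_from_global_step(elapsed: float, global_step: int) -> float:
    """Assumed pre-fix metric (iter/s): counts every step, including
    those completed before the checkpoint was written."""
    return global_step / elapsed


def seconds_per_iteration(elapsed: float, global_step: int, start_step: int) -> float:
    """Post-fix metric (s/it): counts only steps taken in the current run."""
    return elapsed / (global_step - start_step)


# Hypothetical run: resumed at step 1000, then 50 more steps in 100 seconds.
elapsed, start_step, global_step = 100.0, 1000, 1050

old_speed = speed_from_global_step(elapsed, global_step)              # inflated iter/s
new_speed = seconds_per_iteration(elapsed, global_step, start_step)   # true s/it

print(f'old: {old_speed} iter/s, new: {new_speed} s/it')
```

With these numbers the old metric reports 10.5 iter/s even though the run actually needs 2.0 seconds per step, which is why any remaining-time estimate derived from it would be far too optimistic.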


Changelog

swift/trainers/patcher.py
- Modified the add_train_message function to accept a start_step argument, which is used to calculate train_speed and remaining_time more accurately when resuming training.
- Updated the train_speed calculation to elapsed / (state.global_step - start_step) and changed its log key from 'train_speed(iter/s)' to 'train_speed(s/it)'.
- Adjusted the remaining_time calculation to utilize the newly computed train_speed.
- Introduced self.start_step = state.global_step in the on_train_begin methods of ProgressCallbackNew and PrinterCallbackNew to capture the starting global step for accurate speed calculation.
- Passed self.start_step to add_train_message in the on_log methods of ProgressCallbackNew and PrinterCallbackNew.
swift/utils/utils.py
- Modified the format_time function to round the seconds component to two decimal places (round(seconds % 60, 2)) instead of truncating it to an integer, providing more precise time formatting.
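The effect of the `format_time` change can be illustrated with a small stand-in. Only the `round(seconds % 60, 2)` expression comes from the changelog; the function name `format_time_sketch` and the `d/h/m/s` output layout are assumptions, since the real function body is not shown here.

```python
def format_time_sketch(seconds: float) -> str:
    """Hypothetical stand-in for format_time; the output layout is assumed."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    # The expression named in the changelog: keep two decimals
    # instead of truncating the seconds to an integer.
    secs = round(seconds % 60, 2)

    parts = []
    if days:
        parts.append(f'{days}d')
    if hours:
        parts.append(f'{hours}h')
    if minutes:
        parts.append(f'{minutes}m')
    parts.append(f'{secs}s')
    return ' '.join(parts)


print(format_time_sketch(3723.456))
```

One side effect of rounding rather than truncating is that a value such as 59.999 rounds up to 60.0 in the seconds field, which may or may not matter for log readability.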

Activity

No specific activity (comments, reviews, etc.) has been recorded for this pull request yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	`@gemini-code-assist`	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request updates the training time format and fixes an issue with calculating training speed when resuming from a checkpoint. The changes in swift/trainers/patcher.py correctly adjust the train_speed and remaining_time calculations by considering the start_step. The change in swift/utils/utils.py increases the precision of the formatted time, which is a good improvement. My review includes one suggestion to prevent a potential ZeroDivisionError in the training speed calculation for added robustness. Overall, the changes are well-implemented and address the intended purpose.
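The callback side of the change can be sketched as follows. This is a simplified illustration, not the code from swift/trainers/patcher.py: `FakeState` and `ProgressSketch` are hypothetical stand-ins, the hook signatures are reduced to the arguments that matter here, and skipping the speed entry when no step has been taken yet is a choice of this sketch.

```python
import time
from dataclasses import dataclass


@dataclass
class FakeState:
    """Hypothetical stand-in for the trainer state object."""
    global_step: int = 0
    max_steps: int = 0


class ProgressSketch:
    """Hypothetical callback showing where start_step is captured and used."""

    def on_train_begin(self, state: FakeState) -> None:
        self.start_time = time.time()
        # On a resumed run global_step is already non-zero here,
        # so remember it as the baseline for this run.
        self.start_step = state.global_step

    def on_log(self, state: FakeState, logs: dict) -> dict:
        elapsed = time.time() - self.start_time
        steps_done = state.global_step - self.start_step
        if steps_done > 0:  # sketch-only choice: omit the key before the first step
            logs['train_speed(s/it)'] = round(elapsed / steps_done, 6)
        return logs


# Usage: resume at step 1000, log after four further steps.
state = FakeState(global_step=1000, max_steps=2000)
callback = ProgressSketch()
callback.on_train_begin(state)
logs_before = callback.on_log(state, {})
state.global_step = 1004
logs_after = callback.on_log(state, {})
```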

gemini-code-assist · 2026-02-07T13:15:51Z

swift/trainers/patcher.py

    logs['percentage'] = f'{train_percentage * 100:.2f}%'
    elapsed = time.time() - start_time
    logs['elapsed_time'] = format_time(elapsed)
+    train_speed = elapsed / (state.global_step - start_step)


There's a potential for a ZeroDivisionError here. If on_log is called when state.global_step is equal to start_step (e.g., immediately after resuming from a checkpoint before the first training step is taken), this will raise an exception. It's safer to guard against this division by zero for robustness.

    steps_done = state.global_step - start_step
    train_speed = elapsed / steps_done if steps_done > 0 else 0.0
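A self-contained version of the suggested guard, extended to the remaining-time estimate, might look like the sketch below. The helper name `train_speed_and_remaining` and the formula `(max_steps - global_step) * train_speed` are assumptions for illustration; the PR only states that remaining_time is derived from the new train_speed.

```python
from typing import Tuple


def train_speed_and_remaining(elapsed: float, global_step: int,
                              start_step: int, max_steps: int) -> Tuple[float, float]:
    """Return (seconds per iteration, estimated seconds remaining)."""
    steps_done = global_step - start_step
    # Guard from the review: no steps taken yet means no meaningful speed.
    train_speed = elapsed / steps_done if steps_done > 0 else 0.0
    # Assumed formula: steps still to run times seconds per step.
    remaining = (max_steps - global_step) * train_speed
    return train_speed, remaining


# Logged immediately after resuming, before any step has run: no exception.
print(train_speed_and_remaining(0.5, 1000, 1000, 2000))
# Logged 50 steps into the resumed run.
print(train_speed_and_remaining(100.0, 1050, 1000, 2000))
```

Note that falling back to 0.0 also makes the remaining-time estimate 0.0 for that first log entry, so it should be read as a placeholder rather than a real estimate.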

update time update

a357d06

gemini-code-assist bot reviewed Feb 7, 2026

hjh0119 approved these changes Feb 7, 2026

tastelikefeet approved these changes Feb 7, 2026

update

f2e271d

Jintao-Huang merged commit 4451f2a into modelscope:main Feb 7, 2026
1 of 3 checks passed

Jintao-Huang merged 2 commits into modelscope:main from Jintao-Huang:update_time_update
