[WIP] Add Minimum Risk Trainer support #427

alexandremuzio · 2023-04-10T05:40:47Z

Initial working version of MRT.
Apart from the learning procedure, most of the MRT types (for example, MRTRLElement) closely mirror their PPO counterparts. The main difference is that MRT generates a batch of candidate sentences for each prompt instead of a single sentence per prompt.
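
For reviewers unfamiliar with MRT, here is a minimal sketch of the per-prompt objective as it is usually formulated: the candidates' sequence log-probabilities are renormalized over the sampled set, and the loss is the expected cost under that distribution. This is not the code in this PR. The function name `mrt_expected_risk`, the `alpha` sharpness parameter, and its default value are assumptions made for illustration, and the sketch uses plain Python floats rather than tensors.

```python
import math


def mrt_expected_risk(seq_logprobs, costs, alpha=0.005):
    """Expected risk over the candidates sampled for ONE prompt.

    seq_logprobs: total log-probability of each candidate under the model.
    costs: task cost of each candidate (e.g. 1 - metric score); lower is better.
    alpha: hypothetical sharpness factor applied to the log-probabilities.
    """
    if len(seq_logprobs) != len(costs):
        raise ValueError("need exactly one cost per candidate")
    if not seq_logprobs:
        raise ValueError("need at least one candidate")

    # Renormalize over the candidate set only: q_i is proportional to exp(alpha * logp_i).
    scaled = [alpha * lp for lp in seq_logprobs]
    shift = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    q = [w / total for w in weights]

    # Expected cost under q; minimizing it moves probability mass
    # toward the low-cost candidates.
    return sum(qi * ci for qi, ci in zip(q, costs))
```

With equally likely candidates the risk is just the mean cost. In a real trainer the same computation would be done on tensors so that gradients flow through the log-probabilities.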

Main changes:

Add AccelerateMRTTrainer
Introducing MRT configs/data types (trlx/data/mrt_types.py,
Example for translation: examples/mrt_translation_t5.py

So far I have tested it with T5 on translation and summarization tasks.

TODOs:

Test on decoder-only models
Make sure it is working for
Add support for MarianMT models
debug_mrt_summarize_daily_cnn_t5.py (only for debugging)

Please let me know if there are any other suggestions as well. cc @LouisCastricato

alexandremuzio and others added 9 commits March 7, 2023 05:48

[WIP] MRT

a68acf8

[WIP] make_experience working

db629e3

[WIP] MRT make_experience working with batch

7f2478e

[WIP] MRT Training now actually training

3f4729b

Creating debug file separately for MRT T5

1b3cc44

Adding T5 translation task with ppo

fd7ba2f

Adding metric_fn to ppo translation example

642a48f

Merge branch 'main' into mrt

5e5b284

Removing some unused stuff + some improvements + fixing formatting

945f344
