Currently, the learning rate is decayed after each iteration, using the update rule

    lr = config.lr/(1 + args.lr_decay*step)

Because the update applied after step 0 evaluates to config.lr/(1 + 0) = config.lr, steps 0 and 1 both run with the same learning rate, config.lr.
Is this the expected behavior? Or should the update rule be the following instead?

    lr = config.lr/(1 + args.lr_decay*(step+1))
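
To make the difference concrete, here is a minimal sketch that records the learning rate each iteration would actually use under both rules. It assumes `step` is zero-indexed and that the update runs at the end of each iteration, as described above. The helper `lrs_used`, its `offset` parameter, and the values `base_lr=1.0` and `lr_decay=1.0` are hypothetical, chosen only for illustration; they are not taken from the repository.

```python
def lrs_used(base_lr, lr_decay, num_steps, offset):
    """Return the learning rate used at each iteration.

    offset=0 models the current rule:  base_lr / (1 + lr_decay * step)
    offset=1 models the proposed rule: base_lr / (1 + lr_decay * (step + 1))
    (Hypothetical helper; the real training loop may differ.)
    """
    lr = base_lr  # the first iteration starts from the configured rate
    used = []
    for step in range(num_steps):
        used.append(lr)  # rate used during this iteration
        # decay is applied after the iteration, so it affects the next one
        lr = base_lr / (1 + lr_decay * (step + offset))
    return used


current = lrs_used(base_lr=1.0, lr_decay=1.0, num_steps=4, offset=0)
proposed = lrs_used(base_lr=1.0, lr_decay=1.0, num_steps=4, offset=1)

for step, (a, b) in enumerate(zip(current, proposed)):
    print(f"step {step}: current={a:.4f} proposed={b:.4f}")
```

Under these assumptions the current rule repeats the initial rate for steps 0 and 1, while the proposed rule starts decaying from step 1 onward.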