Validation reward on the CALX example stays around 0.5 and does not improve #454

@johnson7788

Description

Why does the reward on the validation set of my CALX example stay around 0.5 and fail to improve? I am using a 0.5B-parameter model. Could the model be too small, or is the reward too sparse? How can I improve it?

Image

Labels: examples, question