\nu coefficient in the numerator of \hat{n_t} expression #3

@Arech

Hello!
I'm implementing Nadam/Radam for my NNTL project. Could you give me some intuition about, or clarify, one detail of the Nadam algorithm as published in the ICLR 2016 paper?

Here is the line for the \hat{n_t} expression in Algorithm 2 of the paper:

\hat{n_t} \leftarrow \nu n_t / (1-\nu ^t)

It has a \nu coefficient before n_t in the numerator. What is the purpose of this scaling? Neither the original Adam, nor your own implementation of Nadam here, nor the original report paper has that coefficient in the numerator.
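
To make the difference concrete, here is a minimal sketch comparing the two forms. It assumes the usual Adam-style second-moment update n_t = \nu n_{t-1} + (1 - \nu) g_t^2, which is not quoted above; the function name and the default \nu = 0.999 are mine, for illustration only.

```python
def second_moment_estimates(grads, nu=0.999):
    """Return (adam_form, paper_form) bias-corrected estimates per step.

    Assumption: n_t = nu * n_{t-1} + (1 - nu) * g_t**2, with n_0 = 0.
    """
    n = 0.0
    results = []
    for t, g in enumerate(grads, start=1):
        n = nu * n + (1.0 - nu) * g * g
        correction = 1.0 - nu ** t
        n_hat_adam = n / correction        # no nu in the numerator
        n_hat_paper = nu * n / correction  # as written in Algorithm 2
        results.append((n_hat_adam, n_hat_paper))
    return results


if __name__ == "__main__":
    # Constant gradient, so the bias-corrected value is easy to reason about.
    for adam_form, paper_form in second_moment_estimates([2.0] * 5):
        print(adam_form, paper_form)
```

With a constant gradient g, the form without the coefficient recovers g^2 at every step, while the form from Algorithm 2 gives \nu g^2, i.e. the estimate is uniformly scaled down by \nu.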

Thanks!