\nu coefficient in the numerator of \hat{n_t} expression

Hello!
I'm implementing Nadam/Radam for my [NNTL](https://github.com/Arech/nntl) project. Would you be so kind as to give me some intuition or clarify one thing about the Nadam algorithm as presented in the [ICLR 2016 paper](https://www.openreview.net/pdf?id=OM0jvwB8jIp57ZJjtNEZ)?

Here is the line for the \hat{n_t} expression in Algorithm 2 of the paper:

\hat{n_t} \leftarrow \nu n_t / (1-\nu ^t)

It contains a \nu coefficient before n_t in the numerator. What is the purpose of this scaling? Neither the original Adam, nor your own implementation of Nadam here, nor the [original report paper](http://cs229.stanford.edu/proj2015/054_report.pdf) has that coefficient in the numerator.
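To make the difference concrete, here is a minimal sketch of the two variants as I read them. The function names and the numeric values are my own and purely illustrative, and I am assuming the parameter update divides by the square root of \hat{n_t} as in Adam (the epsilon term is ignored here):

```python
import math

def n_hat_adam(n_t, nu, t):
    """Bias-corrected second moment as in Adam: n_t / (1 - nu^t)."""
    return n_t / (1.0 - nu ** t)

def n_hat_paper(n_t, nu, t):
    """Variant from Algorithm 2 of the paper: nu * n_t / (1 - nu^t)."""
    return nu * n_t / (1.0 - nu ** t)

# Illustrative values only; they are not taken from the paper.
nu, t, n_t = 0.999, 10, 1e-4

# The extra coefficient rescales the estimate by exactly nu, for any t.
ratio = n_hat_paper(n_t, nu, t) / n_hat_adam(n_t, nu, t)

# If the step is proportional to 1 / sqrt(n_hat), the step grows by 1 / sqrt(nu).
step_scale = 1.0 / math.sqrt(ratio)

print(ratio, step_scale)
```

If this reading is right, the coefficient only rescales the effective step by a constant factor of 1/sqrt(\nu), which is very close to 1 for typical \nu, so I do not see what it is meant to achieve.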

Thanks!
