The paper states: "In terms of ImageNet, we use the AdamW optimizer [18] to train the network for 100 epochs with a total batch size of 256. The initial learning rate is 2e-4 reduced by 0.1 at epochs 30, 60, and 90."
However, on ImageNet, most prior papers (e.g., CRD) adopt the SGD optimizer with an initial learning rate of 0.1 for the ResNet34-ResNet18 teacher-student pair.
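For concreteness, here is a rough PyTorch sketch of the two training recipes being compared. The momentum and weight decay values in the SGD recipe are my assumptions based on common ImageNet practice, not values taken from this paper or from CRD:

```python
import torch
import torchvision

# Hypothetical student model; ICKD uses ResNet18 as the student on ImageNet.
student = torchvision.models.resnet18(num_classes=1000)

# (a) Recipe quoted from the paper: AdamW, initial lr 2e-4, 100 epochs,
#     lr multiplied by 0.1 at epochs 30, 60, and 90 (total batch size 256).
optimizer_paper = torch.optim.AdamW(student.parameters(), lr=2e-4)
scheduler_paper = torch.optim.lr_scheduler.MultiStepLR(
    optimizer_paper, milestones=[30, 60, 90], gamma=0.1)

# (b) Recipe used by most prior work such as CRD: SGD with initial lr 0.1.
#     Momentum 0.9 and weight decay 1e-4 are assumed here (typical ImageNet
#     defaults), not quoted from any paper.
optimizer_common = torch.optim.SGD(
    student.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler_common = torch.optim.lr_scheduler.MultiStepLR(
    optimizer_common, milestones=[30, 60, 90], gamma=0.1)
```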
Why did the authors choose the uncommon AdamW optimizer on ImageNet?
Could you provide results for ICKD trained with the same strategy as previous works, for a fair comparison?
Thanks :)