
Why did the authors choose a different optimizer (AdamW) on ImageNet? #4

@hunto


The paper states:

"In terms of ImageNet, we use the AdamW optimizer [18] to train the network for 100 epochs with a total batch size of 256. The initial learning rate is 2e-4, reduced by 0.1 at epochs 30, 60, and 90."

However, on ImageNet, most papers (e.g., CRD) adopt the SGD optimizer with an initial learning rate of 0.1 for the ResNet-34 teacher / ResNet-18 student pair.
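For concreteness, here is a minimal PyTorch sketch of the two setups being compared. The AdamW settings are taken from the quote above; the SGD momentum and weight-decay values are the usual ImageNet defaults I am assuming for CRD-style training, not quoted from either paper.

```python
import torch
from torchvision.models import resnet18

model = resnet18()  # student network, e.g. for a ResNet-34 -> ResNet-18 pair

# Setup described in the ICKD paper: AdamW, initial lr 2e-4,
# decayed by 0.1 at epochs 30, 60, and 90 (100 epochs total, batch size 256).
adamw = torch.optim.AdamW(model.parameters(), lr=2e-4)
adamw_sched = torch.optim.lr_scheduler.MultiStepLR(
    adamw, milestones=[30, 60, 90], gamma=0.1)

# Common setup in prior ImageNet KD work (e.g., CRD): SGD with initial lr 0.1.
# Momentum 0.9 and weight decay 1e-4 are assumed typical values, not quoted.
sgd = torch.optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=1e-4)
sgd_sched = torch.optim.lr_scheduler.MultiStepLR(
    sgd, milestones=[30, 60, 90], gamma=0.1)
```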

Why did the authors choose the less common AdamW optimizer on ImageNet?
Could you provide results for ICKD trained with the same strategy as previous works, for a fair comparison?

Thanks :)
