What accuracy should we expect when training Alexnet from scratch on ImageNet? #987

Open
@yoderj

📚 Documentation

The README https://github.com/pytorch/examples/blob/main/imagenet/README.md is very helpful when getting started with training AlexNet.

We can train AlexNet to approximately 56% top-1 and 79% top-5 accuracy on the validation set. However, this is still a fair bit below Krizhevsky's published results of roughly 83% or 85% top-5 accuracy on these training sets.

We are training with the default recommendations for a single GPU in the README for AlexNet:

python main.py -a alexnet --lr 0.01 --gpu 0 /data/datasets/imagenet/

What out-of-the-box accuracy should we expect when training AlexNet on ImageNet with the default PyTorch implementation?

What sort of hyperparameter changes do you recommend to duplicate Alex Krizhevsky's accuracies?
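For readers comparing numbers, here is a minimal sketch of how the top-1 and top-5 figures quoted above are usually defined: a prediction counts as correct if the true label is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are hypothetical and use plain Python lists rather than the tensors used in the example script.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy data: 3 samples, 4 classes (hypothetical scores, not ImageNet results).
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1 is ranked first
    [0.5, 0.1, 0.3, 0.1],  # true class 2 is ranked second
    [0.4, 0.3, 0.2, 0.1],  # true class 3 is ranked last
]
labels = [1, 2, 3]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

Top-5 accuracy is always at least as high as top-1, which is why the two figures in the question differ so much.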

Activity

mostafaelhoushi commented on May 9, 2022
Contributor

Just quoting from this blog article:

The model uses a stochastic gradient descent optimization function with batch size, momentum, and weight decay set to 128, 0.9, and 0.0005 respectively. All the layers use an equal learning rate of 0.001.
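To make the quoted hyperparameters concrete, here is a rough sketch of a single SGD update with momentum and weight decay on plain Python lists. It follows the update form used by `torch.optim.SGD` (weight decay added to the gradient, then a momentum buffer); the original paper folds the learning rate into the velocity term, so the two are not identical once the learning rate changes. The function name `sgd_step` and the toy values are assumptions for illustration only.

```python
def sgd_step(params, grads, velocity, lr=0.001, momentum=0.9, weight_decay=0.0005):
    """One in-place SGD update with momentum and L2 weight decay."""
    for i, (p, g) in enumerate(zip(params, grads)):
        g = g + weight_decay * p                  # fold weight decay into the gradient
        velocity[i] = momentum * velocity[i] + g  # accumulate the momentum buffer
        params[i] = p - lr * velocity[i]          # step against the accumulated direction
    return params, velocity


# Toy values (hypothetical): two scalar parameters with equal gradients.
params = [1.0, -2.0]
grads = [0.5, 0.5]
velocity = [0.0, 0.0]

sgd_step(params, grads, velocity)
print(params, velocity)
```

The batch size of 128 does not appear in the update itself; it only determines how many samples are averaged to produce each gradient.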

mostafaelhoushi commented on May 9, 2022
Contributor

Maybe try those hyperparameters, and if they lead to the expected accuracy, perhaps create a pull request to update the README file accordingly?

msaroufim commented on Jul 10, 2022
Member

So far our tests aren't in a place where we can guarantee model performance. A case could be made that maybe we should, but so far we don't have any such plans.

mostafaelhoushi commented on Jul 25, 2022
Contributor

> So far our tests aren't in a place where we can guarantee model performance. A case could be made that maybe we should, but so far we don't have any such plans.

I came across TorchDrift (https://torchdrift.org/), which is listed under the PyTorch ecosystem.

It sounds like a tool that could help us ensure our models' accuracy specs.

wangtiance commented on Jan 11, 2023

Hello, not sure if I should open a new issue for this, but are the pretrained models trained with default hyperparameters? And do all the pretrained models match the accuracies from the original papers? It seems unlikely that the default setting can achieve the best result for every model.

mostafaelhoushi commented on Jan 11, 2023
Contributor

> Hello, not sure if I should open a new issue for this, but are the pretrained models trained with default hyperparameters? And do all the pretrained models match the accuracies from the original papers? It seems unlikely that the default setting can achieve the best result for every model.

In the past when I trained the models from scratch, I recall being able to reproduce the accuracy for almost all models.

MobileNet might have its own hyperparameters, but the remaining models should be the same.

wangtiance commented on Jan 12, 2023

> > Hello, not sure if I should open a new issue for this, but are the pretrained models trained with default hyperparameters? And do all the pretrained models match the accuracies from the original papers? It seems unlikely that the default setting can achieve the best result for every model.
>
> In the past when I trained the models from scratch, I recall being able to reproduce the accuracy for almost all models.
>
> MobileNet might have its own hyperparameters, but the remaining models should be the same.

Thanks for the response! It's a good thing that one setting can work well for different models.

mostafaelhoushi commented on Jan 12, 2023
Contributor

If you check most vision CNN papers, you will find they train with the same hyperparameters: SGD optimizer, 90 epochs, and an initial learning rate of 0.1 that is divided by 10 every 30 epochs.
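As a small illustration of that schedule, the sketch below computes the learning rate for a given epoch by multiplying the base rate by 0.1 once per completed 30-epoch block. The helper name `step_lr` is hypothetical; PyTorch offers the same behaviour through `torch.optim.lr_scheduler.StepLR` with `step_size=30` and `gamma=0.1`.

```python
def step_lr(base_lr, epoch, step_size=30, gamma=0.1):
    """Learning rate after multiplying by `gamma` every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Learning rate at a few epochs of a 90-epoch run (epochs numbered from 0).
schedule = {epoch: step_lr(0.1, epoch) for epoch in (0, 29, 30, 60, 89)}
for epoch, lr in schedule.items():
    print(f"epoch {epoch:2d}: lr = {lr:.4f}")
```

With these settings the run spends 30 epochs at each of three learning rates, ending at one hundredth of the initial value.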
