Description
📚 Documentation
The README https://github.com/pytorch/examples/blob/main/imagenet/README.md is very helpful when getting started with training AlexNet.
We are able to successfully train AlexNet to approximately 56% top-1 and 79% top-5 accuracy on the validation set, but this is still a fair bit below Krizhevsky's published results of roughly 83% to 85% top-5 accuracy on the same dataset.
We are training AlexNet with the single-GPU defaults recommended in the README:
python main.py -a alexnet --lr 0.01 --gpu 0 /data/datasets/imagenet/
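For reference, here is a minimal sketch of the optimizer and schedule that command implies, assuming the script's documented defaults (momentum 0.9, weight decay 1e-4, 90 epochs, learning rate stepped down by 10x every 30 epochs); only the learning rate is overridden to 0.01 on the command line:

import torch
import torchvision.models as models

# Rough sketch of what main.py sets up for this run (an assumption based on
# the script's defaults, not copied from the repo).
model = models.alexnet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
# Decay the learning rate by 10x every 30 epochs, for 90 epochs total.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)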
What out-of-the-box accuracy should we expect when training AlexNet on ImageNet with the default PyTorch implementation?
What sort of hyperparameter changes do you recommend to reproduce Alex Krizhevsky's accuracies?
Activity
mostafaelhoushi commented on May 9, 2022
Just quoting from this blog article:
mostafaelhoushi commented on May 9, 2022
Maybe try those hyperparameters, and if they lead to the expected accuracy, perhaps create a pull request to update the README file accordingly?
msaroufim commented on Jul 10, 2022
So far our tests aren't in a place where we can guarantee model performance. A case could be made that maybe we should, but so far we don't have any plans to do so.
mostafaelhoushi commented on Jul 25, 2022
I came across TorchDrift https://torchdrift.org/ (it is listed under the PyTorch ecosystem).
It sounds like a tool that could help ensure our models meet their accuracy specs.
wangtiance commented on Jan 11, 2023
Hello, not sure if I should open a new issue for this, but are the pretrained models trained with default hyperparameters? And do all the pretrained models match the accuracies from the original papers? It seems unlikely that the default setting can achieve the best result for every model.
mostafaelhoushi commented on Jan 11, 2023
In the past, when I trained the models from scratch, I recall being able to reproduce the accuracy for almost all of them.
MobileNet might have its own hyperparameters, but the remaining models should use the same settings.
wangtiance commented on Jan 12, 2023
Thanks for the response! It's a good thing that one setting can work well for different models.
mostafaelhoushi commented on Jan 12, 2023
If you check most vision CNN papers, you will find they train with the same hyperparameters: SGD optimizer, 90 epochs, and an initial learning rate of 0.1 that decays by a factor of 10 every 30 epochs, as sketched below.
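For concreteness, a minimal sketch of that common recipe in PyTorch; the momentum (0.9) and weight decay (1e-4) values are assumed here, since they are typical choices rather than something stated above, and the model choice is just an example:

import torch
import torchvision.models as models

# Common ImageNet recipe: SGD, 90 epochs, lr 0.1 decayed by 10x every 30 epochs.
model = models.resnet18()  # any torchvision CNN; picked only for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one full training pass over ImageNet would go here ...
    scheduler.step()
    if epoch in (29, 59, 89):
        # Learning rate drops to roughly 0.01, 0.001, and 0.0001 at these points.
        print(epoch + 1, optimizer.param_groups[0]["lr"])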