Description
Repro
- Apply "Add support for deterministic behavior in ImageNet benchmark" (#381) on this repo
- cd to the ImageNet folder
- Run python main.py --arch resnet18 --seed 0 --gpu 0 /path/to/imagenet/ on a multi-GPU machine, once with CUDA_VISIBLE_DEVICES=0 and once with CUDA_VISIBLE_DEVICES=1.
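For anyone trying to reproduce the concurrent case, here is a rough sketch of a launcher that starts both runs at the same time and collects their logs for comparison. The helper name run_concurrently and the run_gpu<N>.log file names are made up for illustration and are not part of the repo; the sketch assumes only that the command is started from the ImageNet folder.

```python
import os
import subprocess


def run_concurrently(cmd, devices, log_dir="."):
    """Start one copy of cmd per device at the same time; return each copy's output.

    Hypothetical helper: each copy sees a different CUDA_VISIBLE_DEVICES value
    and writes stdout/stderr to its own log file, so a full pipe can never
    stall one of the runs.
    """
    started = []
    for dev in devices:
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(dev))
        log_path = os.path.join(log_dir, f"run_gpu{dev}.log")
        log = open(log_path, "w")
        proc = subprocess.Popen(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
        started.append((dev, proc, log, log_path))

    outputs = {}
    for dev, proc, log, log_path in started:
        proc.wait()  # all copies are already running; this only waits for them to exit
        log.close()
        with open(log_path) as f:
            outputs[dev] = f.read()
    return outputs


# Example use for the repro above (assumes the current directory is the ImageNet folder):
# cmd = ["python", "main.py", "--arch", "resnet18", "--seed", "0", "--gpu", "0",
#        "/path/to/imagenet/"]
# outputs = run_concurrently(cmd, devices=[0, 1])
# print("same" if outputs[0] == outputs[1] else "different")
```

Training runs for many epochs, so in practice you may want to stop both processes early and compare the logs written so far. If the log lines include timing information, those fields will differ between any two runs, so compare the loss and accuracy values rather than the raw text.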
Environment
- PyTorch master
- CUDA 9.0
- Driver 384.81
- Ubuntu 16.04
Expected behavior
The two runs have the same output.
Actual behavior
The two runs have the same output when you run them one after the other (e.g. GPU 0 first, then Ctrl-C, then GPU 1). But when you run them at the same time, you get different output.
Suspicion
This is a driver bug. I dunno how PyTorch would be able to bypass CUDA_VISIBLE_DEVICES-based GPU segregation. But posting here for visibility anyways.