This repository was archived by the owner on Dec 14, 2023. It is now read-only.
First GPU occupies more VRAM in distributed training #66
Open
Description
At this link, the cached latent should be loaded with an explicit map_location, for example:
```python
# Map the load onto this process's device instead of the device the tensor was saved from.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cached_latent = torch.load(self.cached_data_list[index], map_location=device)
```
Otherwise, in multi-GPU distributed training, torch.load restores each cached latent to the device it was saved from (typically cuda:0), so the first GPU ends up occupying far more VRAM than the other GPUs.
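For context, here is a minimal sketch of the idea under DDP; the class and names (`CachedLatentDataset`, `cached_data_list`) are illustrative, not the repository's exact code. Each process derives its device from `LOCAL_RANK` and passes it as `map_location`, so no rank accidentally deserializes its latents onto cuda:0:

```python
import os
import torch
from torch.utils.data import Dataset


class CachedLatentDataset(Dataset):
    """Illustrative dataset that loads pre-computed latents from disk."""

    def __init__(self, cached_data_list):
        self.cached_data_list = cached_data_list
        # Each DDP process targets its own GPU (or CPU if none is available).
        local_rank = int(os.environ.get("LOCAL_RANK", "0"))
        self.device = (
            torch.device(f"cuda:{local_rank}")
            if torch.cuda.is_available()
            else torch.device("cpu")
        )

    def __len__(self):
        return len(self.cached_data_list)

    def __getitem__(self, index):
        # Explicit map_location keeps this rank's tensors off cuda:0.
        return torch.load(self.cached_data_list[index], map_location=self.device)
```

An alternative is `map_location="cpu"` and moving the batch to the GPU inside the training loop; that also keeps the first GPU's VRAM usage in line with the others.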