Non-reproducible results with num_workers=0 #20679

Description

@mrava87

Bug description

Hello,
I have a question about obtaining reproducible results when setting num_workers in a torch.utils.data.DataLoader used together with pl.LightningDataModule and pl.Trainer.

So far I am seeing the following: when I set num_workers=0, the results differ from those obtained with num_workers>0 for every epoch after the first (my understanding is that the data is reshuffled within each epoch in the same way for all num_workers>0, but differently for num_workers=0). Is this expected behaviour? Any suggestions for preventing it?
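
For reference, my setup is along these lines (stripped down; the toy dataset, model, batch size, and seed are placeholders for the internal code):

```python
import torch
import lightning.pytorch as pl  # or `import pytorch_lightning as pl`
from torch.utils.data import DataLoader, TensorDataset

# Seed everything; workers=True also seeds the DataLoader workers
pl.seed_everything(42, workers=True)

class ToyDataModule(pl.LightningDataModule):
    def __init__(self, num_workers: int):
        super().__init__()
        self.num_workers = num_workers

    def setup(self, stage=None):
        # Toy data standing in for the internal dataset
        x = torch.arange(100, dtype=torch.float32).unsqueeze(1)
        self.train_set = TensorDataset(x)

    def train_dataloader(self):
        return DataLoader(
            self.train_set,
            batch_size=10,
            shuffle=True,
            num_workers=self.num_workers,
        )

class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(1, 1)

    def training_step(self, batch, batch_idx):
        (x,) = batch
        return torch.nn.functional.mse_loss(self.layer(x), x)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)

# Results with num_workers=0 vs num_workers>0 diverge after the first epoch
trainer = pl.Trainer(max_epochs=3, deterministic=True)
trainer.fit(ToyModel(), datamodule=ToyDataModule(num_workers=0))
```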

If you think this is a bug, I can provide a concise example that reproduces what I see (this would require stripping code out of an internal library, so I'd rather do it only if you think what I am seeing is likely a bug rather than expected behaviour 😄).
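
To be concrete about what I mean by "reshuffled within each epoch", here is the kind of comparison I have in mind (toy dataset, hypothetical seed; `epoch_orders` is just an illustrative helper). It records the per-epoch sample order for a given num_workers value so the two settings can be compared directly:

```python
import torch
from torch.utils.data import DataLoader

def epoch_orders(num_workers: int, seed: int = 42, epochs: int = 3):
    """Record the order in which dataset indices are served in each epoch."""
    dataset = list(range(20))  # indices stand in for real samples
    generator = torch.Generator().manual_seed(seed)
    loader = DataLoader(dataset, batch_size=5, shuffle=True,
                        num_workers=num_workers, generator=generator)
    # Re-iterating the loader draws a new permutation for every epoch
    return [[int(i) for batch in loader for i in batch] for _ in range(epochs)]

if __name__ == "__main__":
    # If shuffling were consistent, the two printouts would match epoch by epoch
    print("num_workers=0:", epoch_orders(num_workers=0))
    print("num_workers=2:", epoch_orders(num_workers=2))
```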

Thanks!

What version are you seeing the problem on?

v2.5

How to reproduce the bug

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):

More info

No response

Metadata


    Labels

    bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.5.x
