Investigate use of join context for distributed sync #1338

Open
@SkafteNicki

🚀 Feature

Motivation

Based on the problems from this issue: #1297
By implementing a join context (https://pytorch.org/tutorials/advanced/generic_join.html) for our distributed synchronization, we would remove the limitation that, to calculate a metric correctly, the number of samples needs to be divisible by num_gpus * batch_size (because PyTorch by default adds extra samples to balance the load across processes).
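
To make the limitation concrete, here is a minimal pure-Python sketch of how padding can skew a metric. It assumes the padding behaves like `DistributedSampler` with `drop_last=False` (repeating leading indices until the total is divisible by the number of processes); the helper `padded_indices` and the per-sample values are hypothetical, and batching is not modeled.

```python
import math
from typing import List


def padded_indices(num_samples: int, world_size: int) -> List[int]:
    """Repeat leading indices until the total is divisible by world_size.

    Assumes world_size <= num_samples, so one pass of repeats is enough.
    """
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    indices = list(range(num_samples))
    indices += indices[: total - num_samples]
    return indices


# Hypothetical per-sample correctness (1 = correct prediction).
correct = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
world_size = 4

indices = padded_indices(len(correct), world_size)
# Each rank takes a strided slice, so every rank gets the same number of samples.
shards = [indices[rank::world_size] for rank in range(world_size)]

true_acc = sum(correct) / len(correct)
# Accuracy as it would come out if every rank's (padded) samples were counted.
synced_acc = sum(correct[i] for shard in shards for i in shard) / len(indices)

print(f"true accuracy:   {true_acc:.4f}")
print(f"padded accuracy: {synced_acc:.4f}")
```

With 10 samples on 4 processes, samples 0 and 1 are counted twice, so the aggregated value differs from the true one.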

Pitch

The base class should derive from the Joinable class and implement the appropriate methods. It should hopefully not be too much trouble, as all the sync logic is already encapsulated in a function.
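
As a rough illustration of what deriving from `Joinable` involves, here is a sketch modeled on the counter example in the linked tutorial. `JoinableMean` and `_MeanJoinHook` are hypothetical names, and syncing on every `update` call is an assumption made to keep the example short; it is not how the existing sync function works.

```python
import torch
import torch.distributed as dist
from torch.distributed.algorithms.join import Join, Joinable, JoinHook


class _MeanJoinHook(JoinHook):
    """Runs on ranks that have run out of data while other ranks continue."""

    def __init__(self, metric: "JoinableMean") -> None:
        self.metric = metric

    def main_hook(self) -> None:
        # Shadow the all-reduce in `JoinableMean.update` with a zero
        # contribution so ranks that still have data do not hang.
        zeros = torch.zeros(2, device=self.metric.join_device)
        dist.all_reduce(zeros, group=self.metric.join_process_group)


class JoinableMean(Joinable):
    """Hypothetical metric keeping a globally synced [sum, count] state."""

    def __init__(self, device: torch.device, process_group) -> None:
        super().__init__()  # required: sets up the join configuration
        self._device = device
        self._process_group = process_group
        self.state = torch.zeros(2, device=device)

    def update(self, values: torch.Tensor) -> None:
        # Must be called before the per-iteration collective.
        Join.notify_join_context(self)
        batch = torch.tensor(
            [values.sum().item(), float(values.numel())], device=self._device
        )
        dist.all_reduce(batch, group=self._process_group)
        self.state += batch

    def compute(self) -> torch.Tensor:
        return self.state[0] / self.state[1]

    def join_hook(self, **kwargs) -> JoinHook:
        return _MeanJoinHook(self)

    @property
    def join_device(self) -> torch.device:
        return self._device

    @property
    def join_process_group(self):
        return self._process_group
```

Inside each process, after `dist.init_process_group(...)`, the metric would be constructed with a process group such as `dist.group.WORLD` and the update loop wrapped as `with Join([metric]): ...`, so that ranks with fewer batches keep shadowing the collective until the last rank finishes.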

Alternatives

Additional context