Skip to content

Inconsistency between the definition of root_dir in fabric and pytorch loggers #17155

Open
@water-vapor

Description

@water-vapor

Outline & Motivation

When trying to fix #17138, I found that the definition of root_dir in fabric and pytorch is different. Let's say we want the logging dir to be like tmpdir/some_exp/name/version_0. Here is a table of the current behaviors:

Fabric CSVLogger Fabric TensorboardLogger PL CSVLogger PL TensorboardLogger
what user should pass in init tmpdir/some_exp/name tmpdir/some_exp tmpdir/some_exp tmpdir/some_exp
what root_dir stored tmpdir/some_exp/name tmpdir/some_exp tmpdir/some_exp/name tmpdir/some_exp/name
has _get_next_version implemented Yes Yes No, calling the function from fabric Yes

The differences are not obvious because I assume users should use the PL API, but these differences have made the corresponding test modules and internal implementations like _get_next_version having different calling conventions and implementations.

Pitch

Unify the definitions of the root_dir variable, and possibly refactor the code to include fewer copies of functions like _get_next_version.

Additional context

No response

cc @justusschock @awaelchli

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions