Open
Description
Outline & Motivation
When trying to fix #17138, I found that the definition of root_dir
in fabric and pytorch is different. Let's say we want the logging dir to be like tmpdir/some_exp/name/version_0
. Here is a table of the current behaviors:
Fabric CSVLogger | Fabric TensorboardLogger | PL CSVLogger | PL TensorboardLogger | |
---|---|---|---|---|
what user should pass in init |
tmpdir/some_exp/name |
tmpdir/some_exp |
tmpdir/some_exp |
tmpdir/some_exp |
what root_dir stored |
tmpdir/some_exp/name |
tmpdir/some_exp |
tmpdir/some_exp/name |
tmpdir/some_exp/name |
has _get_next_version implemented |
Yes | Yes | No, calling the function from fabric | Yes |
The differences are not obvious because I assume users should use the PL API, but these differences have made the corresponding test modules and internal implementations like _get_next_version
having different calling conventions and implementations.
Pitch
Unify the definitions of the root_dir
variable, and possibly refactor the code to include fewer copies of functions like _get_next_version
.
Additional context
No response