
PyTorch Estimator max_run parameter not working at all #4451

Open
@yiphei


Describe the bug
I tried to use the max_run parameter of sagemaker.pytorch.estimator.PyTorch to set the maximum run time in seconds, but it doesn't work. See the attached screenshot for an example: I set max_run to 603 seconds, but the job didn't stop at 603 s. The training time reached 841 s, at which point I terminated the run manually.
[Screenshot 2024-02-23 at 6:47:13 PM]

To reproduce
Set max_run of sagemaker.pytorch.estimator.PyTorch to any integer value and start a training job.
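A minimal reproduction might look like the sketch below. The entry point `train.py`, the role ARN, the instance type, and the `framework_version`/`py_version` pair are placeholders assumed for illustration, not values taken from this report (it is assumed that "2.2.0" under System information refers to the estimator's framework version). `max_run=603` matches the screenshot. The SDK import is deferred into `launch` so the arguments can be inspected without AWS access.

```python
# Reproduction sketch. Assumptions (not from the original report):
# "train.py", the instance type, and the framework/py versions are placeholders.
MAX_RUN_SECONDS = 603  # value used in the screenshot


def estimator_kwargs(role: str) -> dict:
    """Keyword arguments for sagemaker.pytorch.PyTorch, including max_run."""
    return {
        "entry_point": "train.py",          # hypothetical training script
        "role": role,
        "instance_count": 1,
        "instance_type": "ml.g4dn.xlarge",  # hypothetical GPU instance type
        "framework_version": "2.2.0",       # assumed estimator framework version
        "py_version": "py310",              # assumed Python version tag
        "max_run": MAX_RUN_SECONDS,         # training timeout in seconds
    }


def launch(role: str):
    # Imported lazily so this module loads without the SDK installed.
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(**estimator_kwargs(role))
    estimator.fit(wait=False)  # start the training job without blocking
    return estimator
```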

Expected behavior
I expect the SageMaker training job to terminate once the number of seconds set in max_run has elapsed.
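To tell whether the limit reached the service at all, one could compare the job's `StoppingCondition.MaxRuntimeInSeconds` with its recorded `TrainingTimeInSeconds`, both fields of the DescribeTrainingJob response. This is a sketch assuming boto3 credentials are configured; the helper names `max_run_report` and `check_job` are hypothetical, and `TrainingTimeInSeconds` is generally only populated once the job has ended.

```python
from typing import Optional


def max_run_report(desc: dict) -> dict:
    """Summarize max_run enforcement from a DescribeTrainingJob response."""
    limit = desc["StoppingCondition"]["MaxRuntimeInSeconds"]
    elapsed: Optional[int] = desc.get("TrainingTimeInSeconds")  # may be absent
    return {
        "limit": limit,
        "elapsed": elapsed,
        "exceeded": elapsed is not None and elapsed > limit,
    }


def check_job(job_name: str) -> dict:
    # Imported lazily so the pure helper above can be used without boto3.
    import boto3

    client = boto3.client("sagemaker")
    desc = client.describe_training_job(TrainingJobName=job_name)
    return max_run_report(desc)
```

With the numbers from the screenshot (a 603 s limit and 841 s of training time), `max_run_report` reports `exceeded` as `True`. If `limit` is not 603, the value never reached the API; if it is, the limit was sent but not enforced.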

Screenshots or logs
See the screenshot in the description.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.207.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
  • Framework version: 2.2.0
  • Python version: 3.10.1
  • CPU or GPU: CPU locally, and GPU instance on SageMaker
  • Custom Docker image (Y/N): N

Additional context
NA
