Describe the bug
I tried to use the max_run parameter of sagemaker.pytorch.estimator.PyTorch to set the maximum run time in seconds, but it doesn't work. See the attached screenshot for an example: I set max_run to 603 seconds, but the job did not stop at 603 seconds. The training time had reached 841 seconds when I terminated the run manually.
To reproduce
Just set max_run of sagemaker.pytorch.estimator.PyTorch to be any integer value
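A minimal reproduction sketch might look like the following. The entry point script, IAM role ARN, and instance type are hypothetical placeholders, not values from the original report. Only the SDK version, framework version, and the max_run value of 603 come from the details in this issue. The sagemaker import is deferred so the argument-building helper can be inspected without AWS credentials.

```python
def build_estimator_kwargs(max_run_seconds: int) -> dict:
    """Collect the PyTorch estimator arguments used for the reproduction."""
    return {
        # Hypothetical placeholders: replace with your own script, role, and instance.
        "entry_point": "train.py",
        "role": "arn:aws:iam::111122223333:role/ExampleSageMakerRole",
        "instance_count": 1,
        "instance_type": "ml.g4dn.xlarge",
        # Versions taken from the system information in this report.
        "framework_version": "2.2.0",
        "py_version": "py310",
        # The parameter under test: maximum run time in seconds.
        "max_run": max_run_seconds,
    }


def launch(max_run_seconds: int = 603):
    """Start a training job; requires the SageMaker SDK and AWS credentials."""
    from sagemaker.pytorch.estimator import PyTorch  # imported lazily on purpose

    estimator = PyTorch(**build_estimator_kwargs(max_run_seconds))
    estimator.fit(wait=False)  # return immediately instead of streaming logs
    return estimator
```

Calling launch() starts a real (billable) training job, so it is left out of the snippet itself.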
Expected behavior
I expect the SageMaker training run to terminate once the number of seconds set in max_run has elapsed
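To compare the configured limit against the time the service actually recorded, one option is to read the job description returned by the DescribeTrainingJob API. The sketch below assumes the response contains StoppingCondition.MaxRuntimeInSeconds and, once the job has started training, TrainingTimeInSeconds. The helper names are hypothetical, and this is a diagnostic aid rather than an explanation of the behavior reported here.

```python
def summarize_runtime(description: dict) -> dict:
    """Compare the configured limit with the recorded training time."""
    limit = description["StoppingCondition"]["MaxRuntimeInSeconds"]
    # TrainingTimeInSeconds may be absent before training has started.
    elapsed = description.get("TrainingTimeInSeconds")
    return {
        "limit_seconds": limit,
        "training_seconds": elapsed,
        "exceeded": elapsed is not None and elapsed > limit,
    }


def describe_job(job_name: str) -> dict:
    """Fetch the job description; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so summarize_runtime works offline

    client = boto3.client("sagemaker")
    return client.describe_training_job(TrainingJobName=job_name)
```

For a job launched through the SDK, the job name is available as estimator.latest_training_job.name, so summarize_runtime(describe_job(name)) would show whether the limit that reached the service matches the value passed as max_run.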
Screenshots or logs
See screenshot in description
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.207.1
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
- Framework version: 2.2.0
- Python version: 3.10.1
- CPU or GPU: CPU locally, and GPU instance on SageMaker
- Custom Docker image (Y/N): N
Additional context
NA