Pytorch 2.0.0 leaks memory when using model.compile #3961

Open
@usamec

Describe the bug
The bundled Triton version is old and is affected by this issue:
pytorch/pytorch#96937

To reproduce

See attached issue.

Expected behavior

No leaks.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.165.0
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): PyTorch
  • Framework version: 2.0.0
  • Python version: 3.10
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

Additional context
Are you seriously shipping a development version of a package? The install log shows:

Found existing installation: triton 2.0.0.dev20221202

Adding triton==2.0.0.post1 to the requirements fixes the issue.
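
For anyone who wants to confirm which Triton build their training container actually has, here is a minimal sketch using only the standard library. The helper names (is_dev_build, installed_triton_version) are hypothetical and not part of any SDK. The check assumes the distribution is registered under the name "triton", as in the install log above, and that development builds carry a ".dev" segment in their version string.

```python
from importlib import metadata


def is_dev_build(version: str) -> bool:
    """Return True if the version string carries a '.dev' segment (assumed marker of a dev build)."""
    return ".dev" in version.lower()


def installed_triton_version():
    """Return the installed 'triton' distribution version, or None if it is not installed."""
    try:
        return metadata.version("triton")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_triton_version()
    if version is None:
        print("triton is not installed")
    elif is_dev_build(version):
        # The pin below is the one reported to fix the leak in this issue.
        print(f"triton {version} is a development build; consider pinning triton==2.0.0.post1")
    else:
        print(f"triton {version} is a release build")
```

Running this at the start of the training script would flag the development build before torch.compile is used.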

Honestly, when we are paying much more for SageMaker training than for EC2, I would expect some level of support and comfort.
