Local mode crashes on Windows when running locally #1140

Open
@Serhiy-Shekhovtsov

Reference: MLFW-2730

System Information

  • Framework: PyTorch
  • Framework Version: 1.2.0
  • Python Version: 3.6.6
  • CPU or GPU: Any
  • Python SDK Version: 1.44.1
  • Are you using a custom image: Yes

Describe the problem

I am building a custom PyTorch inference container based on one of the examples provided by AWS Labs. When I tried to run the image locally, I ran into an issue: deploying the model with model.deploy reported that the container had crashed but gave no information about the cause.

Code example

from sagemaker import Model, local

model = Model(
    name=sage_model_name,
    model_data='file:///Projects/..../yolov3-model.tar.gz',
    image=image_uri,
    role=role,
    sagemaker_session=local.LocalSession()
)

model.deploy(1, 'local')

Logs:

Attaching to tmp7zvuuict_algo-1-op9su_1
Exception in thread Thread-6:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\threading.py", line 917, in _bootstrap_inner
self.run()
File "C:\ProgramData\Anaconda3\lib\site-packages\sagemaker\local\image.py", line 606, in run
_stream_output(self.process)
File "C:\ProgramData\Anaconda3\lib\site-packages\sagemaker\local\image.py", line 664, in _stream_output
stdout = process.stdout.readline().decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 41: invalid start byte

Patching the SDK code to see the actual problem

To find out what the issue was, I had to patch sagemaker\local\image.py at line 664 (inside _stream_output, as shown in the traceback above) and replace process.stdout.readline().decode("utf-8") with

stdout = process.stdout.readline().decode("ISO-8859-1")
print(stdout)

Now I was able to see the original error message coming from Docker:

/usr/bin/env: python\r: No such file or directory
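For anyone hitting the same UnicodeDecodeError, here is a minimal sketch of a more tolerant decode that does not need a different codec. The helper name decode_output_line and the sample bytes are hypothetical and are not part of the SDK. Byte 0x91 is the left single quotation mark in Windows-1252, so my assumption is that the container output reached Python in a Windows code page rather than UTF-8.

```python
def decode_output_line(raw: bytes) -> str:
    """Decode one line of process output without raising on invalid UTF-8.

    Hypothetical helper for illustration; it is not a function in the SDK.
    """
    # errors="replace" substitutes U+FFFD for every byte that is not valid UTF-8
    return raw.decode("utf-8", errors="replace")


# Made-up sample resembling the Docker message, with quote bytes 0x91 and 0x92
# as they would appear in Windows-1252 encoded output (an assumption).
sample = b"/usr/bin/env: \x91python\r\x92: No such file or directory\n"

try:
    sample.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    # Same failure mode as in the traceback: 0x91 is not a valid UTF-8 start byte
    strict_ok = False

tolerant = decode_output_line(sample)
print(repr(tolerant))
```

With this approach the message stays readable and the streaming thread keeps running, at the cost of losing the exact characters behind the invalid bytes.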

Fixing the original problem

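From the error message I figured out that the problem was caused by Windows-style line breaks: the shebang line of my serve file ended in \r, so env was looking for an interpreter literally named "python\r". Apparently Git had converted my serve file from LF to CRLF (most likely on checkout, which is what core.autocrlf=true does on Windows). This was causing the problem.
It's unfortunate that it takes patching the SDK's code to find out what's happening, so I thought it would be a good idea to mention it here.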
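A minimal sketch of how the line endings could be checked and fixed before building the image. The function names are hypothetical, and the path passed to normalize_file would be wherever your own serve script lives.

```python
from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """Return True if the data contains at least one Windows-style line ending."""
    return b"\r\n" in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF; other bytes are left untouched."""
    return data.replace(b"\r\n", b"\n")


def normalize_file(path: str) -> bool:
    """Rewrite the file with LF endings; return True if it was changed."""
    file_path = Path(path)
    original = file_path.read_bytes()
    if not has_crlf(original):
        return False
    file_path.write_bytes(to_unix_line_endings(original))
    return True


# Example with an in-memory script resembling an entry point such as serve
script = b"#!/usr/bin/env python\r\nprint('serving')\r\n"
fixed = to_unix_line_endings(script)
print(fixed)
```

Working on bytes avoids any newline translation by Python itself. To keep Git from reintroducing CRLF, a .gitattributes entry such as "serve text eol=lf" should also help, assuming the file is named serve at the repository root.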
