sagemaker.local bug when inputs are binary files #4996

Open
@vojavocni

Describe the bug
Hello, I think I've encountered a bug in sagemaker.local. I'm trying to test a batch transform with images as input, but I get the following error before execution even reaches the input_fn of my custom inference script:

│   345 │   │   for element in self.splitter.split(file):
│ ❱ 346 │   │   │   if _payload_size_within_limit(buffer + element, size):
│   347 │   │   │   │   buffer += element
│   348 │   │   │   else:
│   349 │   │   │   │   tmp = buffer
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: can only concatenate str (not "bytes") to str

I am not using a splitter (splitter type is None), as it's not necessary for images.

I believe the problem is in line 343 of the MultiRecordStrategy class:

class MultiRecordStrategy(BatchStrategy):
    """Feed multiple records at a time for batch inference.
    Will group up as many records as possible within the payload specified.
    """
    def pad(self, file, size=6):
        """Group together as many records as possible to fit in the specified size.
        Args:
            file (str): file path to read the records from.
            size (int): maximum size in MB that each group of records will be
                fitted to. passing 0 means unlimited size.
        Returns:
            generator of records
        """
        buffer = ""
        for element in self.splitter.split(file):
            if _payload_size_within_limit(buffer + element, size):
                buffer += element
            else:
                tmp = buffer
                buffer = element
                yield tmp
        if _validate_payload_size(buffer, size):
            yield buffer

We can see that the buffer variable is initialized as a string, so the code assumes that every record read from file is text. That assumption fails when file refers to a binary object, which should be a supported input.
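The failure can be shown in isolation, without SageMaker or Docker. The snippet below is only a sketch: `split_binary` and `pad_like_issue` are hypothetical stand-ins, not SDK functions, and it assumes (as the traceback suggests) that the splitter yields raw bytes for a binary file.

```python
def split_binary(payload: bytes):
    """Hypothetical stand-in for a splitter that yields raw bytes."""
    yield payload


def pad_like_issue(elements):
    """Mirror the buffering pattern from the quoted code, minus the size checks."""
    buffer = ""  # str accumulator, as in the quoted code
    for element in elements:
        buffer = buffer + element  # raises TypeError when element is bytes
    yield buffer


try:
    list(pad_like_issue(split_binary(b"\x89PNG\r\n\x1a\n")))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

With text records the same pattern works, which is why the problem only shows up for binary inputs such as images.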

To reproduce
Run a local batch transform with a single image as input. I don't think the model matters, since the job fails before any prediction or any interaction between the data and the model takes place.

Expected behavior
I would expect the buffer to be sensitive to whether the file is text, like JSON or CSV, or binary, like PNG.
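One possible shape for such a type-sensitive buffer is sketched below. This is not the SDK's implementation or a proposed patch: `pad_any` and `_within_limit` are hypothetical names, the size check is a simplified stand-in for the SDK's own helpers, and the sketch does not reject a single record that exceeds the limit. The idea is to take the buffer's type from the first record instead of hard-coding `""`.

```python
def _within_limit(payload, size_mb):
    """Hypothetical size check: 0 means unlimited, otherwise compare byte length."""
    if size_mb == 0:
        return True
    data = payload.encode("utf-8") if isinstance(payload, str) else payload
    return len(data) <= size_mb * 1024 * 1024


def pad_any(elements, size_mb=6):
    """Group str or bytes records without assuming the record type."""
    buffer = None  # the buffer's type is taken from the first record
    for element in elements:
        if buffer is None:
            buffer = element
        elif _within_limit(buffer + element, size_mb):
            buffer += element
        else:
            # The next record would exceed the limit: emit the group and start a new one.
            yield buffer
            buffer = element
    if buffer is not None:
        yield buffer


print(list(pad_any([b"\x89PNG", b"\r\n"])))
print(list(pad_any(["a,b\n", "c,d\n"])))
```

Mixing str and bytes records in one input would still raise a TypeError here, which seems reasonable since a single file should be one or the other.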

Screenshots or logs
See above.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.237.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch, custom inference and model
  • Framework version: 2.5.1
  • Python version: 3.11
  • CPU or GPU: Both
  • Custom Docker image (Y/N): Y, extending the pytorch-inference:2.5.1-gpu-py311-cu124-ubuntu22.04-sagemaker image

