Description
Describe the bug
When configuring Data Capture for a Batch Transform job using the SageMaker Python SDK, the job creation succeeds, but the execution fails with an "Internal Server Error". If Data Capture is not enabled, the job finishes successfully. This suggests a bug related to the Data Capture configuration in the Batch Transform step.
To reproduce
The setup is the same for both scenarios, with or without DataCaptureConfig:
from datetime import datetime
from sagemaker.transformer import Transformer
from sagemaker.inputs import BatchDataCaptureConfig
input_s3_data_location = "s3://bucket/prefix/batch-transform/input/input.json"
output_s3_data_location = "s3://bucket/prefix/batch-transform/output"
data_capture_destination = "s3://bucket/prefix/batch-transform/captured-data"
model_name = "my-previously-created-model"
transformer = Transformer(
    model_name=model_name,
    strategy="SingleRecord",
    instance_count=1,
    instance_type="ml.m5.large",
    output_path=output_s3_data_location,
    max_concurrent_transforms=1,
    max_payload=6,
    tags=[{"Key": "some-key", "Value": "some-value"}],
)
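# Transform job names must be unique within an AWS account and Region,
# so regenerate the name before each run.
timestamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
job_name = f"batch-transform-{timestamp}"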
- Batch Transform job execution without DataCaptureConfig - Success
transform_arg = transformer.transform(
    job_name=job_name,
    data=input_s3_data_location,
    data_type="S3Prefix",
    content_type="application/json",
    split_type="Line",
    wait=True,
    logs=True,
)
- Batch Transform job execution with DataCaptureConfig - Failure with an Internal Server Error
transform_arg = transformer.transform(
    batch_data_capture_config=BatchDataCaptureConfig(
        destination_s3_uri=data_capture_destination,
        generate_inference_id=True,
    ),
    job_name=job_name,
    data=input_s3_data_location,
    data_type="S3Prefix",
    content_type="application/json",
    split_type="Line",
    wait=True,
    logs=True,
)
Note: I've also tested with CSV files. The behavior is the same.
Expected behavior
Enabling Data Capture for Batch Transform should not cause the job to fail with an Internal Server Error. The job should complete successfully, and captured data should be stored as configured.
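Whether captured data was actually stored can be checked by listing objects under the capture destination. The following is a minimal sketch, not part of the original reproduction: `split_s3_uri` and `list_captured_keys` are hypothetical helper names, and the S3 client is passed in by the caller. It reads only the first page of results and makes no assumption about how the captured files are laid out under the prefix.

```python
from typing import Any, List, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_captured_keys(s3_client: Any, destination_s3_uri: str) -> List[str]:
    """Return object keys found under the data capture destination.

    Only the first page of results is read, which is enough to tell
    whether anything was written at all.
    """
    bucket, prefix = split_s3_uri(destination_s3_uri)
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # "Contents" is absent from the response when nothing matches the prefix.
    return [obj["Key"] for obj in response.get("Contents", [])]
```

With boto3 installed and credentials configured, this could be called as `list_captured_keys(boto3.client("s3"), data_capture_destination)`. An empty list after a job run would indicate that nothing was captured.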
Screenshots or logs
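No logs are attached. One way to collect the relevant details for a failed job is to read the DescribeTransformJob response. The sketch below is an illustration only: `summarize_transform_job` is a hypothetical helper, and the example dictionary is made-up data shaped like that response, not output from the failing job.

```python
from typing import Any, Dict


def summarize_transform_job(description: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the fields of a DescribeTransformJob response relevant to this bug."""
    capture = description.get("DataCaptureConfig") or {}
    return {
        "name": description.get("TransformJobName"),
        "status": description.get("TransformJobStatus"),
        # FailureReason is only present when the job has failed.
        "failure_reason": description.get("FailureReason"),
        "capture_destination": capture.get("DestinationS3Uri"),
    }


# Hypothetical response for a failed job (illustrative values only).
example = {
    "TransformJobName": "batch-transform-example",
    "TransformJobStatus": "Failed",
    "FailureReason": "Internal Server Error",
    "DataCaptureConfig": {
        "DestinationS3Uri": "s3://bucket/prefix/batch-transform/captured-data",
        "GenerateInferenceId": True,
    },
}
print(summarize_transform_job(example))
```

With boto3, the real description can be fetched with `boto3.client("sagemaker").describe_transform_job(TransformJobName=job_name)` and passed to the helper.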
System information
- SageMaker Python SDK version: 2.244.1
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): n/a
- Framework version: n/a
- Python version: 3.12
- CPU or GPU: Used instance type ml.m5.large
- Custom Docker image (Y/N): Y
Additional context
n/a