
Batch Transform Job fails with Internal Server Error when Data Capture is configured #5182

Open
@thatayster

Description


Describe the bug
When Data Capture is configured for a Batch Transform job through the SageMaker Python SDK, job creation succeeds, but the execution fails with an "Internal Server Error". The same job finishes successfully when Data Capture is not enabled, which points to a bug in how the Data Capture configuration is handled for Batch Transform.

To reproduce

The setup is the same for both scenarios, with or without BatchDataCaptureConfig:

from datetime import datetime
from sagemaker.transformer import Transformer
from sagemaker.inputs import BatchDataCaptureConfig

input_s3_data_location = "s3://bucket/prefix/batch-transform/input/input.json"
output_s3_data_location = "s3://bucket/prefix/batch-transform/output"
data_capture_destination = "s3://bucket/prefix/batch-transform/captured-data"
model_name = "my-previously-created-model"

transformer = Transformer(
    model_name=model_name,
    strategy="SingleRecord",  # send one record per request
    instance_count=1,
    instance_type="ml.m5.large",
    output_path=output_s3_data_location,
    max_concurrent_transforms=1,
    max_payload=6,  # maximum payload size per request, in MB
    tags=[{"Key": "some-key", "Value": "some-value"}],
)

timestamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
job_name = f"batch-transform-{timestamp}"
  1. Batch Transform job execution without BatchDataCaptureConfig - Success
transform_arg = transformer.transform(
    job_name=job_name,
    data=input_s3_data_location,
    data_type="S3Prefix",
    content_type="application/json", 
    split_type="Line",
    wait=True,
    logs=True,
)
  2. Batch Transform job execution with BatchDataCaptureConfig - Failure with an Internal Server Error
transform_arg = transformer.transform(
    batch_data_capture_config=BatchDataCaptureConfig(
        destination_s3_uri=data_capture_destination,
        generate_inference_id=True,
    ),
    job_name=job_name,
    data=input_s3_data_location,
    data_type="S3Prefix",
    content_type="application/json",
    split_type="Line",
    wait=True,
    logs=True,
)
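
For triage, the failure details reported by the service can be pulled with the DescribeTransformJob API. A minimal sketch using boto3 (it assumes the job_name from above and default AWS credentials):

import boto3

sm = boto3.client("sagemaker")
desc = sm.describe_transform_job(TransformJobName=job_name)
print(desc["TransformJobStatus"])     # "Failed" in the Data Capture scenario
print(desc.get("FailureReason"))      # the "Internal Server Error" message
print(desc.get("DataCaptureConfig"))  # confirms the capture settings reached the API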

Note: I've also tested with CSV files; the behavior is the same (a sketch of the CSV variant follows).
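
The CSV run only changes the input object and content type. A minimal sketch of that variant (the CSV input path and job name below are placeholders):

input_s3_csv_location = "s3://bucket/prefix/batch-transform/input/input.csv"  # hypothetical CSV input

transform_arg = transformer.transform(
    batch_data_capture_config=BatchDataCaptureConfig(
        destination_s3_uri=data_capture_destination,
        generate_inference_id=True,
    ),
    job_name=f"batch-transform-csv-{timestamp}",
    data=input_s3_csv_location,
    data_type="S3Prefix",
    content_type="text/csv",
    split_type="Line",
    wait=True,
    logs=True,
)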

Expected behavior
Enabling Data Capture for Batch Transform should not cause the job to fail with an Internal Server Error. The job should complete successfully, and captured data should be stored as configured.
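
Once this works, the capture output should be verifiable by listing the configured destination. A minimal check with boto3 (the bucket/prefix split assumes the URIs used above):

import boto3

s3 = boto3.client("s3")
bucket, _, prefix = data_capture_destination.removeprefix("s3://").partition("/")
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in resp.get("Contents", []):
    print(obj["Key"])  # captured request/response records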

Screenshots or logs

[Screenshot: the failed Batch Transform job reporting "Internal Server Error" (not reproduced here).]

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.244.1
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): n/a
  • Framework version: n/a
  • Python version: 3.12
  • CPU or GPU: CPU (instance type ml.m5.large)
  • Custom Docker image (Y/N): Y

Additional context
n/a
