
ScriptProcessor does not check local_code config before uploading code to S3 #3560


Description

@lodo1995

Describe the bug
When a LocalSession or LocalPipelineSession is configured to use local code, as follows:

session.config = {'local': {'local_code': True}}

the code passed to a pipeline ProcessingStep or directly to the run method of a processor (ScriptProcessor, FrameworkProcessor, ...) should not be uploaded to S3.
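
For reference, here is the same configuration on a plain (non-pipeline) LocalSession, as a minimal self-contained sketch (LocalSession lives in sagemaker.local):

from sagemaker.local import LocalSession

# Enable local code mode: code should be used from the local filesystem
# instead of being uploaded to S3.
session = LocalSession()
session.config = {'local': {'local_code': True}}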

However, ScriptProcessor does not honor this setting. Its _include_code_in_inputs method (called unconditionally by _normalize_args in the base class Processor, both when the processor is run directly and when it runs through a pipeline) always attempts to upload the code to S3.

def _include_code_in_inputs(self, inputs, code, kms_key=None):

Compare this to the Model class, used for example in the TrainingStep. Its _upload_code method checks the session configuration and does not upload to S3 when local code is enabled.

def _upload_code(self, key_prefix: str, repack: bool = False) -> None:
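
For illustration, the gate that Model._upload_code applies can be expressed roughly as below. This is a hedged sketch, not the SDK's actual code: sagemaker.utils.get_config_value and the local_mode session attribute do exist in the SDK, but the helper name here is hypothetical. ScriptProcessor._include_code_in_inputs could consult the same check before uploading.

from sagemaker import utils

def _should_skip_s3_upload(sagemaker_session):
    # Hypothetical helper mirroring the check in Model._upload_code:
    # skip the S3 upload when the session runs in local mode and
    # local_code is enabled in its config.
    local_code = utils.get_config_value('local.local_code', sagemaker_session.config)
    return bool(getattr(sagemaker_session, 'local_mode', False) and local_code)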

To reproduce
In the absence of any AWS credentials (which should not be needed when running fully locally), the following code fails while attempting to upload the processing.py script to S3, raising botocore.exceptions.NoCredentialsError. Note that, in addition to the code below, a processing.py file must exist in the working directory (its contents don't matter).

Code
import boto3
import sagemaker
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import LocalPipelineSession
from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor
from sagemaker.workflow.steps import ProcessingStep

role = 'arn:aws:iam::123456789012:role/MyRole'

local_pipeline_session = LocalPipelineSession(boto_session = boto3.Session(region_name = 'eu-west-1'))
local_pipeline_session.config = {'local': {'local_code': True}}

script_processor = ScriptProcessor(
    image_uri = 'docker.io/library/python:3.8',
    command = ['python'],
    instance_type = 'local',
    instance_count = 1,
    sagemaker_session = local_pipeline_session,
    role = role,
)

processing_step = ProcessingStep(
    name = 'Processing Step',
    processor = script_processor,
    code = 'processing.py',
    inputs = [
        ProcessingInput(
            source = './input-data',
            destination = '/opt/ml/processing/input',
        )
    ],
    outputs = [
        ProcessingOutput(
            source = '/opt/ml/processing/output',
            destination = './output-data',
        )
    ],
)

pipeline = Pipeline(
    name = 'MyPipeline',
    steps = [processing_step],
    sagemaker_session = local_pipeline_session
)

pipeline.upsert(role_arn = role)

pipeline_run = pipeline.start()

System information

  • SageMaker Python SDK version: 2.126.0
