Describe the bug
When a `LocalSession` or `LocalPipelineSession` is configured to use local code, as follows:

`session.config = {'local': {'local_code': True}}`

the code passed to a pipeline `ProcessingStep`, or directly to the `run` method of a processor (`ScriptProcessor`, `FrameworkProcessor`, ...), should not be uploaded to S3.

However, `ScriptProcessor` does not honor this setting: its `_include_code_in_inputs` method, which the base class `Processor` calls unconditionally from `_normalize_args` (itself invoked both when the processor is run directly and when it runs through a pipeline), always tries to upload the code to S3.
Compare this to the `Model` class, used for example by the `TrainingStep`: its `_upload_code` method checks the session configuration and skips the S3 upload when local code is enabled (see `sagemaker-python-sdk/src/sagemaker/model.py`, line 532 at commit `554952e`).
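For reference, the relevant check looks roughly like the sketch below. This is a paraphrase for illustration, not the SDK's verbatim code; the helper name `should_skip_s3_upload` is hypothetical, while `sagemaker.utils.get_config_value` and the session's `local_mode` flag are real SDK members.

```python
# A sketch (not the SDK's actual code) of the guard that Model._upload_code
# applies and that ScriptProcessor._include_code_in_inputs could mirror.
from sagemaker import utils


def should_skip_s3_upload(sagemaker_session) -> bool:
    """True when the code should stay on the local filesystem."""
    local_code = utils.get_config_value('local.local_code', sagemaker_session.config)
    return bool(sagemaker_session.local_mode and local_code)
```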
To reproduce
In the absence of any AWS credentials (which should not be needed when running completely locally), the following code fails while trying to upload the `processing.py` script to S3, raising `botocore.exceptions.NoCredentialsError`. Note that, in addition to the code below, a `processing.py` file must exist in the working directory (its contents don't matter).
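For convenience, the placeholder script can be created from Python before building the pipeline (this is just setup, not part of the bug itself):

```python
# Create an empty placeholder script; its contents are irrelevant here.
from pathlib import Path

Path('processing.py').touch()
```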
Code

```python
import boto3
import sagemaker
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import LocalPipelineSession
from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor
from sagemaker.workflow.steps import ProcessingStep

role = 'arn:aws:iam::123456789012:role/MyRole'

local_pipeline_session = LocalPipelineSession(boto_session = boto3.Session(region_name = 'eu-west-1'))
local_pipeline_session.config = {'local': {'local_code': True}}

script_processor = ScriptProcessor(
    image_uri = 'docker.io/library/python:3.8',
    command = ['python'],
    instance_type = 'local',
    instance_count = 1,
    sagemaker_session = local_pipeline_session,
    role = role,
)

processing_step = ProcessingStep(
    name = 'Processing Step',
    processor = script_processor,
    code = 'processing.py',
    inputs = [
        ProcessingInput(
            source = './input-data',
            destination = '/opt/ml/processing/input',
        )
    ],
    outputs = [
        ProcessingOutput(
            source = '/opt/ml/processing/output',
            destination = './output-data',
        )
    ],
)

pipeline = Pipeline(
    name = 'MyPipeline',
    steps = [processing_step],
    sagemaker_session = local_pipeline_session,
)

pipeline.upsert(role_arn = role)
pipeline_run = pipeline.start()
```
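Until this is fixed, a possible stopgap is to monkeypatch the upload away when local code is enabled. This is only a sketch built on private SDK internals: the `_upload_code(self, code, kms_key=None)` name and signature are assumptions based on the implementation described above and may change between versions.

```python
# Hypothetical workaround: return the local code path unchanged instead of
# uploading it to S3 when the session runs in local mode with local_code set.
# Relies on private internals; use at your own risk.
from sagemaker import utils
from sagemaker.processing import ScriptProcessor

_original_upload_code = ScriptProcessor._upload_code

def _upload_code(self, code, kms_key = None):
    local_code = utils.get_config_value('local.local_code', self.sagemaker_session.config)
    if self.sagemaker_session.local_mode and local_code:
        return code  # keep the local path; skip the S3 upload
    return _original_upload_code(self, code, kms_key)

ScriptProcessor._upload_code = _upload_code
```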
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.126.0