Open
Description
Describe the feature you'd like
Keep local inputs and outputs local in processing jobs when using local mode.
How would this feature be used? Please describe.
Currently, all local inputs are uploaded to the default bucket specified on the Sagemaker session.
There is no differentiation between local mode and non-local mode.
sagemaker-python-sdk/src/sagemaker/processing.py
Lines 300 to 318 in 22ba84e
This has two issues:
- The
LocalSession
does not take adefault_bucket
argument and thus the bucket used for the upload can only be changed by modifying the private attribute of the session after creation, which is hacky to say the least. It is also not documented anywhere, as far as I can tell. - It is not ideal to upload all local inputs to s3 just to download them to the container again in local mode. Local mode should be truly local if inputs/outputs are local paths.
Describe alternatives you've considered
- Use s3 paths for inputs and outputs but this slows down local mode since the SDK will download/upload those inputs/outputs each time. Local mode should enable quick local testing independent of cloud resources like s3 buckets.