Make sourcedir.tar.gz and repacked model.tar.gz structure consistent #3491

Open
@plienhar

Description

When deploying a model together with customer code (described by one or more of the Model.__init__ arguments entry_point, source_dir, and dependencies), the SDK (more precisely, the Model.prepare_container_def method) handles the customer code in one of two ways:

  • Either it bundles all the code artifacts in a sourcedir.tar.gz file. The file is staged to S3, then downloaded and extracted in the container at /opt/ml/model/code. If supplied, the entry_point file is copied to the root of the tar file. If supplied, the content of the source_dir directory is copied to the root of the tar file. If supplied, each dependency in dependencies is copied to the root of the tar file. This behavior is implemented by the sagemaker.fw_utils.tar_and_upload_dir function.
  • Or it repacks the model and code artifacts together in a single model.tar.gz file. The file is staged to S3, then downloaded by the container's host and extracted in the container at /opt/ml/model. Inside the model.tar.gz file, code artifacts (the entry_point file if supplied and the content of the source_dir directory if supplied) are placed in a code folder (relative to the root of the tar file). If supplied, each dependency in dependencies is placed in a code/lib folder. This behavior is implemented by the sagemaker.utils._create_or_update_code_dir function.

In both cases, code artifacts end up being available in the inference container at /opt/ml/model/code. However, an inconsistency appears when dependencies is used. In that case, the dependencies end up being located:

  • In /opt/ml/model/code if the code was bundled in a sourcedir.tar.gz file.
  • In /opt/ml/model/code/lib if the code was repacked with the model artifacts in a model.tar.gz file.
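To make the two layouts concrete, here is a minimal sketch that builds both archives in memory and prints where each member would land in the container. It does not call the SageMaker SDK: make_tar and container_paths are hypothetical helpers, and the file names inference.py, my_dep.py and model.pth are made up for illustration. Only the directory layout is taken from the description above.

```python
import io
import tarfile

CODE_DIR = "/opt/ml/model/code"  # where sourcedir.tar.gz is extracted
MODEL_DIR = "/opt/ml/model"      # where model.tar.gz is extracted


def make_tar(members):
    """Build an in-memory .tar.gz holding empty files at the given archive paths."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in members:
            info = tarfile.TarInfo(name=name)
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    buf.seek(0)
    return buf


def container_paths(archive, extract_root):
    """Return the container paths of all members once extracted under extract_root."""
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        return sorted(f"{extract_root}/{m.name}" for m in tar.getmembers())


# Layout 1: everything at the root of sourcedir.tar.gz (illustrative file names).
sourcedir = make_tar(["inference.py", "my_dep.py"])
# Layout 2: repacked model.tar.gz, dependencies under code/lib.
repacked = make_tar(["model.pth", "code/inference.py", "code/lib/my_dep.py"])

sourcedir_paths = container_paths(sourcedir, CODE_DIR)
repacked_paths = container_paths(repacked, MODEL_DIR)

for path in sourcedir_paths + repacked_paths:
    print(path)
```

The same dependency ends up at /opt/ml/model/code/my_dep.py in the first layout and at /opt/ml/model/code/lib/my_dep.py in the second.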

The SageMaker inference toolkits automatically add /opt/ml/model and /opt/ml/model/code to sys.path, but not /opt/ml/model/code/lib. Therefore, dependencies located in the latter directory cannot be imported through the Python import system unless the user/customer manually adds this location to sys.path. This ultimately boils down to the inconsistency in the file structure, which is annoying since the choice between a sourcedir.tar.gz and a repacked model.tar.gz is opaque to the user (and highly framework-dependent).
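For reference, the manual workaround can be sketched as follows, placed at the top of the entry point script before any dependency is imported. The helper name add_lib_to_sys_path is hypothetical, and the default path assumes the container layout described above.

```python
import os
import sys


def add_lib_to_sys_path(code_dir="/opt/ml/model/code"):
    """Put <code_dir>/lib on sys.path if it exists and is not already there.

    Returns True if sys.path was modified, False otherwise.
    """
    lib_dir = os.path.join(code_dir, "lib")
    if os.path.isdir(lib_dir) and lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
        return True
    return False


# No-op when the code was shipped as sourcedir.tar.gz (no lib directory).
add_lib_to_sys_path()
```

Because the helper does nothing when the lib directory is absent, the same entry point works with both packaging paths.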

Note: We do not consider the Multi-Model Enabled (MME) mode here.

IMHO, the solution with minimal impact would be not to create a code/lib directory in the repacked model.tar.gz and to simply copy dependencies to the code directory instead. Dependencies from a repacked model.tar.gz would then be directly available under /opt/ml/model/code, which is already automatically added to sys.path by the inference toolkits. This solution would in fact simply align the structure of the repacked model.tar.gz file with the structure of the sourcedir.tar.gz. The latter being already in use, this fix should not raise backward-compatibility issues.
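As a rough illustration of the target layout (not a patch to sagemaker.utils._create_or_update_code_dir), the sketch below flattens an existing code/lib directory into code. The function flatten_lib_dir is hypothetical, and raising on a name clash is an assumption of this sketch rather than something the SDK does.

```python
import os
import shutil


def flatten_lib_dir(code_dir):
    """Move every entry of <code_dir>/lib into <code_dir>, then remove lib.

    Returns the sorted list of moved entry names (empty if there is no lib).
    """
    lib_dir = os.path.join(code_dir, "lib")
    if not os.path.isdir(lib_dir):
        return []
    moved = []
    for name in sorted(os.listdir(lib_dir)):
        target = os.path.join(code_dir, name)
        if os.path.exists(target):
            # Assumption: refuse to overwrite code that is already in place.
            raise FileExistsError(f"{name} already exists in {code_dir}")
        shutil.move(os.path.join(lib_dir, name), target)
        moved.append(name)
    os.rmdir(lib_dir)
    return moved
```

After flattening, the dependencies sit next to the entry point, exactly as they do when the code is shipped in a sourcedir.tar.gz.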

This topic directly relates to the following issues:

  • Issue 1065 - Failed to import code copied into the /opt/ml/model/code/lib directory
  • Issue 1832 - Extra lib directory when adding dependencies for PyTorchModel
