Using AthenaDatasetDefinition in Sagemaker processing job as input results in error with missing "sagemaker_processing" database. #5176

Open
@leo4ever

Describe the bug
I am trying to set up a SageMaker processing job whose input is defined with AthenaDatasetDefinition. When the job runs, it fails with the message below. The job appears to be trying to create a new database named sagemaker_processing. I tried pointing it at an existing database through the dataset definition parameters and also set the output S3 URI parameter, but neither seems to help.

{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error creating database 'sagemaker_processing' in catalog 'awsdatacatalog'."}
{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error AccessDeniedException: User: arn:aws:sts::726167300549:assumed-role/99999-sagemaker-devmanaged-role/SageMaker is not authorized to perform: glue:CreateDatabase on resource: arn:aws:glue:us-west-2:726167300549:catalog because no identity-based policy allows the glue:CreateDatabase action"}

To reproduce

  1. Define a SageMaker processing job using AthenaDatasetDefinition as the ProcessingInput.
  2. Execute the job.
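
For anyone trying to reproduce this, the input from step 1 might look roughly like the sketch below. It is not the reporter's actual code: it only builds the `ProcessingInputs` entry in the shape the `CreateProcessingJob` API accepts, so it runs without AWS credentials. The function name, the database name `my_existing_db`, the bucket, the query, and the local path are all hypothetical placeholders.

```python
def build_athena_processing_input(query, database, output_s3_uri,
                                  catalog="AwsDataCatalog", work_group=None):
    """Build one ProcessingInputs entry that reads its data from Athena.

    The key names follow the CreateProcessingJob request shape; every value
    passed in is an illustrative assumption, not taken from the issue.
    """
    athena = {
        "Catalog": catalog,
        "Database": database,            # existing database the query runs against
        "QueryString": query,
        "OutputS3Uri": output_s3_uri,    # where Athena writes the query results
        "OutputFormat": "PARQUET",
    }
    if work_group is not None:
        athena["WorkGroup"] = work_group

    return {
        "InputName": "input-1",          # matches the input name seen in the logs
        "AppManaged": False,
        "DatasetDefinition": {
            "AthenaDatasetDefinition": athena,
            "LocalPath": "/opt/ml/processing/input/athena",  # hypothetical path
            "DataDistributionType": "FullyReplicated",
            "InputMode": "File",
        },
    }


# Hypothetical values for illustration only.
processing_input = build_athena_processing_input(
    query="SELECT * FROM my_table LIMIT 10",
    database="my_existing_db",
    output_s3_uri="s3://my-example-bucket/athena-output/",
)
```

Note that `Database` here names an existing database, yet per the logs the job still attempts to create `sagemaker_processing`.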

Expected behavior

  1. Job executes without trying to create a new database.

Screenshots or logs
{"level":"INFO","ts":"2025-05-13T16:18:55.011Z","msg":"[sagemaker logs] [Input: input-1] Athena dataset definition specified. Starting athena query execution."}
{"level":"INFO","ts":"2025-05-13T16:18:55.011Z","msg":"[sagemaker logs] [Input: input-1] Creating database 'sagemaker_processing' in catalog 'awsdatacatalog' if doesn't exist already."}
{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error creating database 'sagemaker_processing' in catalog 'awsdatacatalog'."}
{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error AccessDeniedException: User: arn:aws:sts::726167300549:assumed-role/99999-sagemaker-devmanaged-role/SageMaker is not authorized to perform: glue:CreateDatabase on resource: arn:aws:glue:us-west-2:726167300549:catalog because no identity-based policy allows the glue:CreateDatabase action"}
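
To make the failing permission easier to spot, here is a minimal sketch that pulls the denied action and resource out of the ERROR entries above. It assumes each log entry is a standalone JSON object on its own line; the helper name `find_denied_actions` and the regular expression are illustrative and not part of any SageMaker tooling.

```python
import json
import re

# The two ERROR entries from this issue, one JSON object per line.
LOG_TEXT = """\
{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error creating database 'sagemaker_processing' in catalog 'awsdatacatalog'."}
{"level":"ERROR","ts":"2025-05-13T16:18:55.242Z","msg":"[sagemaker logs] [Input: input-1] Error AccessDeniedException: User: arn:aws:sts::726167300549:assumed-role/99999-sagemaker-devmanaged-role/SageMaker is not authorized to perform: glue:CreateDatabase on resource: arn:aws:glue:us-west-2:726167300549:catalog because no identity-based policy allows the glue:CreateDatabase action"}
"""

# Matches the "is not authorized to perform: <action> on resource: <arn>" phrase.
DENIED = re.compile(
    r"is not authorized to perform: (?P<action>\S+) on resource: (?P<resource>\S+)"
)


def find_denied_actions(lines):
    """Return (action, resource) pairs found in ERROR-level log entries."""
    found = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("level") != "ERROR":
            continue
        match = DENIED.search(record.get("msg", ""))
        if match:
            found.append((match["action"], match["resource"]))
    return found


denied = find_denied_actions(LOG_TEXT.splitlines())
print(denied)
```

For these logs the only denied action is `glue:CreateDatabase` on the account's Glue catalog, which is consistent with the job trying to create the `sagemaker_processing` database rather than reusing an existing one.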

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.227.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): ScriptProcessor
  • Framework version:
  • Python version: 3.11.11
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context
