Nextflow failing to submit jobs to AWS Batch when under load #3644

Open
@robsyme

Bug report

This is a draft issue, with some details still to come.

Expected behavior and actual behavior

Some runs that pass large amounts of data through publishDir stall, with no new jobs submitted to AWS Batch, until the network load from publishing has fallen away.

Steps to reproduce the problem

Reproducing this bug is difficult because it requires publishing 30 TB of data to S3, and the data transfer alone takes some time. To simulate these large runs, I'm using the repository https://github.com/robsyme/nf-test/tree/network-testing.

Program output

The Nextflow logs don't indicate anything unusual. I'll run again and capture a jstack trace to get a better idea of what is going on inside the Nextflow process.
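
For anyone who wants to collect the same kind of trace, here is a minimal sketch of one way to take a few jstack thread dumps from a running Nextflow JVM at fixed intervals, so that a stalled thread shows up in the same place across dumps. It assumes `jstack` (shipped with the JDK) is on the PATH and that you already know the PID of the Nextflow process. The function names, the output directory, and the file naming scheme are all hypothetical, and Python is used here only as a convenient wrapper.

```python
import shutil
import subprocess
import time
from datetime import datetime, timezone
from pathlib import Path


def jstack_command(pid: int) -> list[str]:
    # `jstack -l <pid>` prints a thread dump with additional lock information.
    return ["jstack", "-l", str(pid)]


def dump_path(out_dir: Path, pid: int, seq: int, when: datetime) -> Path:
    # Hypothetical naming scheme: PID, sequence number, then UTC timestamp.
    return out_dir / f"jstack-{pid}-{seq:03d}-{when:%Y%m%dT%H%M%SZ}.txt"


def capture_dumps(pid: int, out_dir: Path, count: int = 3,
                  interval_s: float = 10.0) -> list[Path]:
    # Take `count` thread dumps of the JVM with the given PID,
    # waiting `interval_s` seconds between consecutive dumps.
    if shutil.which("jstack") is None:
        raise RuntimeError("jstack not found on PATH (it ships with the JDK)")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for seq in range(1, count + 1):
        result = subprocess.run(jstack_command(pid), capture_output=True,
                                text=True, check=True)
        path = dump_path(out_dir, pid, seq, datetime.now(timezone.utc))
        path.write_text(result.stdout)
        written.append(path)
        if seq < count:
            time.sleep(interval_s)
    return written
```

Calling `capture_dumps(12345, Path("dumps"))` while the run is stalled (with `12345` replaced by the actual Nextflow PID) would write three dumps ten seconds apart. Comparing them shows which threads have not moved between dumps.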

Environment

  • Nextflow version: [?]
  • Java version: [?]
  • Operating system: [macOS, Linux, etc]
  • Bash version: (use the command $SHELL --version)
