Releases · aws/aws-parallelcluster

02 Dec 12:18

enrico-usai

v2.11.9

087dd16

AWS ParallelCluster v2.11.9

We're excited to announce the release of AWS ParallelCluster 2.11.9

Upgrade

How to upgrade?

sudo pip install aws-parallelcluster==2.11.9

BUG FIXES

Prevent updating vpc_security_group_id when a managed FSx for Lustre file system is configured in the cluster.
Doing so would result in file system deletion and potential data loss.

03 Dec 00:49

chenwany

v3.3.1

43fb021

AWS ParallelCluster v3.3.1

We're excited to announce the release of AWS ParallelCluster 3.3.1

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

CHANGES

Allow the use of official product AMIs even after the two-year EC2 deprecation period.
Increase memory size of ParallelCluster API Lambda to 2048 in order to reduce cold start penalty and avoid timeouts.

BUG FIXES

Prevent managed FSx for Lustre file systems from being replaced during a cluster update by no longer supporting changes to the compute fleet subnet ID.
Apply the DeletionPolicy defined on shared storage during cluster update operations as well.

15 Nov 01:36

chenwany

v2.11.8

2da6f75

AWS ParallelCluster v2.11.8

We're excited to announce the release of AWS ParallelCluster 2.11.8

Upgrade

How to upgrade?

sudo pip install aws-parallelcluster==2.11.8

CHANGES

Upgrade Intel MPI Library to 2021.6.0.602.
Upgrade EFA installer to 1.19.0
- Efa-driver: efa-1.16.0-1
- Efa-config: efa-config-1.11-1
- Efa-profile: efa-profile-1.5-1
- Libfabric-aws: libfabric-aws-1.16.0-1
- Rdma-core: rdma-core-41.0-2
- Open MPI: openmpi40-aws-4.1.4-3
Upgrade Python runtime used by Lambda functions in AWS Batch integration to python3.9.

BUG FIXES

Prevent cluster tags from being changed during an update, because this is not supported.

16 Nov 13:54

eantonin

v3.1.5

1ae80ac

AWS ParallelCluster v3.1.5

We're excited to announce the release of AWS ParallelCluster 3.1.5

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

CHANGES

Upgrade EFA installer to 1.18.0
- Efa-driver: efa-1.16.0-1
- Efa-config: efa-config-1.11-1
- Efa-profile: efa-profile-1.5-1
- Libfabric-aws: libfabric-aws-1.16.0~amzn4.0-1
- Rdma-core: rdma-core-41.0-2
- Open MPI: openmpi40-aws-4.1.4-2
Add lambda:ListTags and lambda:UntagResource to ParallelClusterUserRole used by ParallelCluster API stack for cluster update.
Upgrade Intel MPI Library to 2021.6.0.602.
Upgrade NVIDIA driver to version 470.141.03.
Upgrade NVIDIA Fabric Manager to version 470.141.03.

BUG FIXES

Fix Slurm issue that prevents the termination of idle nodes.

02 Nov 15:06

gmarciani

v3.3.0

c967c0c

AWS ParallelCluster v3.3.0

We're excited to announce the release of AWS ParallelCluster 3.3.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

Add possibility to specify multiple EC2 instance types for the same compute resource.
Add support for adding and removing shared storage during a cluster update by updating the SharedStorage configuration.
Add new configuration parameter DeletionPolicy for EFS and FSx for Lustre shared storage to support storage retention.
Add new configuration section Scheduling/SlurmSettings/Database to enable accounting functionality in Slurm.
Add support for On-Demand Capacity Reservations and Capacity Reservations Resource Groups.
Add new configuration parameter in Imds/ImdsSettings to specify the IMDS version to support in a cluster or build image infrastructure.
Add support for Networking/PlacementGroup in the SlurmQueues/ComputeResources section.
Add support for instances with multiple network interfaces that allow only one ENI per device.
Improve validation of networking for external EFS file systems by checking the CIDR block in the attached security group.
Add validator to check if configured instance types support placement groups.
Configure NFS threads to be min(256, max(8, num_cores * 4)) to ensure better stability and performance.
Move NFS installation to build time to reduce configuration time.
Enable server-side encryption for the EcrImageBuilder SNS topic created when deploying ParallelCluster API and used to notify on docker image build events.
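
The NFS thread formula listed in the enhancements above, min(256, max(8, num_cores * 4)), can be worked through with a short sketch. The function name nfs_thread_count is hypothetical and is not taken from the ParallelCluster code base; only the formula itself comes from the release note.

```python
def nfs_thread_count(num_cores: int) -> int:
    """Return min(256, max(8, num_cores * 4)), the formula from the release note.

    Hypothetical helper for illustration only.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be a positive integer")
    # Four threads per core, with a floor of 8 and a cap of 256.
    return min(256, max(8, num_cores * 4))


if __name__ == "__main__":
    for cores in (1, 2, 16, 64, 96):
        print(f"{cores:>3} cores -> {nfs_thread_count(cores)} NFS threads")
```

Under this formula the floor of 8 applies to nodes with one or two cores, and the cap of 256 is reached at 64 cores and above.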

CHANGES

Change behaviour of SlurmQueues/Networking/PlacementGroup/Enabled: now it creates a different managed placement
group for each compute resource instead of a single managed placement group for all compute resources.
Add support for PlacementGroup/Name as the preferred naming method.
Move head node tags from Launch Template to instance definition to avoid head node replacement on tags updates.
Disable multithreading through a script executed by cloud-init instead of through CpuOptions set in the Launch Template.
Upgrade Python to version 3.9 and NodeJS to version 16 in API infrastructure, API Docker container and cluster Lambda resources.
Remove support for Python 3.6 in aws-parallelcluster-batch-cli.
Upgrade Slurm to version 22.05.5.
Upgrade NVIDIA driver to version 470.141.03.
Upgrade NVIDIA Fabric Manager to version 470.141.03.
Upgrade NVIDIA CUDA Toolkit to version 11.7.1.
Upgrade Python used in ParallelCluster virtualenvs from 3.7.13 to 3.9.15.
Upgrade EFA installer to version 1.18.0.
Upgrade NICE DCV to version 2022.1-13300.
Allow for suppressing the SingleSubnetValidator for Queues.

BUG FIXES

Fix validation of filters parameter in ListClusterLogStreams command to fail when incorrect filters are passed.
Fix validation of parameter SharedStorage/EfsSettings: now validation fails when FileSystemId is specified
along with other SharedStorage/EfsSettings parameters, whereas it was previously ignoring them.
Fix cluster update when changing the order of SharedStorage together with other changes in the configuration.
Fix UpdateParallelClusterLambdaRole in the ParallelCluster API to upload logs to CloudWatch.
Fix Cinc not using the local CA certificates bundle when installing packages before any cookbooks are executed.
Fix a hang in upgrading Ubuntu via pcluster build-image when Build:UpdateOsPackages:Enabled:true is set.
Fix parsing of YAML cluster configuration by failing on duplicate keys.
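
The SharedStorage/EfsSettings validation fix above (fail when FileSystemId is combined with other EfsSettings parameters, instead of silently ignoring them) can be illustrated with a minimal sketch. The function validate_efs_settings is hypothetical and does not reflect the actual ParallelCluster validator; the Encrypted key in the usage example is only an assumed example of "another parameter".

```python
from typing import Any, Dict, List


def validate_efs_settings(efs_settings: Dict[str, Any]) -> List[str]:
    """Return validation errors for a SharedStorage/EfsSettings mapping.

    Hypothetical helper: it only mirrors the rule stated in the release note.
    """
    errors: List[str] = []
    if "FileSystemId" in efs_settings:
        # Any key other than FileSystemId now causes a validation failure.
        extra = sorted(key for key in efs_settings if key != "FileSystemId")
        if extra:
            errors.append(
                "FileSystemId cannot be combined with other EfsSettings "
                "parameters: " + ", ".join(extra)
            )
    return errors


if __name__ == "__main__":
    # Existing file system only: passes.
    print(validate_efs_settings({"FileSystemId": "fs-0123456789abcdef0"}))
    # Existing file system plus an assumed extra parameter: fails.
    print(validate_efs_settings({"FileSystemId": "fs-0123456789abcdef0", "Encrypted": True}))
```

Returning a list of messages rather than raising on the first problem is a design choice of this sketch, so that several configuration errors could be reported together.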

03 Oct 08:59

francesco-giordano

v3.2.1

87a70c4

AWS ParallelCluster v3.2.1

We're excited to announce the release of AWS ParallelCluster 3.2.1

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

Improve the logic to associate the host routing tables to the different network cards to better support EC2 instances with several NICs.

CHANGES

Upgrade NVIDIA driver to version 470.141.03.
Upgrade NVIDIA Fabric Manager to version 470.141.03.
Disable cron job tasks man-db and mlocate, which may have a negative impact on node performance.
Upgrade Intel MPI Library to 2021.6.0.602.
Upgrade Python from 3.7.10 to 3.7.13 in response to this security risk.

BUG FIXES

Avoid failing on DescribeCluster when cluster configuration is not available.

27 Jul 17:48

gmarciani

v3.2.0

fdc0dfd

AWS ParallelCluster v3.2.0

We're excited to announce the release of AWS ParallelCluster 3.2.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

Add support for memory-based job scheduling in Slurm
- Configure compute nodes real memory in the Slurm cluster configuration.
- Add new configuration parameter Scheduling/SlurmSettings/EnableMemoryBasedScheduling to enable memory-based scheduling in Slurm.
- Add new configuration parameter Scheduling/SlurmQueues/ComputeResources/SchedulableMemory to override default value of the memory seen by the scheduler on compute nodes.
Improve flexibility on cluster configuration updates to avoid the stop and start of the entire cluster whenever possible.
- Add new configuration parameter Scheduling/SlurmSettings/QueueUpdateStrategy to set the preferred strategy to adopt for compute nodes needing a configuration update and replacement.
Improve failover mechanism over available compute resources when hitting insufficient capacity issues with EC2 instances. Disable compute nodes for a configurable amount of time (default 10 min) when a node launch fails due to insufficient capacity.
Add support to mount existing FSx for ONTAP and FSx for OpenZFS file systems.
Add support to mount multiple instances of existing EFS, FSx for Lustre, FSx for ONTAP, and FSx for OpenZFS file systems.
Add support for FSx for Lustre Persistent_2 deployment type when creating a new file system.
Prompt user to enable EFA for supported instance types when using pcluster configure wizard.
Add support for rebooting compute nodes via Slurm.
Improve handling of Slurm power states to also account for manual powering down of nodes.
Add NVIDIA GDRCopy 2.3 into the product AMIs to enable low-latency GPU memory copy.
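
The insufficient-capacity failover listed above (compute nodes are disabled for a configurable time, 10 minutes by default, after a launch fails for lack of capacity) can be pictured as a simple cooldown tracker. This is only a sketch under assumptions: the class CapacityCooldown, its methods, and the compute resource names are hypothetical and are not the ParallelCluster implementation.

```python
import time
from typing import Callable, Dict, Iterable, Optional

DEFAULT_COOLDOWN_SECONDS = 10 * 60  # "default 10 min" from the release note


class CapacityCooldown:
    """Track compute resources that recently hit insufficient capacity (hypothetical)."""

    def __init__(
        self,
        cooldown_seconds: float = DEFAULT_COOLDOWN_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown = cooldown_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._disabled_until: Dict[str, float] = {}

    def record_insufficient_capacity(self, compute_resource: str) -> None:
        """Disable a compute resource until the cooldown has elapsed."""
        self._disabled_until[compute_resource] = self._clock() + self._cooldown

    def is_available(self, compute_resource: str) -> bool:
        deadline = self._disabled_until.get(compute_resource)
        if deadline is None:
            return True
        if self._clock() >= deadline:
            # Cooldown expired: the resource may be tried again.
            del self._disabled_until[compute_resource]
            return True
        return False

    def first_available(self, candidates: Iterable[str]) -> Optional[str]:
        """Fail over to the first candidate that is not cooling down."""
        for name in candidates:
            if self.is_available(name):
                return name
        return None


if __name__ == "__main__":
    tracker = CapacityCooldown()
    tracker.record_insufficient_capacity("compute-resource-a")
    print(tracker.first_available(["compute-resource-a", "compute-resource-b"]))
```

While a resource is cooling down, launches fall through to the next candidate; once the configured time has passed, the resource becomes eligible again.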

CHANGES

Upgrade EFA installer to version 1.17.2
- EFA driver: efa-1.16.0-1
- EFA configuration: efa-config-1.10-1
- EFA profile: efa-profile-1.5-1
- Libfabric: libfabric-aws-1.16.0~amzn2.0-1
- RDMA core: rdma-core-41.0-2
- Open MPI: openmpi40-aws-4.1.4-2
Upgrade NICE DCV to version 2022.0-12760.
Upgrade NVIDIA driver to version 470.129.06.
Upgrade NVIDIA Fabric Manager to version 470.129.06.
Change default EBS volume types from gp2 to gp3 for both the root and additional volumes.
Changes to FSx for Lustre file systems created by ParallelCluster:
- Change the default deployment type to Scratch_2.
- Change the Lustre server version to 2.12.
Do not require PlacementGroup/Enabled to be set to true when passing an existing PlacementGroup/Id.
Add parallelcluster:cluster-name tag to all the resources created by ParallelCluster.
Do not allow setting PlacementGroup/Id when PlacementGroup/Enabled is explicitly set to false.
Add lambda:ListTags and lambda:UntagResource to ParallelClusterUserRole used by ParallelCluster API stack for cluster update.
Restrict IPv6 access to IMDS to root and cluster admin users only, when configuration parameter HeadNode/Imds/Secured is true as by default.
With a custom AMI, use the AMI root volume size instead of the ParallelCluster default of 35 GiB. The value can be changed in the cluster configuration file.
Automatic disabling of the compute fleet when the configuration parameter Scheduling/SlurmQueues/ComputeResources/SpotPrice
is lower than the minimum required Spot request fulfillment price.
Show requested_value and current_value values in the change set when adding or removing a section during an update.
Disable aws-ubuntu-eni-helper service in DLAMI to avoid conflicts with configure_nw_interface.sh when configuring instances with multiple network cards.
Remove support for Python 3.6.
Set MTU to 9001 for all the network interfaces when configuring instances with multiple network cards.
Remove the trailing dot when configuring the compute node FQDN.
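
The two PlacementGroup rules in the list above (an existing PlacementGroup/Id no longer requires PlacementGroup/Enabled to be true, but Id is rejected when Enabled is explicitly false) can be summarized in a small sketch. The function validate_placement_group and the placement group name used in the example are hypothetical; this is not the validator shipped with ParallelCluster.

```python
from typing import Any, Dict, List


def validate_placement_group(placement_group: Dict[str, Any]) -> List[str]:
    """Return validation errors for a PlacementGroup mapping (hypothetical helper)."""
    errors: List[str] = []
    has_id = placement_group.get("Id") is not None
    enabled = placement_group.get("Enabled")  # None means "not set"

    # Id with Enabled unset or true is accepted; only an explicit false conflicts.
    if has_id and enabled is False:
        errors.append(
            "PlacementGroup/Id cannot be set when PlacementGroup/Enabled "
            "is explicitly set to false"
        )
    return errors


if __name__ == "__main__":
    print(validate_placement_group({"Id": "my-existing-pg"}))                    # accepted
    print(validate_placement_group({"Id": "my-existing-pg", "Enabled": True}))   # accepted
    print(validate_placement_group({"Id": "my-existing-pg", "Enabled": False}))  # rejected
```

The sketch distinguishes "not set" from "explicitly false" by checking for None, which is the distinction the two release-note entries rely on.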

BUG FIXES

Fix the default behavior to skip the ParallelCluster validation and test steps when building a custom AMI.
Fix file handle leak in computemgtd.
Fix race condition that was sporadically causing launched instances to be immediately terminated because they were not yet available in the EC2 DescribeInstances response.
Fix support for DisableSimultaneousMultithreading parameter on instance types with Arm processors.
Fix ParallelCluster API stack update failure when upgrading from a previous version. Add resource pattern used for the ListImagePipelineImages action in the EcrImageDeletionLambdaRole.
Fix ParallelCluster API by adding missing permissions needed to import/export from S3 when creating an FSx for Lustre storage.

13 May 16:46

francesco-giordano

v2.11.7

603636d

AWS ParallelCluster v2.11.7

We're excited to announce the release of AWS ParallelCluster 2.11.7

Upgrade

How to upgrade?

sudo pip install aws-parallelcluster==2.11.7

CHANGES

Upgrade Slurm to version 20.11.9.

16 May 19:57

chenwany

v3.1.4

320870d

AWS ParallelCluster v3.1.4

We're excited to announce the release of AWS ParallelCluster 3.1.4

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

Add validation for DirectoryService/PasswordSecretArn to fail in case the secret does not exist.

CHANGES

Upgrade Slurm to version 21.08.8-2.
Build Slurm with JWT support.
Do not require PlacementGroup/Enabled to be set to true when passing an existing PlacementGroup/Id.
Add lambda:TagResource to ParallelClusterUserRole used by ParallelCluster API stack for cluster creation and image creation.

BUG FIXES

Fix the ability to export cluster's logs when using export-cluster-logs command with the --filters option.
Fix AWS Batch Docker entrypoint to use /home shared directory to coordinate Multi-node-Parallel job execution.

19 Apr 13:27

gmarciani

v2.11.6

04df25c

AWS ParallelCluster v2.11.6

We're excited to announce the release of AWS ParallelCluster 2.11.6

Upgrade

How to upgrade?

sudo pip install aws-parallelcluster==2.11.6

ENHANCEMENTS

Improve exception management in case of missing networking.

CHANGES

OS package updates and security fixes.
