Commit afc9750

Switch to H100 nodes for training
1 parent 4f7fcbe commit afc9750

File tree

3 files changed: +3 −3 lines changed


scripts/esmfold_prior_tiered_training.sh (+1 −1)

@@ -3,7 +3,7 @@
 #SBATCH --partition chengji-lab-gpu   # NOTE: use reserved partition `chengji-lab-gpu` to use reserved A100 or H100 GPUs
 #SBATCH --account chengji-lab         # NOTE: this must be specified to use the reserved partition above
 #SBATCH --nodes=1                     # NOTE: this needs to match Lightning's `Trainer(num_nodes=...)`
-#SBATCH --gres gpu:A100:4             # request A100 GPU resource(s)
+#SBATCH --gres gpu:H100:4             # request H100 GPU resource(s)
 #SBATCH --ntasks-per-node=4           # NOTE: this needs to be `1` on SLURM clusters when using Lightning's `ddp_spawn` strategy; otherwise, set to match Lightning's `Trainer(devices=...)`
 #SBATCH --mem=0                       # NOTE: use `--mem=0` to request all memory "available" on the assigned node
 #SBATCH -t 7-00:00:00                 # time limit for the job (up to 7 days: `7-00:00:00`)
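For context, the directives being edited map one-to-one onto Lightning's distributed-training settings, as the NOTE comments indicate. Below is a minimal sketch of how such a script might look as a whole after this commit; the `srun` launch line, the `train.py` entry point, and its argument style are hypothetical and not part of this repository.

```shell
#!/bin/bash
#SBATCH --partition chengji-lab-gpu   # reserved partition with A100/H100 GPUs
#SBATCH --account chengji-lab         # required to use the reserved partition
#SBATCH --nodes=1                     # must match Trainer(num_nodes=1)
#SBATCH --gres gpu:H100:4             # four H100s, matching Trainer(devices=4)
#SBATCH --ntasks-per-node=4           # one task per GPU (non-`ddp_spawn` strategies)
#SBATCH --mem=0                       # request all memory on the assigned node
#SBATCH -t 7-00:00:00                 # 7-day time limit

# Hypothetical launch: srun starts one process per task, and Lightning's
# SLURM integration derives rank and world size from the environment.
srun python train.py trainer.num_nodes=1 trainer.devices=4
```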

scripts/esmfold_prior_training.sh (+1 −1)

@@ -3,7 +3,7 @@
 #SBATCH --partition chengji-lab-gpu   # NOTE: use reserved partition `chengji-lab-gpu` to use reserved A100 or H100 GPUs
 #SBATCH --account chengji-lab         # NOTE: this must be specified to use the reserved partition above
 #SBATCH --nodes=1                     # NOTE: this needs to match Lightning's `Trainer(num_nodes=...)`
-#SBATCH --gres gpu:A100:4             # request A100 GPU resource(s)
+#SBATCH --gres gpu:H100:4             # request H100 GPU resource(s)
 #SBATCH --ntasks-per-node=4           # NOTE: this needs to be `1` on SLURM clusters when using Lightning's `ddp_spawn` strategy; otherwise, set to match Lightning's `Trainer(devices=...)`
 #SBATCH --mem=0                       # NOTE: use `--mem=0` to request all memory "available" on the assigned node
 #SBATCH -t 7-00:00:00                 # time limit for the job (up to 7 days: `7-00:00:00`)

scripts/harmonic_prior_training.sh (+1 −1)

@@ -3,7 +3,7 @@
 #SBATCH --partition chengji-lab-gpu   # NOTE: use reserved partition `chengji-lab-gpu` to use reserved A100 or H100 GPUs
 #SBATCH --account chengji-lab         # NOTE: this must be specified to use the reserved partition above
 #SBATCH --nodes=1                     # NOTE: this needs to match Lightning's `Trainer(num_nodes=...)`
-#SBATCH --gres gpu:A100:4             # request A100 GPU resource(s)
+#SBATCH --gres gpu:H100:4             # request H100 GPU resource(s)
 #SBATCH --ntasks-per-node=4           # NOTE: this needs to be `1` on SLURM clusters when using Lightning's `ddp_spawn` strategy; otherwise, set to match Lightning's `Trainer(devices=...)`
 #SBATCH --mem=0                       # NOTE: use `--mem=0` to request all memory "available" on the assigned node
 #SBATCH -t 7-00:00:00                 # time limit for the job (up to 7 days: `7-00:00:00`)
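Since the only change across all three scripts is the GPU model in `--gres`, a job script can guard against a mismatched allocation by checking the GPU names SLURM actually granted. The helper below is a hypothetical addition (not part of this commit) that checks a `nvidia-smi --query-gpu=name --format=csv,noheader`-style listing; on a real compute node that listing would come from `nvidia-smi`.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every GPU name in the listing
# contains the expected model string (e.g. fail fast if SLURM granted
# A100s after the switch to H100s).
all_gpus_match() {
  model="$1"
  listing="$2"
  # grep -v selects lines *lacking* the model name; -q succeeds quietly if
  # any such line exists, so negating yields "every line matches".
  ! printf '%s\n' "$listing" | grep -qv "$model"
}

# In the job script, the listing would be supplied by nvidia-smi:
#   listing=$(nvidia-smi --query-gpu=name --format=csv,noheader)
#   all_gpus_match "H100" "$listing" || echo "WARNING: non-H100 GPU allocated" >&2
if all_gpus_match "H100" "NVIDIA H100 80GB HBM3"; then
  echo "H100 check passed"
fi
```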
