# Batch Computing

The academic clusters that we have access to mostly have `apptainer` installed, which we can use to run the images with ldmx-sw built into them.
We use `denv` when running the images manually and, fortunately, it is small enough to deploy onto the clusters as well.[^1]
```shell
# on the cluster where you want to run batch jobs
curl -s https://tomeichlersmith.github.io/denv/install | sh
```
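If your shell cannot find `denv` after this, you may need to add the installation location to your `PATH`. The exact location depends on the installer (commonly `~/.local/bin`, but that is an assumption here), so check the installer's output.
```shell
# assumption: the installer placed denv in ~/.local/bin
export PATH=${HOME}/.local/bin:${PATH}
```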

~~~admonish tip title="Image Storage"
While the `${HOME}` directory is large enough to hold the `denv` installation,
it is usually much too small to hold copies of the images that we want to run.
For this reason, you will likely want to edit your shell configuration (e.g. `~/.bashrc`)
to change where `apptainer` stores the images.
Refer to your cluster's IT help or documentation to find a suitable place to hold these images.
For example, [the S3DF cluster at SLAC](https://s3df.slac.stanford.edu/#/reference?id=apptainer)
suggests using the `${SCRATCH}` variable they define for their users.
```shell
export APPTAINER_LOCALCACHEDIR=${SCRATCH}/.apptainer
export APPTAINER_CACHEDIR=${SCRATCH}/.apptainer
export APPTAINER_TMPDIR=${SCRATCH}/.apptainer
```
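Depending on the cluster, the directory these variables point at may need to exist before `apptainer` will use it (the temporary directory in particular), so it can be worth creating it up front.
```shell
mkdir -p ${SCRATCH}/.apptainer
```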
~~~

~~~admonish success title="Test"
With `denv` installed on the cluster, you should be able to run `denv` manually like normal.
For example, you can do a test run with a light image that is fast to download.
```
denv init alpine:latest
denv cat /etc/os-release
# should say "Alpine" instead of the host OS
```
~~~

[^1]: The total disk footprint of a `denv` installation is 120KB.
This is small enough to fit in your `${HOME}` directory on most, if not all, clusters.
Additionally, most clusters share your `${HOME}` directory with the worker nodes, so you don't even need to copy `denv` to where the jobs are being run.

## Preparing for Batch Running
The above instructions set you up to run `denv` on the cluster just like you run `denv` on your own computer;
however, a few more steps help ensure that the batch jobs run reliably and efficiently.

### Pre-Building SIF Images
Under the hood, `apptainer` runs images from SIF files.
When `denv` runs using the image tag (e.g. `ldmx/pro:v4.2.3`), `apptainer` stores a copy of this image as a SIF file inside the cache directory.
The cache directory is shared with the worker nodes on some clusters but not on all of them,
so it is helpful to pre-build the image ourselves into a known location.

The location for the image needs to be big enough to hold the multi-GB image (so probably not your `${HOME}` directory) _and_ needs to be shared with the computers that run the jobs.
Again, check with your IT or cluster documentation to find a suitable location.
At SLAC's S3DF, `/sdf/group/ldmx` can be a good location (and may already have the image you need built!).
```
cd path/to/big/dir
apptainer build ldmx_pro_v4.2.3.sif docker://ldmx/pro:v4.2.3 # just an example, name the SIF file appropriately
```
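Once built, a quick sanity check of the SIF file (mirroring the earlier `denv` test) is to run a trivial command inside it.
```shell
apptainer exec ldmx_pro_v4.2.3.sif cat /etc/os-release
# should print the OS of the image rather than the host OS
```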

## Running the SIF Image
How we run the image during the jobs depends on how the jobs are configured.
On the clusters I have access to (UMN and SLAC), jobs are configured in two different ways
that mainly differ in _where_ the job is run.

~~~admonish success title="Check Where Jobs are Run"
A good way to determine this (and to learn about the batch job system that you want to use)
is to figure out how to run a job that just runs `pwd`.
This command prints the "present working directory", so you can see where
the job is being run from.

Refer to your cluster's IT, your cluster's documentation, and the batch job system's documentation to
learn how to do this.
~~~
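As a sketch (exact options and defaults depend on your cluster's configuration), the Slurm version of this check can be a one-liner, while the HTCondor version needs a small submit file.
```shell
# Slurm: wrap pwd in a trivial job; the result lands in slurm-<jobid>.out
sbatch --wrap="pwd"
```
```
# HTCondor: minimal submit file that runs pwd and writes its output to where.out
executable = /bin/pwd
output = where.out
error = where.err
log = where.log
queue
```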

### Jobs Run in Submitted Directory
At SLAC S3DF, the jobs submitted with `sbatch` are run from the directory where `sbatch` was run.
This makes it rather easy to run jobs.
We can create a denv and then submit a job running `denv` from within that directory.
```
cd batch/submit/dir
denv init /full/path/to/big/dir/ldmx_pro_v4.2.3.sif
```

For example, submitting jobs for a range of run numbers would look like
```shell
mkdir log # the SBATCH commands in submit.sh put the log files here
sbatch --array=0-10 submit.sh
```
with
```bash
#!/bin/bash
#SBATCH --job-name my-job
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2g
#SBATCH --time=04:00:00 # time limit for jobs
#SBATCH --output=log/%A-%a.log
#SBATCH --error=log/%A-%a.log

set -o errexit
set -o nounset

# assume the configuration script config.py takes one argument:
# the run number it should use for the simulation
# (and uniquely creates the path of the output file from it)
denv fire config.py ${SLURM_ARRAY_TASK_ID}
# fire is run inside ldmx/pro:v4.2.3 IF SUBMITTED FROM batch/submit/dir
```
Look at the SLAC S3DF and Slurm documentation to learn more about configuring the batch jobs themselves.
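For example, while the jobs are running, you can check on them and follow the log of a specific array task (the job ID below is just a placeholder).
```shell
squeue -u $USER          # list your pending and running jobs
tail -f log/123456-0.log # follow the log of task 0 of array job 123456
```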

~~~admonish note title="Comments"
- _Technically_, since SLAC S3DF's `${SCRATCH}` directory is also shared across the worker nodes, you do not need to pre-build the image. However, this is not advised because if the `${SCRATCH}` directory is periodically cleaned during your jobs, the cached SIF image would be lost and your jobs could fail in confusing ways.
- Some clusters configure Slurm to limit the number of jobs you can submit at once with `--array`. This means you might need to submit the jobs in "chunks" and add an offset to `SLURM_ARRAY_TASK_ID` so that the different "chunks" have different run numbers. This can be done with bash's math syntax, e.g. `$(( SLURM_ARRAY_TASK_ID + 100 ))`; a sketch of this is shown below.
~~~
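A sketch of that chunked submission, assuming `submit.sh` is modified to read a `RUN_OFFSET` environment variable:
```shell
# in submit.sh, apply the offset when passing the run number:
#   denv fire config.py $(( SLURM_ARRAY_TASK_ID + RUN_OFFSET ))
# then submit the runs in chunks, changing the offset for each chunk
sbatch --array=0-49 --export=ALL,RUN_OFFSET=0 submit.sh
sbatch --array=0-49 --export=ALL,RUN_OFFSET=50 submit.sh
```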

### Jobs Run in Scratch Directory
At UMN's CMS cluster, the jobs submitted with `condor_submit` are run from a newly-created scratch directory.
This makes it slightly more difficult to inform `denv` of the configuration we want to use.
`denv` has an experimental shebang syntax that could be helpful for this purpose.

`prod.sh`
```bash
#!/full/path/to/denv shebang
#!denv_image=/full/path/to/ldmx_pro_v4.2.3.sif
#!bash

set -o nounset
set -o errexit

# everything here is run in `bash` inside ldmx/pro:v4.2.3
# assume the run number is provided as an argument
fire config.py ${1}
```

with the submit file `submit.sub` in the same directory.
```
# run prod.sh and transfer it to the scratch area
executable = prod.sh
transfer_executable = yes

# terminal and condor output log files
# helpful for debugging at a slight performance cost
output = logs/$(Cluster)-$(Process).out
error = $(output)
log = $(Cluster)-condor.log

# "hold" the job if there is a non-zero exit code
# and store the exit code in the hold reason subcode
on_exit_hold = ExitCode != 0
on_exit_hold_subcode = ExitCode
on_exit_hold_reason = "Program exited with non-zero exit code"

# the 'Process' variable is an index for the job in the submission cluster
# and doubles as the run number passed to prod.sh
arguments = "$(Process)"
```
And then you would submit these jobs with
```shell
mkdir logs # condor does not create the log directory for you
condor_submit submit.sub -queue 10
```
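Since the submit file holds jobs that exit with a non-zero code, it is handy to know how to inspect (and, once the problem is fixed, release) held jobs. The cluster ID below is just a placeholder.
```shell
condor_q -hold      # list your held jobs along with their hold reasons
condor_release 1234 # release all held jobs in cluster 1234
```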

~~~admonish note collapsible=true title="Alternative Script Design"
Alternatively, one could write a script _around_ `denv` like
```shell
#!/bin/bash

set -o nounset
set -o errexit

# stuff here is run outside ldmx/pro:v4.2.3
# need to call `denv` to go into the image
denv init /full/path/to/ldmx_pro_v4.2.3.sif
denv fire config.py ${1}
```
The `denv init` call writes a few small files, which shouldn't have a large impact on performance
(but could if the directory in which the job is being run has a slow filesystem).
This approach is helpful if your configuration of HTCondor does not do the file transfer for you and
your job is responsible for copying in/out any input/output files that are necessary.
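If your job does need to copy out its own outputs, a sketch of that (the output directory and the `*.root` file pattern are assumptions; adjust them for your setup) could look like:
```shell
#!/bin/bash

set -o nounset
set -o errexit

# assumed output location that is shared across the cluster
output_dir=/full/path/to/big/dir/output

denv init /full/path/to/ldmx_pro_v4.2.3.sif
denv fire config.py ${1}

# copy the produced ROOT file(s) out of the scratch area
cp *.root "${output_dir}/"
```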
~~~

~~~admonish note title="Comments"
- Similar to Slurm's `--array`, we are relying on HTCondor's `-queue` command to decide what run numbers to use. Look at HTCondor's documentation (for example [Submitting many similar jobs with one queue command](https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html#submitting-many-similar-jobs-with-one-queue-command)) for more information.
~~~
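For instance, instead of passing `-queue` on the command line, the submit file itself could end with a queue statement:
```
# submit 10 jobs, with $(Process) taking the values 0 through 9
queue 10
```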