Commit 66ee6d6

first draft of batch.md
1 parent 0abe0bc commit 66ee6d6

1 file changed

+100
-0
lines changed

src/using/batch.md

@@ -0,0 +1,100 @@
# Batch Computing

The academic clusters that we have access to mostly have `apptainer` installed, which we can use to run the images with ldmx-sw built into them.
We use `denv` when running the images manually and, fortunately, it is small enough to deploy onto the clusters as well.[^1]
```shell
# on the cluster where you want to run batch jobs
curl -s https://tomeichlersmith.github.io/denv/install | sh
```

~~~admonish tip title="Image Storage"
While the `${HOME}` directory is large enough to hold the installation of `denv`,
it is usually much too small to hold copies of the images that we want to run.
For this reason, you will likely want to edit your shell configuration (e.g. `~/.bashrc`)
to change where `apptainer` will store the images.
Refer to your cluster's IT help or documentation to find a suitable place to hold these images.
For example, [the S3DF cluster at SLAC](https://s3df.slac.stanford.edu/#/reference?id=apptainer)
suggests using the `${SCRATCH}` variable they define for their users.
```shell
export APPTAINER_LOCALCACHEDIR=${SCRATCH}/.apptainer
export APPTAINER_CACHEDIR=${SCRATCH}/.apptainer
export APPTAINER_TMPDIR=${SCRATCH}/.apptainer
```
~~~

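Not every cluster defines `${SCRATCH}`, and the directory you point `apptainer` at may need to exist before it is used, so a slightly more defensive version of the tip above could look like the following. This is only a minimal sketch and is not taken from the S3DF documentation: the fallback to `${TMPDIR:-/tmp}` and the variable name `cache_base` are assumptions made for illustration.

```shell
# Sketch: choose a base directory for apptainer's cache and temporary files.
# ASSUMPTION: fall back to ${TMPDIR:-/tmp} when the cluster does not define SCRATCH.
cache_base="${SCRATCH:-${TMPDIR:-/tmp}}/.apptainer"

# create the directory up front so that it exists before apptainer needs it
mkdir -p "${cache_base}"

export APPTAINER_LOCALCACHEDIR="${cache_base}"
export APPTAINER_CACHEDIR="${cache_base}"
export APPTAINER_TMPDIR="${cache_base}"
```

On a real cluster you would likely replace the fallback with whatever large storage area your IT documentation recommends.
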
~~~admonish success title="Test"
With `denv` installed on the cluster, you should be able to run `denv` manually like normal.
For example, you can do a test run with a light image that is fast to download.
```shell
denv init alpine:latest
denv cat /etc/os-release
# should say "Alpine" instead of the host OS
```
~~~

[^1]: The total disk footprint of a `denv` installation is 120KB.
This is plenty small enough to include in your `${HOME}` directory on most if not all clusters.
Additionally, most clusters share your `${HOME}` directory with the worker nodes and so you don't even need to bother copying `denv` to where the jobs are being run.

## Preparing for Batch Running
The above instructions have you set up to run `denv` on the cluster just like you run `denv` on your own computer; however,
a few more steps help ensure that the batch jobs run reliably and efficiently.

### Pre-Building SIF Images
Under the hood, `apptainer` runs images from SIF files.
When `denv` runs using the image tag (e.g. `ldmx/pro:v4.2.3`), `apptainer` stores a copy of this image in a SIF file inside of the cache directory.
While the cache directory is distributed across the worker nodes on some clusters, it is not distributed on all clusters, so pre-building the image ourselves
into a known location is helpful.

The location for the image should be big enough to hold the multi-GB image (so probably not your `${HOME}` directory) _and_ needs to be shared with the computers that run the jobs.
Again, check with your IT or cluster documentation to find a suitable location.
At SLAC's S3DF, `/sdf/group/ldmx` can be a good location (and may already have the image you need built!).
```shell
cd path/to/big/dir
apptainer build ldmx_pro_v4.2.3.sif docker://ldmx/pro:v4.2.3 # just an example, name the SIF file appropriately
```

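Since the shared location may already hold the image you need, it can be handy to name SIF files consistently and to skip the build when the file is already there. The following is a minimal sketch of that idea; `sif_name` and `build_sif` are hypothetical helpers written for this example and are not part of `denv` or `apptainer`.

```shell
# Sketch: derive a SIF file name from an image tag and build it only if missing.
# ASSUMPTION: `sif_name` and `build_sif` are hypothetical helpers for illustration.

# ldmx/pro:v4.2.3 -> ldmx_pro_v4.2.3.sif (replace '/' and ':' with '_')
sif_name() {
  printf '%s.sif\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# usage: build_sif <image-tag> <directory>
build_sif() {
  tag="$1"
  dest="$2/$(sif_name "${tag}")"
  if [ -f "${dest}" ]; then
    echo "${dest} already exists, skipping build"
    return 0
  fi
  # same command as above, with the names filled in from the arguments
  apptainer build "${dest}" "docker://${tag}"
}
```

With these defined, `build_sif ldmx/pro:v4.2.3 path/to/big/dir` would only call `apptainer build` when `path/to/big/dir/ldmx_pro_v4.2.3.sif` does not exist yet.
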
### Running the SIF Image
How we run the image during the jobs depends on how the jobs are configured.
For the clusters I have access to (UMN and SLAC), there are two different ways for jobs to be configured
that mainly change _where_ the job is run.

#### Jobs Run in Submitted Directory
At SLAC S3DF, the jobs submitted with `sbatch` are run from the directory where `sbatch` was run.
This makes it rather easy to run jobs.
We can create a denv and then submit a job running `denv` from within that directory.
```shell
cd batch/submit/dir
denv init /full/path/to/big/dir/ldmx_pro_v4.2.3.sif
```
Submitting the job would look like `sbatch <job-options> submit.sh` with
```shell
#!/bin/bash
# submit.sh
denv fire config.py # inside ldmx/pro:v4.2.3 IF SUBMITTED FROM batch/submit/dir
```
Look at the SLAC S3DF and Slurm documentation to learn more about configuring the batch jobs themselves.

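Instead of passing `<job-options>` on the command line, Slurm also reads options from `#SBATCH` comment lines at the top of the script. The sketch below writes such a `submit.sh` and checks it for syntax errors without running it. The option values (job name, time limit, output file name) are placeholders chosen for this example, not S3DF recommendations, and your cluster may require further options such as an account or partition.

```shell
# Sketch: write a submit.sh with job options embedded as #SBATCH directives.
# ASSUMPTION: the option values below are placeholders, not S3DF recommendations.
cat > submit.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=ldmx-sim
#SBATCH --time=01:00:00
#SBATCH --output=slurm-%j.out
# stop at the first failing command
set -e
# inside ldmx/pro:v4.2.3 IF SUBMITTED FROM batch/submit/dir
denv fire config.py
EOF

# check the script for shell syntax errors without running it
bash -n submit.sh
```

Submitting then reduces to `sbatch submit.sh` from within `batch/submit/dir`.
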
#### Jobs Run in Scratch Directory
At UMN's CMS cluster, the jobs submitted with `condor_submit` are run from a newly-created scratch directory.
This makes it slightly difficult to inform `denv` of the configuration we want to use.
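`denv` has an experimental shebang syntax that could be helpful for this purpose.
On Linux, everything after the interpreter path in a `#!` line is passed as a single argument, so `env` needs `-S` (GNU coreutils 8.30 or newer) to split `denv shebang`; alternatively, replace `/usr/bin/env -S denv` with the full path to `denv`.

```shell
#!/usr/bin/env -S denv shebang
#!denv_image=/full/path/to/ldmx_pro_v4.2.3.sif
#!bash

# everything here is run in `bash` inside ldmx/pro:v4.2.3
fire config.py
```
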
You would then use this script as the executable of the job you submit with `condor_submit`.
Alternatively, one could write a script _around_ `denv` like
```shell
# stuff here is run outside ldmx/pro:v4.2.3
# need to call `denv` to go into image
denv init /full/path/to/ldmx_pro_v4.2.3.sif
denv fire config.py
```
The `denv init` call writes a few small files which shouldn't have a large impact on performance
(but could if the directory in which the job is being run has a slow filesystem).
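
Because the SIF file has to be visible from the worker node, a wrapper script like the one above can check for it before calling `denv`, so that a misconfigured job fails quickly with a clear message. The following is a minimal sketch of that idea; `require_sif` and `run_job` are hypothetical helpers written for this example, not part of `denv`.

```shell
# Sketch: fail early when the SIF file is not visible from the worker node.
# ASSUMPTION: `require_sif` and `run_job` are hypothetical helpers for illustration.

# succeed only if the given path is a readable regular file
require_sif() {
  if [ ! -r "$1" ] || [ ! -f "$1" ]; then
    echo "cannot read SIF image: $1" >&2
    return 1
  fi
}

# usage: run_job <sif-file> <config-script>
run_job() {
  require_sif "$1" || return 1
  # everything up to here runs outside the image, `denv` takes us into it
  denv init "$1" && denv fire "$2"
}
```

Ending the job script with `run_job /full/path/to/ldmx_pro_v4.2.3.sif config.py` makes the exit status of that call the exit status of the script.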
