Our TANGO HPC cluster comprises nodes at two data centres, allowing us to provide dynamic resources to our researchers.
Scratch space provides the fastest and most accessible storage for high-performance computing jobs. It is intended only for temporary storage and should not be used as a filing or backup space. To provide the best job performance, each node has its own scratch space: standard TANGO nodes use vSAN, while our big-memory nodes (i.e. those with more than 256 GB RAM) use the Lustre filesystem. Moving files to and from scratch should therefore be handled inside the tango-scratch.sub SLURM job submission script.
You will find the SLURM submission script templates in your home directory, in a folder called .templates:
ls -lar .templates/
-rw-r--r-- 1 Owner-Acct Group-Name 526 Aug 14 10:57 tango.sub
-rw-r--r-- 1 Owner-Acct Group-Name 1011 Sep 14 13:05 tango-scratch.sub
cp ~/.templates/tango-scratch.sub myExperiment-scratch.sub
#!/bin/bash
### Job Name
#SBATCH --job-name=MyJobName
### Set email type for job
### Accepted options: NONE, BEGIN, END, FAIL, ALL
#SBATCH --mail-type=ALL
### email address for user
#SBATCH --mail-user=MyEmailAddress
### Queue name that job is submitted to
#SBATCH --partition=QueueName
### Request nodes
#SBATCH --ntasks=X
#SBATCH --mem=Xgb
#SBATCH --time=HH:MM:SS

echo Running on host `hostname`
echo Time is `date`

# Copy job directory to scratch
mkdir -p /scratch/$USER/job-$SLURM_JOB_ID
rsync -avH --exclude=slurm-\*.out $SLURM_SUBMIT_DIR/ /scratch/$USER/job-$SLURM_JOB_ID/

# Go to the scratch directory to run the job from there
cd /scratch/$USER/job-$SLURM_JOB_ID

# Load module(s) if required
module load application_module

# Run the executable
MyProgram+Arguments

# Copy job directory back to original directory and clean up scratch directory
rsync -avH --exclude=slurm-\*.out /scratch/$USER/job-$SLURM_JOB_ID/ $SLURM_SUBMIT_DIR/
rm -rf /scratch/$USER/job-$SLURM_JOB_ID
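The stage-in/stage-out pattern used by the template can be exercised locally, without a cluster. The sketch below is illustrative only and assumes nothing beyond rsync being installed; the two temporary directories stand in for $SLURM_SUBMIT_DIR and the node's /scratch space:

```shell
# Stand-ins for the real job directories (local demo only).
SUBMIT_DIR=$(mktemp -d)    # plays the role of $SLURM_SUBMIT_DIR
SCRATCH_DIR=$(mktemp -d)   # plays the role of /scratch/$USER/job-$SLURM_JOB_ID

echo "input data" > "$SUBMIT_DIR/input.dat"
echo "old log"    > "$SUBMIT_DIR/slurm-1234.out"   # a leftover SLURM log

# Stage in: copy the job directory, skipping previous SLURM output files.
rsync -avH --exclude='slurm-*.out' "$SUBMIT_DIR/" "$SCRATCH_DIR/"

# ... the job itself would run here, writing results into $SCRATCH_DIR ...
echo "result" > "$SCRATCH_DIR/output.dat"

# Stage out: copy results back, again skipping SLURM output files.
rsync -avH --exclude='slurm-*.out' "$SCRATCH_DIR/" "$SUBMIT_DIR/"
```

After the second rsync, $SUBMIT_DIR contains output.dat, while slurm-1234.out was never copied into $SCRATCH_DIR. This is why the template excludes slurm-\*.out in both directions: the live slurm-$SLURM_JOB_ID.out file is still being written in the submit directory and must not be clobbered by the stage-out.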
Edit the highlighted jobscript entries as required for your specific job:
- All lines beginning with #SBATCH are interpreted as SLURM directives passed directly to the queuing system;
- MyJobName should be a concise but identifiable alphanumeric name for the job (starting with a letter, NOT a number);
- ntasks=X requests the number of CPUs required for the job;
- mem=Xgb states that the program will use at most X GB of memory;
- time=HH:MM:SS states the maximum walltime (real elapsed time) your job will require, as "hours:minutes:seconds". Please contact the Service Desk if you need more than 200 hours for your job;
- module load is required if the module(s) you need (e.g. an application or compiler) are not loaded automatically in this shell's environment. Replace application_module with the name(s) of the required module(s); and
- MyProgram+Arguments is the name of the program you want to run, together with all of its command-line arguments. It may also include redirection of input and output streams.
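As a concrete illustration, the directives for a hypothetical 8-CPU, 16 GB, 12-hour job might read as follows (all values, including the job name, address and queue, are made up for this example):

```shell
#SBATCH --job-name=waterSim01              # hypothetical job name
#SBATCH --mail-type=END,FAIL               # mail on completion or failure
#SBATCH --mail-user=jane.doe@example.edu   # hypothetical address
#SBATCH --ntasks=8                         # 8 CPUs
#SBATCH --mem=16gb                         # at most 16 GB of memory
#SBATCH --time=12:00:00                    # at most 12 hours of walltime
```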
Output and error messages are combined into a single file, slurm-XXXXX.out, which is placed in the directory from which the job was submitted (XXXXX is the numerical job ID allocated when you submit the job with sbatch).
If parallel (i.e. MPI) jobs use multiple nodes, they must be confined to a single physical data centre ("dc") so that all nodes share the same local scratch space. In such cases, specify the data centre when submitting the job using the constraint dc:pl or dc:ep:
sbatch -C "dc:pl" my_job.sub
connects to the Lustre scratch, while
sbatch -C "dc:ep" my_job.sub
connects to vSAN.