Running Jobs – The Batch System (Slurm)
The instructions in this section are for users who have been asked to test applications in the Slurm batch system on the upgraded CSF3, thereby helping us to ensure the upgraded system is running as expected.
(answering these requests will slow down our upgrade work!)
Why use a batch system?
All jobs must be run in the batch system (Slurm). This allows you to specify the resources (cores, memory, GPUs) you need for your jobs and ensures the jobs only run when those resources become available.
It also ensures fair usage of the system – there are many users submitting jobs, and those jobs make different demands of the system. The batch system will schedule your jobs according to the resources requested and the size of your group’s contribution to the system.
Be kind to the login nodes and other users
Applications should not be run directly on the login nodes. These are relatively small, light-weight nodes (not many cores, small memory) used to access the system, edit files and submit jobs. Many users will be connected to the login nodes. If you run an application there, you may prevent all of those users from doing their work.
Check the documentation of your application
Please ensure you know how to run your applications correctly (e.g., using the correct number of CPUs and GPUs). If you are using an app that is already installed on the CSF, see our Application Documentation – we provide example jobscripts.
Do not log in to compute nodes
Logging in directly to the compute nodes is not permitted. Interactive jobs can be used to work at the command-line on a compute node (e.g., to perform quick test runs, try different parameters, or modify and recompile code).
Batch Tutorial
If you are unfamiliar with running jobs in a batch system, please see our 10-minute tutorial on running jobs on the CSF.
Submitting Jobs and Requesting Resources
You will need to write a small jobscript using a text editor. Several editor apps are available. For example:
xnedit myjobscript
# or
gedit myjobscript
The jobscript is a simple text file that specifies:
- Any additional or specific resources your job needs (number of CPU cores, the architecture/type of CPU, memory, GPUs).
- The actual commands / application your job should execute.
Further details on how to write jobscripts, and some example jobscripts, are in the sections on serial jobs and parallel jobs. The menu on the left also has pages for more advanced job options. Our software pages also have example jobscripts for each application we have installed.
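To make the two parts of a jobscript concrete, here is a minimal sketch of a one-core job. It is an illustration rather than an official CSF template: the partition name `serial`, the 10-minute time limit and the job name `myjob` are assumptions, so check the serial and parallel job pages for the values that apply on the CSF. The `#SBATCH` lines request resources, and everything after them is the work the job runs.

```shell
#!/bin/bash
# --- Resource requests (read by sbatch, ignored by bash) ---
# Partition to run in. "serial" is an assumed name; check the CSF docs.
#SBATCH -p serial
# Number of tasks (one core for a serial job)
#SBATCH -n 1
# Wallclock limit as hours:minutes:seconds (assumed value)
#SBATCH -t 00:10:00
# A name for the job, shown by squeue (hypothetical)
#SBATCH -J myjob

# --- Commands the job should execute ---
# Slurm sets SLURM_NTASKS inside a job; fall back to 1 when it is unset.
NCORES="${SLURM_NTASKS:-1}"
echo "Job running on $(hostname) using ${NCORES} core(s)"

# Replace the line below with your own application (hypothetical name):
# ./my_app input.dat
```

Keep all `#SBATCH` lines at the top of the file, before the first command, because Slurm stops reading them once it reaches a non-comment line.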
Then submit the jobscript to the batch system using
sbatch myjobscript
You may also wish to check on your job (is it still running?) using
squeue
See the batch commands for more information.
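The two commands above can be combined so that you check only the job you have just submitted. The following is a sketch, not a CSF-provided tool: the helper name `jobid_from_sbatch_output` is hypothetical, and it assumes sbatch prints its usual confirmation message of the form "Submitted batch job <id>".

```shell
#!/bin/bash
# Extract the numeric job ID from sbatch's confirmation message,
# which is assumed to have the form "Submitted batch job <id>".
jobid_from_sbatch_output() {
    awk '{ print $4 }' <<< "$1"
}

# Only attempt a real submission where Slurm and the jobscript are present.
if command -v sbatch >/dev/null 2>&1 && [ -f myjobscript ]; then
    jobid=$(jobid_from_sbatch_output "$(sbatch myjobscript)")
    echo "Submitted job ${jobid}"
    # Show only this job rather than every job in the queue
    squeue -j "${jobid}"
fi
```

If the job no longer appears in the squeue output, it has usually finished (or failed), so check its output file.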