Multiple Similar Jobs – Job Arrays
CSF users: please do not run job arrays in the short environment, even if your tasks have a short runtime. There are not enough cores in short for job arrays.
Why?
Suppose you wish to run a large number of almost identical jobs – for example you may wish to run the same program many times with different arguments or parameters; or perhaps process a thousand different input files with the same application. You may have used a Condor pool to do this (where idle PCs on campus are used to run your jobs overnight) but systems such as the CSF, Redqueen and the DPSF can also run these High Throughput Computing jobs.
The wrong way to do this would be to write a Perl or Python script to generate all the required qsub jobscripts and then use a BASH script to submit them all (running qsub 1000s of times). This is not a good use of your time and it will do horrible things to the submit node (which manages the job queues) on a cluster. The sysadmins may well kill such jobs to keep the system running smoothly!
A much better way is to use an SGE Job Array. Simply put, a job array runs multiple copies (100s, 1000s, …) of your job in a way that places much less strain on the queue manager. You only write one jobscript and use qsub
once to submit that job. Your jobscript includes a flag to say how many copies of it should be run. Each copy of the job that the system runs is given a unique task id. You use the task id in your jobscript to have each task do some unique work (e.g., each task reads from a different file or uses a different set of input parameters).
Using the unique task id in your jobscript is the key to writing a good job array script. You can be creative here – the task id can be used in many ways. Below we describe how to submit an SGE job comprising numerous serial (single core) tasks and also SMP (multicore) tasks and give several examples of how to use the task id in your jobscript.
Job runtime
Each task in an array gets a maximum runtime of 7 days. The job array as a whole is not terminated at the 7-day limit; it will remain in the system until all tasks complete.
Please note: on the CSF, job arrays are not permitted in the short area. Please do not use short as a means of jumping the queue.
Job Array Basics
An SGE array job might be described as a job with a for-loop built in. Here is a simple example:
```bash
#!/bin/bash
#$ -cwd
#$ -V

# The 'myprog' below is serial hence no '-pe' option needed

# ...tell SGE that this is an array job, with "tasks" numbered from 1 to 1000...
#$ -t 1-1000

./myprog < data.$SGE_TASK_ID > results.$SGE_TASK_ID
```
Computationally, this is equivalent to 1000 individual queue submissions in which $SGE_TASK_ID takes the values 1, 2, 3, ..., 1000, and where input and output files are indexed by the ID. The SGE_TASK_ID variable is automatically set for you by the batch system when a particular task runs. Please note that for "serial" jobs you don't use a PE setting.
To submit the job simply issue one qsub command:

```bash
qsub jobscript
```

where jobscript is the name of your script above.
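The jobscript above assumes numbered input files data.1, data.2, ..., data.1000 already exist in the submission directory. As a hypothetical sketch (the file names and contents here are placeholders), you could generate such files before submitting:

```shell
# Create numbered input files data.1 ... data.10 so that each
# array task ('#$ -t 1-10') finds its own input file.
for i in $(seq 1 10); do
    echo "parameters for run $i" > data.$i
done
```

How you generate the real inputs will depend entirely on your application.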
Multi-core (SMP) tasks (e.g., OpenMP jobs) can also be run in job arrays. Each task will run your program with the requested number of cores. Simply add a -pe option to the jobscript and then tell your program how many cores it can use in the usual manner (see parallel job submission). Please be aware that each task will request the specified resources (number of cores), so it may take longer for each task to get through the batch queue, depending on how busy the system is.
An example SMP array job is given below:
```bash
#!/bin/bash
#$ -cwd
#$ -V

# Each task will use 4 cores
#$ -pe smp.pe 4

# ...tell SGE that this is an array job, with "tasks" numbered from 1 to 1000...
#$ -t 1-1000

# My OpenMP program will read this variable to get how many cores to use.
# $NSLOTS is automatically set to the number specified on the -pe line above.
export OMP_NUM_THREADS=$NSLOTS
./myOMPprog < data.$SGE_TASK_ID > results.$SGE_TASK_ID
```
Again, simply submit the job once using qsub jobscript.
Job arrays have several advantages over submitting 100s or 1000s of individual jobs. In both of the above cases:
- Only one qsub command is issued (and only one qdel command would be required to delete all tasks).
- The batch system will try to run many of your tasks at once (possibly hundreds simultaneously for serial job arrays, depending on the limit we have set according to demand on the system). So you get far more than one task running in parallel from just one qsub command. The system will churn through your tasks, running them as cores become free; once all tasks have completed, the job is finished.
- Only one entry appears as queued (qw) in the qstat output for the job array, but each individual running task (r) will be visible. This makes reading your qstat output a lot easier than if you'd submitted 1000s of individual jobs.
- The load on the SGE submit node (i.e., the cluster node responsible for managing the queues and scheduling which jobs run) is vastly less than that of submitting 1000 separate jobs.
There are many ways to use the $SGE_TASK_ID variable to supply a different input to each task; several examples are shown below.
A More General For Loop
It is not necessary that SGE_TASK_ID starts at 1; nor must the increment be 1. For example:
```bash
#$ -t 100-995:5
```
so that SGE_TASK_ID takes the values 100, 105, 110, 115, ..., 995. However, SGE_TASK_ID is not allowed to start at 0.
Incidentally, in the case where the upper bound is not equal to the lower bound plus an integer multiple of the increment, for example
```bash
#$ -t 1-42:6
```
SGE automatically changes the upper bound, viz.:

```
prompt> qsub array.qsub
Your job-array 2642.1-42:6 ("array.qsub") has been submitted

prompt> qstat
job-ID  prior    name        user    state  submit/start at      queue  slots  ja-task-ID
-----------------------------------------------------------------------------------------
2642    0.00000  array.qsub  simonh  qw     04/24/2014 12:29:29         1      1-37:6
```
Note the 1-37 in the ja-task-ID column: the final task id has been adjusted to 37 rather than 42. Hence the task ids used will be 1, 7, 13, 19, 25, 31, 37. Remember that the task id cannot start at zero, so don't be tempted to try #$ -t 0-42:6.
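The adjusted upper bound follows from simple arithmetic: SGE only uses ids of the form first + k*step that do not exceed the requested last id. A small sketch you can run in an ordinary shell (outside the batch system) to predict the task ids for any -t first-last:step specification:

```shell
# Predict the task ids for '#$ -t 1-42:6'
FIRST=1
LAST=42
STEP=6

# Effective upper bound: largest FIRST + k*STEP that is <= LAST
ADJ_LAST=$(( FIRST + STEP * ( (LAST - FIRST) / STEP ) ))
echo "Adjusted upper bound: $ADJ_LAST"    # 37

# The task ids that will actually run
seq $FIRST $STEP $ADJ_LAST                # 1 7 13 19 25 31 37
```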
Related Environment Variables
There are three more automatically created environment variables one can use, as illustrated by this simple qsub script:
```bash
#!/bin/bash
#$ -cwd
#$ -V
#$ -t 1-37:6

echo "The ID increment is: $SGE_TASK_STEPSIZE"

if [[ $SGE_TASK_ID == $SGE_TASK_FIRST ]]; then
    echo "first"
elif [[ $SGE_TASK_ID == $SGE_TASK_LAST ]]; then
    echo "last"
else
    echo "neither"
fi
```
Note that the batch system will try to start your tasks in numerical order but there is no guarantee that they will finish in the same order — some tasks may take longer to run than others. So you cannot rely on the task with id $SGE_TASK_LAST being the last task to finish. Hence do not try something like:
```bash
# DO NOT do this in your jobscript - we may not be the last task to finish
if [[ $SGE_TASK_ID == $SGE_TASK_LAST ]]; then
    # Archive output files from all tasks (output.1, output.2, ...).
    tar czf ~/scratch/all-my-results.tgz output.*
    #
    # BAD: we may not be the last task to finish just because we are the last
    # BAD: task id. Hence we may miss some output files from other tasks that
    # BAD: are still running.
fi
```
The correct way to do something like this (where the work carried out by a task depends on other tasks having finished) is to use a job dependency, which uses two separate jobs and automatically runs the second job only when the first job has completely finished. You would generally only use the $SGE_TASK_FIRST and $SGE_TASK_LAST variables where you want those tasks to do something different but still independent of the other tasks.
Examples
We now show example job scripts which use the job array environment variables in various ways. All of the examples below are serial jobs (each task uses only one core) but you could equally use multicore (smp) jobs if your code/executable supports multicore. You should adapt these examples to your own needs.
A List of Input Files
One can be sneaky — suppose we have a list of input files, rather than input files explicitly indexed by suffix:
```bash
#!/bin/bash
#$ -cwd
#$ -V
#$ -t 1-42

# Task id 1 will read line 1 from my_file_list.txt
# Task id 2 will read line 2 from my_file_list.txt
# and so on...
# Each line contains the name of an input file to be used by 'myprog'
INFILE=`awk "NR==$SGE_TASK_ID" my_file_list.txt`
#
# ...or use sed to print the n-th line of a file:
#
# INFILE=`sed -n "${SGE_TASK_ID}p" my_file_list.txt`
#
./myprog < $INFILE
```
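How you create my_file_list.txt is up to you. One hypothetical approach (assuming your input files match a glob such as data/*.in — adjust the pattern to your own files) is to generate the list just before submitting and use its line count to set the -t range:

```shell
# (Demo setup - in reality these input files would already exist)
mkdir -p data
touch data/run1.in data/run2.in data/run3.in

# Build the list of input files, one name per line
ls -1 data/*.in > my_file_list.txt

# The line count gives the task range to use on the '#$ -t' line
NUMTASKS=$(wc -l < my_file_list.txt)
echo "Use: #$ -t 1-$NUMTASKS"
```

Keeping the list and the -t range in sync this way avoids tasks that read past the end of the file.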
Bash Scripting and Arrays
Another way of passing different parameters to your application (e.g., to run the same simulation but with different input parameters) is to list all the parameters in a bash array and index into the array. For example:
```bash
#!/bin/bash
#$ -cwd
#$ -V
#$ -t 1-10

# A bash array of my 10 input parameters
X_PARAM=( 3400 4500 9700 10020 20000 30000 40000 44400 50000 60910 )

# Bash arrays use zero-based indexing but you CAN'T use '#$ -t 0-9'
# (0 is an invalid task id), so subtract 1 from the task id.
INDEX=$((SGE_TASK_ID-1))

# Run the app with one of the parameters
./myprog -xflag ${X_PARAM[$INDEX]} > output.${INDEX}.log
```
Running from Different Directories (simple)
Here we run each task in a separate directory (folder) that we create when each task runs. We run two applications from the jobscript – the first outputs to a file, the second reads that file as input and outputs to another file (your own applications may do something completely different). We run 1000 tasks, numbered 1…1000.
```bash
#!/bin/bash
#$ -cwd
#$ -V
#$ -t 1-1000

# Create a new directory for each task and go in to that directory
mkdir myjob-$SGE_TASK_ID
cd myjob-$SGE_TASK_ID

# Each task runs the same executables stored in the parent directory
../myprog-a.exe > a.output
../myprog-b.exe < a.output > b.output
```
In the above example all tasks use the same input and output filenames (a.output and b.output). This is safe because each task runs in its own directory.
Running from Different Directories (intermediate)
Here we use one of the techniques from above – read the directories we want to run in from a file. Task 1 will read line 1, task 2 reads line 2 and so on.
We assume the file contains sub-directory names. For example, suppose we are currently working in a directory named ~/scratch/jobs/ (it is in our scratch directory). The subdirectories are named after some property such as:
```
s023nn/arun1206/
s023nn/arun1207/
s023nn/brun1208/
s023nx/brun1201/
s023nx/crun1731/
```

and so on — it doesn't really matter what the subdirectories are called.
The jobscript reads a line from the above list and changes into that directory:
```bash
#!/bin/bash
#$ -cwd          # Run from where we ran qsub
#$ -V
#$ -t 1-500      # my_dir_list.txt has 500 lines

# Task id 1 will read line 1 from my_dir_list.txt
# Task id 2 will read line 2 from my_dir_list.txt
# and so on...
SUBDIR=`sed -n "${SGE_TASK_ID}p" my_dir_list.txt`

# Go in to the subdirectory
cd $SUBDIR

# Run our code
./myprog < input.dat
```
The above script assumes that each subdirectory contains a file named input.dat, which we process.
Running from Different Directories (advanced)
This example runs the same code from different directories. Here we expect each directory to contain an input file. You can name your directories (and subdirectories) appropriately to match your experiments. We use BASH scripting to index into arrays giving the names of directories. This example requires some knowledge of BASH but it should be straightforward to modify for your own work.
In this example we have the following directory structure (use whatever names are suitable for your code):

- 3 top-level directories named: Helium, Neon, Argon
- 2 mid-level directories named: temperature, pressure
- 4 bottom-level directories named: test1, test2, test3, test4
So the directory tree looks something like:
```
+---Helium---+---temperature---+---test1
|            |                 +---test2
|            |                 +---test3
|            |                 +---test4
|            |
|            +------pressure---+---test1
|                              +---test2
|                              +---test3
|                              +---test4
|
+-----Neon---+---temperature---+---test1
|            |                 +---test2
|            |                 +---test3
|            |                 +---test4
|            |
|            +------pressure---+---test1
|                              +---test2
|                              +---test3
|                              +---test4
|
+----Argon---+---temperature---+---test1
             |                 +---test2
             |                 +---test3
             |                 +---test4
             |
             +------pressure---+---test1
                               +---test2
                               +---test3
                               +---test4
```
Hence we have 3×2×4 = 24 input files, all named myinput.dat, in paths such as:

```
$HOME/scratch/chemistry/Helium/temperature/test1/myinput.dat
...
$HOME/scratch/chemistry/Helium/temperature/test4/myinput.dat
$HOME/scratch/chemistry/Helium/pressure/test1/myinput.dat
...
$HOME/scratch/chemistry/Neon/temperature/test1/myinput.dat
...
$HOME/scratch/chemistry/Argon/pressure/test4/myinput.dat
```
The following jobscript will run the executable mycode.exe in each path (so that we process all 24 input files). In this example the code is serial (hence no PE is specified).
```bash
#!/bin/bash
#$ -cwd
#$ -V

# This creates a job array of 24 tasks numbered 1...24 (IDs can't start at zero)
#$ -t 1-24

# Subdirectories will all have this common root (saves me some typing)
BASE=$HOME/scratch/chemistry

# Path to my executable
EXE=$BASE/exe/mycode.exe

# Arrays giving subdirectory names (note no commas - use spaces to separate)
DIRS1=( Helium Neon Argon )
DIRS2=( temperature pressure )
DIRS3=( test1 test2 test3 test4 )

# BASH syntax to get the length of each array
NUMDIRS1=${#DIRS1[@]}
NUMDIRS2=${#DIRS2[@]}
NUMDIRS3=${#DIRS3[@]}
TOTAL=$((NUMDIRS1 * NUMDIRS2 * NUMDIRS3))
echo "Total runs: $TOTAL"

# Remember that $SGE_TASK_ID will be 1, 2, 3, ... 24.
# BASH array indexing starts from zero so decrement.
TID=$((SGE_TASK_ID-1))

# Create indices into the above arrays of directory names.
# The first index increments the slowest, then the middle index, and so on.
IDX1=$((TID / (NUMDIRS2 * NUMDIRS3)))
IDX2=$(((TID / NUMDIRS3) % NUMDIRS2))
IDX3=$((TID % NUMDIRS3))

# Index into the arrays of directory names to create a path
JOBDIR=${DIRS1[$IDX1]}/${DIRS2[$IDX2]}/${DIRS3[$IDX3]}

# Echo some info to the job output file
echo "Running SGE_TASK_ID $SGE_TASK_ID in directory $BASE/$JOBDIR"

# Finally run my executable from the correct directory
cd $BASE/$JOBDIR
$EXE < myinput.dat > myoutput.dat
```
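Before submitting, you can sanity-check the index arithmetic by running the same calculation in an ordinary shell loop over all 24 task ids. This standalone sketch mirrors the jobscript's maths (it is not itself a jobscript); each iteration prints one distinct directory path:

```shell
DIRS1=( Helium Neon Argon )
DIRS2=( temperature pressure )
DIRS3=( test1 test2 test3 test4 )
N2=${#DIRS2[@]}
N3=${#DIRS3[@]}

# Loop over the task ids SGE would assign (1..24) and print the
# directory each task would run in.
for SGE_TASK_ID in $(seq 1 24); do
    TID=$((SGE_TASK_ID-1))
    IDX1=$((TID / (N2 * N3)))       # slowest-changing index (element)
    IDX2=$(((TID / N3) % N2))       # middle index (property)
    IDX3=$((TID % N3))              # fastest-changing index (test number)
    echo "${DIRS1[$IDX1]}/${DIRS2[$IDX2]}/${DIRS3[$IDX3]}"
done
```

Task 1 maps to Helium/temperature/test1 and task 24 maps to Argon/pressure/test4, with every combination visited exactly once in between.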
You may not need three levels of subdirectories and you’ll want to edit the names (BASE, EXE, DIRS1, DIRS2, DIRS3) and change the number of tasks requested.
To submit your job simply use qsub myjobscript.sh, i.e., you only submit a single jobscript.
Running MATLAB (lock file error)
If you wish to run compiled MATLAB code in a job array please see MATLAB job arrays (CSF documentation) for details of an extra environment variable needed to prevent a lock-file error. This is a problem in MATLAB when running many instances at the same time, which can occur if running from a job array.
Limit the number of tasks to be run at the same time
By default the batch system will attempt to run as many tasks as possible concurrently. If you do not want this to happen you can limit how many tasks run at the same time with the -tc option. For example, to limit it to 5 concurrent tasks:

```bash
#$ -tc 5
```
Job Dependencies with Job Arrays
It is possible to make a job wait for an entire job array to complete or to make the tasks of a job array wait for the corresponding task of another job array. It is also possible to make the tasks within a job array wait for other tasks (although this is limited). Examples are now given.
In the following examples we name each job using the -N flag to make the text more readable. This is optional. If you don't name a job you should use the job ID number when referring to previous jobs.
Wait for entire job array to finish
Suppose you want JobB to wait until all tasks in JobA have finished. JobB can be an ordinary job or another job array, but it will not run until all tasks in the job array JobA have finished. This is useful where you need to do something with the results of all tasks from a job array. Using a job dependency is the correct way to ensure all tasks in a job array have completed (using the last task in a job array to do some extra processing is incorrect because not all tasks may have finished even if they have all started).
```
------------------- Time --------------------->
JobA.task1 -----> End
JobA.task2 --------> End      # JobA tasks can run in parallel
...
JobA.taskN ----> End
                      JobB -----> End     # JobB won't start until all
                                          # of JobA's tasks have finished
```
Here is the jobscript for JobB – we use the -hold_jid flag to give the name of the job we should wait for.
```bash
#!/bin/bash
#$ -cwd
#$ -N JobB
#$ -hold_jid JobA    # We will wait for all of JobA's tasks to finish

./myapp.exe
```
Submit the jobs in the expected order and JobB will wait for JobA to finish.
```bash
qsub jobscript_a
qsub jobscript_b
```
Wait for individual tasks to finish
Suppose you have two job arrays to run, both with the same number of tasks. You want task 1 from JobB to run after task 1 from JobA has finished. Similarly you want task 2 from JobB to run after task 2 from JobA has finished. And so on. This allows you to pipeline tasks but still have them run independently and in parallel with other tasks.
```
----------------------- Time ----------------------->
JobA.task1 -----> End JobB.task1 ------> END
JobA.task2 -------> End JobB.task2 -------> END     # Tasks can run in parallel. JobA tasks
...                                                 # and JobB tasks form pipelines.
JobA.taskN -----> End JobB.taskN ----> END
```
Use the -hold_jid_ad flag to set up an array dependency. Here are the two jobscripts:
JobA’s jobscript:
```bash
#!/bin/bash
#$ -cwd
#$ -N JobA    # We are JobA
#$ -t 1-20    # 20 tasks in this example

./myAapp.exe data.$SGE_TASK_ID.in > data.$SGE_TASK_ID.A_result
```
JobB's jobscript:

```bash
#!/bin/bash
#$ -cwd
#$ -N JobB            # We are JobB
#$ -t 1-20            # 20 tasks in this example (must be the same as JobA)
#$ -hold_jid_ad JobA  # JobB.task1 waits only for JobA.task1 to finish and so on...
                      # (the _ad means array dependency)

./myBapp.exe data.$SGE_TASK_ID.A_result > data.$SGE_TASK_ID.B_result
```
Submit both jobs:
```bash
qsub jobscript_a
qsub jobscript_b
```
Tasks wait within a job array
It is possible to make the tasks within a job array wait for earlier tasks within the same job array. This is generally not recommended because it removes one of the advantages of job arrays – the ability to run independent tasks in parallel so that you get your results sooner. However, it does provide a method of easily submitting a large number of jobs where task N must wait for task N-1 to finish. We use the -tc flag to control the number of concurrent tasks that the job array can execute.
```
------------------------------- Time --------------------------------->
JobA.task1 -----> End
                      JobA.task2 -----> End
...
                                            JobA.taskN -----> End
```
The -tc N flag allows the job array to run at most N tasks at the same time. Without the flag the batch system will attempt to run as many tasks as possible concurrently. If you set this to 1 you effectively make each task wait for the previous task to finish:

```bash
#!/bin/bash
#$ -cwd
#$ -t 1-10
#$ -tc 1    # Run only one task at a time

./myApp.exe data.$SGE_TASK_ID.in > data.$SGE_TASK_ID.out
```
Deleting Job Arrays
Deleting All Tasks
An entire job array can be deleted with one command. This will delete all tasks that are running and are yet to run from the batch system:
```bash
qdel 18305    # replace 18305 with your own job id number
```
Deleting Specific Tasks
Alternatively, it is possible to delete specific tasks while leaving other tasks running or queued waiting to run. You may wish to change some input files for those tasks, for example, or specific tasks may have crashed and you need to delete and then resubmit them. Simply add the -t taskrange flag to qdel, where taskrange gives the tasks to delete. You must always give the job id followed by the tasks. For example:
- To delete a single task (id 30) from a job (id 18205):

```bash
qdel 18205 -t 30
```

- To delete tasks 200-300 inclusive from a job (id 18205):

```bash
qdel 18205 -t 200-300
```

- To delete tasks 25, 26, 50, 51, 52 and 100 from a job (id 18205):

```bash
qdel 18205 -t 25-26
qdel 18205 -t 50-52
qdel 18205 -t 100
```
Note that it is not possible to use -t 25,26,50,51,52,100 to delete ad-hoc tasks.
Resubmitting Deleted/Failed Tasks
If you need to resubmit task ids that have been deleted, it is easiest to remove the #$ -t line from the jobscript and then specify the tasks on the qsub command line, similar to the qdel command. For example, to resubmit the above six ad-hoc tasks:
```bash
# First remove '#$ -t start-end' from your jobscript
qsub -t 25-26 jobscript
qsub -t 50-52 jobscript
qsub -t 100 jobscript
```
Note that it is not possible to use -t 25,26,50,51,52,100 to submit ad-hoc tasks in one go.
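If you have many ad-hoc task ids to resubmit, a small shell loop saves typing by issuing one qsub per id (a single id is a valid range). A hypothetical sketch — the echo prefix prints the commands so you can check them first; remove it to actually submit:

```shell
# Resubmit six ad-hoc task ids, one qsub command per id.
# 'echo' shows what would run; delete it to really submit.
for t in 25 26 50 51 52 100; do
    echo qsub -t $t jobscript
done
```

The same pattern works with qdel for deleting ad-hoc tasks.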
Email from Job Arrays
As with ordinary batch jobs it is possible to have the job email you when it begins, ends, or aborts due to an error. Unfortunately, with a job array each task will email you, so you may receive 1000s of emails from a large job array. If this is what you require, add the following to your jobscript:
```bash
#$ -M your.name@manchester.ac.uk    # Can use any address
#$ -m bea
#
# b = email when each job array task begins running
# e = email when each job array task ends
# a = email when each job array task aborts
#
# You can specify any one or more of b, e, a
```
It isn’t possible to have the job array email you only when the last task has finished. However, it is possible to do something very similar to this, as follows:
Email during last task
You can manually email yourself from the job array task with the highest (last) task id. There is no guarantee this will be the last task to finish (other tasks that started earlier may run for longer and finish later). But it will be the last task to start. Emailing from the last task to start would be close enough to also being the last task to finish in most cases. Do this as follows:
```bash
#$ -t 1-100

# Run some application in our jobscript
./my_app.exe data.$SGE_TASK_ID

# Send email at the end of the last task. Will usually be
# close enough to being the last task to finish.
if [[ $SGE_TASK_ID == $SGE_TASK_LAST ]]; then
    echo "Last task $SGE_TASK_ID in job $JOB_ID finished" | mail -s "CSF jobarray" $USER
fi
```
The email will be sent to your University email address.
Email from a Job Dependency after the Job Array
To guarantee that you receive an email after all tasks have finished, you can submit a second serial job (not a job array) to the batch system, perhaps to the short area if available (CSF users only), that has a job dependency on the main job array. The second job will only run after all tasks in the job array have finished. Note, however, that the second job may have to wait in the queue depending on how busy the system is, so your email may arrive some time after the job array actually finished. To submit the jobs use the following command lines:
```bash
# First, submit your job array as normal, noting the job id
qsub my-jobarray.sh
  Your job-array 869219.1-10:1 ("my-jobarray.sh") has been submitted
#
# Make a note of this job id

# Now immediately submit a serial job with a dependency (-hold_jid jobid)
# and request that it emails you when it ends (-m e)
qsub -b y -hold_jid 869219 -m e -M $USER true
#
# 'true' is an app that simply returns 'no error'
# Use the job id from the previous job array
```
The second serial job will execute (it won’t do any useful work and will finish immediately) when the job array ends. It will send you an email when it has finished and so you will then know that the job array on which it was dependent has also finished. Note that the information in the email (wallclock time etc) is about the serial job, not the job array.
Further Information
More on SGE Job Arrays can be found at: