The Computational Shared Facility 3

Job Dependencies

SGE job dependencies let you specify that one job should not start until another job has completed. This is useful if, for example, ‘jobB’ relies on the output from ‘jobA’. You do not need to keep track of when individual jobs start or finish, because SGE simply runs them in the order you specify.

You can build up more complicated pipelines of jobs, where you might be waiting for several jobs to finish before further jobs can run. Job dependencies will ensure jobs run in the correct order.

Job Dependency via the Jobscript

To use this functionality add this option to your submission script:

#$ -hold_jid jobid
               # replacing jobid with the number of the job to wait for.

When you run qstat, jobs that are waiting on other jobs are listed in the hqw state (held, queued and waiting).
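As a rough sketch of how you might pick out the held jobs, the helper below filters qstat-style output for rows in the hqw state. The function name list_held_jobs is hypothetical, and the sketch assumes the default qstat layout, in which the state is the fifth whitespace-separated column.

```shell
#!/bin/bash
# list_held_jobs is a hypothetical helper name.
# Assumption: the default qstat layout, where column 1 is the job id,
# column 3 the job name and column 5 the state.

list_held_jobs() {
    # Read qstat-style text on stdin; print "jobid name" for each hqw row.
    awk '$5 == "hqw" { print $1, $3 }'
}
```

It would typically be used as qstat | list_held_jobs.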

It may be easier to set up job dependencies by naming jobs rather than specifying jobIDs. For example, the first job can have the following added to its jobscript:

#$ -N myFirstJob

This names the job (and causes its .o and .e output files to use that name). The second job, which will wait for the first, can now refer to it by name:

#$ -hold_jid myFirstJob

Our job dependency is now independent of the jobID number the batch system assigns to the first job.
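To make the naming approach concrete, the sketch below writes two minimal jobscripts to the current directory. The file names (first_job.sh, second_job.sh), the second job's name and the result.txt file are illustrative assumptions, not part of the original instructions. The -cwd option is included so that both jobs run in the submission directory.

```shell
#!/bin/bash
# Write two minimal example jobscripts into the current directory.
# File names, mySecondJob and result.txt are illustrative only.

cat > first_job.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -N myFirstJob
# Produce some output for the next job to use.
echo "data from the first job" > result.txt
EOF

cat > second_job.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -N mySecondJob
#$ -hold_jid myFirstJob
# Runs only after myFirstJob has finished, so result.txt should exist.
cat result.txt
EOF
```

Submit first_job.sh with qsub before submitting second_job.sh, so that a job named myFirstJob is already in the queue for the second job to wait on.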

Job Dependency via the qsub command-line

Alternatively you can add the -hold_jid flag to the qsub command-line:

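# Submit the first job (first_jobscript is a placeholder file name)
qsub first_jobscript
Your job 129673 ("first_jobscript") has been submitted
           # Make a note of the job id

# Now submit the second job and make it wait for the first
# (second_jobscript is a placeholder file name)
qsub -hold_jid 129673 second_jobscript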

By adding the flag to the qsub command-line we don’t have to edit the second jobscript to set the unique jobid to wait for. We don’t need to name the jobs either.

Scripting the use of qsub

You may wish to develop a helper shell script to submit dependency jobs. This could simplify the submitting of complicated job pipelines by running the qsub command and processing the job IDs of the various jobs. Adding the -terse flag to the qsub command simplifies your shell script by changing the message reported by qsub to be just the job id of the newly submitted job. For example:

qsub -terse jobscript
   # The message from qsub is now only the jobid of this job
   # (instead of 'Your job 129673 ("jobscript") has been submitted'),
   # where jobscript is a placeholder file name

You could capture this in a shell variable and use it to submit a second job that waits for the first job to finish. For example:

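# first_jobscript and second_jobscript are placeholder file names
FIRST_JOBID=$(qsub -terse first_jobscript)
SECOND_JOBID=$(qsub -terse -hold_jid "$FIRST_JOBID" second_jobscript)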

You should add some error-checking code to your shell script to handle the case where a qsub command fails, for example due to an error in a jobscript. In this case qsub returns an empty string (i.e. no job id) and writes its error message to stderr.
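One possible shape for that error checking is sketched below. The function names submit_job and submit_chain are hypothetical, and the jobscript file names passed to them are up to you. The sketch checks both the exit status of qsub and whether the captured job id is empty, and it stops the chain as soon as a submission fails.

```shell
#!/bin/bash
# submit_job and submit_chain are hypothetical helper names.

submit_job() {
    # Pass all arguments through to qsub; -terse makes qsub print only the job id.
    local jobid
    if ! jobid=$(qsub -terse "$@"); then
        echo "qsub failed for: $*" >&2
        return 1
    fi
    # Guard against an empty job id even if qsub reported success.
    if [ -z "$jobid" ]; then
        echo "qsub returned no job id for: $*" >&2
        return 1
    fi
    echo "$jobid"
}

submit_chain() {
    # $1 = first jobscript, $2 = second jobscript (waits for the first).
    local first second
    first=$(submit_job "$1") || return 1
    second=$(submit_job -hold_jid "$first" "$2") || return 1
    echo "Submitted $first, then $second (held until $first finishes)"
}
```

For example, submit_chain first_jobscript second_jobscript submits the pair and prints both job ids, or returns a non-zero status without submitting the second job if the first submission fails.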

Waiting for more than one job

If your job needs to wait for multiple jobs to finish, you may specify a comma-separated list of jobIDs (or names) on the -hold_jid line. For example:

# Wait for three earlier jobs (which were given -N names) to
# finish before we proceed

#$ -hold_jid myFirstJob,SimWork,dataJob
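When the job IDs are collected in a helper script rather than written into the jobscript, the comma-separated list can be built on the fly. The sketch below is one way to do that; join_by_comma and submit_after_all are hypothetical helper names.

```shell
#!/bin/bash
# join_by_comma and submit_after_all are hypothetical helper names.

join_by_comma() {
    # Join all arguments into one comma-separated string.
    local IFS=,
    echo "$*"
}

submit_after_all() {
    # $1 = jobscript to submit; remaining arguments = job ids or names to wait for.
    local script=$1
    shift
    qsub -terse -hold_jid "$(join_by_comma "$@")" "$script"
}
```

For example, submit_after_all final_jobscript "$JOB_A" "$JOB_B" "$JOB_C" would submit final_jobscript (a placeholder file name) so that it is held until all three earlier jobs have finished.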

Job Array Dependencies

There are a number of ways in which job dependencies can be used with job arrays. Please see our Job Array Information for further details.

Last modified on September 23, 2019 at 11:30 am by George Leaver