Job Dependencies (Slurm)
Slurm job dependencies let you specify that a job should not start until one or more other jobs have met some condition. Usually the condition is that the earlier job has finished successfully, but other conditions are possible (see below).
For example, this might be useful if ‘jobB’ relies on the output from ‘jobA’ – it saves you having to be aware of when individual jobs have started or completed. Instead, a job dependency allows Slurm to simply get on with running them in the order you specify.
You can build up more complicated pipelines of jobs, where you might be waiting for several jobs to finish before further jobs can run. Job dependencies will ensure jobs run in the correct order.
Job Dependency via the sbatch command-line
To set up a job dependency, specify the type of dependency and the jobid of an already submitted job that you want the current job submission to depend on.
Simple Examples
Here we give some simple examples, as these cover many common situations. See below for a complete list of the available dependency types.
# Note: You can use the short-form "-d deplist" or long-form "--dependency=deplist" flag
# A single job waiting for a single earlier job (123456) to finish successfully:
sbatch --dependency=afterok:123456 myjobscript
# A single job waiting for a single earlier job (123456) to start:
sbatch --dependency=after:123456 myjobscript
# A single job waiting for a single earlier job (123456) to finish with any state (success, error, ...)
sbatch --dependency=afterany:123456 myjobscript
#
# Note: afterany refers to the state of the earlier job.
# It is not referring to multiple dependencies. See below.
# A single job waiting for BOTH earlier jobs (123456, 123789) to finish successfully:
sbatch --dependency=afterok:123456:123789 myjobscript
# Multiple jobs waiting for a single earlier job (123456) to finish successfully:
sbatch --dependency=afterok:123456 myjobscriptA
sbatch --dependency=afterok:123456 myjobscriptB
Some more complex examples:
# A single job waiting for several earlier jobs that all have the same job name (from the same user).
# Note that the earlier jobs must only run one at a time, hence have their own dependencies.
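# --parsable makes sbatch print only the jobid, so it can be used in the next --dependency flag
JOBA=$(sbatch --parsable --job-name=mypipeline jobscriptA)
JOBB=$(sbatch --parsable --dependency=afterok:$JOBA --job-name=mypipeline jobscriptB)
JOBC=$(sbatch --parsable --dependency=afterok:$JOBB --job-name=mypipeline jobscriptC)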
sbatch --dependency=singleton --job-name=mypipeline jobscriptD
#
# This one job will wait for the three earlier jobs
# Wait for BOTH dependencies with different dependency types
sbatch --dependency=afterok:123456,afternotok:123789 myjobscript
#
# Use a , to mean both dependencies
# Wait for ANY dependencies for jobs (123456, 123789)
sbatch --dependency="afterok:123456?afterok:123789" myjobscript
#
# Use a ? to mean any dependencies
Pipelines with Job Arrays
To construct dependencies between job arrays, to build up multi-task pipelines, please see the Slurm Job Arrays page.
Capturing the Job ID on the command line
You can specify the job dependency flag in a jobscript using #SBATCH --dependency=..., but it is usually easier to give it on the sbatch command-line, because the jobid of the earlier job is only known once that job has been submitted.
Add the --parsable flag to your sbatch command to report only the jobid:
# Normally you get a user-friendly message when submitting a job:
[username@login1[csf3] ~]$ sbatch myjobscript
Submitted batch job 123456
# Instead, we only want the jobid
[username@login1[csf3] ~]$ sbatch --parsable myjobscript
123456
You can then capture the jobid using the BASH shell scripting language:
# Submit the job on which you want later jobs to depend, capturing its jobid
JOBID=$(sbatch --parsable firstjobscript)
# Now submit the job that will depend on the previous job
sbatch --dependency=afterok:${JOBID} secondjobscript
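One detail worth guarding against when capturing the jobid: on systems configured with more than one cluster, `sbatch --parsable` prints the jobid followed by a semicolon and the cluster name. The sketch below shows one way to keep only the numeric part. The function name `strip_cluster` is a hypothetical helper invented for this illustration, not something Slurm provides, and the literal strings stand in for real `sbatch` output.

```shell
#!/bin/bash
# strip_cluster OUTPUT
# Hypothetical helper (not part of Slurm): keeps everything before the
# first ';' so that "123456;cluster1" becomes "123456".
strip_cluster() {
    printf '%s\n' "${1%%;*}"
}

# Literal strings standing in for the output of "sbatch --parsable":
strip_cluster "123456"           # prints 123456
strip_cluster "123456;cluster1"  # prints 123456
```

In a real submission this could be used as `JOBID=$(strip_cluster "$(sbatch --parsable firstjobscript)")`. On a single-cluster system the helper leaves the jobid unchanged.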
You can repeat the above procedure to submit multiple jobs as a pipeline:
# A pipeline of jobs:
#
# JobA ------> JobB ------> JobC
JOBID=$(sbatch --parsable JobA)
JOBID=$(sbatch --parsable --dependency=afterok:${JOBID} JobB)
JOBID=$(sbatch --parsable --dependency=afterok:${JOBID} JobC)
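The same approach extends to pipelines that are not a straight line. As a minimal sketch, assuming two independent jobs (JobA and JobB) whose results are both needed by JobC, the jobids can be joined into a single afterok list. The function names `join_deps` and `submit_fan_in` are hypothetical helpers written for this example.

```shell
#!/bin/bash
# join_deps TYPE JOBID [JOBID...]
# Hypothetical helper: prints a dependency string such as "afterok:111:222".
join_deps() {
    local type=$1
    shift
    local IFS=:            # "$*" joins the remaining arguments with ':'
    printf '%s:%s\n' "$type" "$*"
}

# submit_fan_in: sketch of a fan-in pipeline.
#
#   JobA ---\
#            +---> JobC
#   JobB ---/
#
# JobA and JobB have no dependencies; JobC waits for both to succeed.
submit_fan_in() {
    local job_a job_b
    job_a=$(sbatch --parsable JobA) || return 1
    job_b=$(sbatch --parsable JobB) || return 1
    sbatch --parsable --dependency="$(join_deps afterok "$job_a" "$job_b")" JobC
}
```

Calling `submit_fan_in` on a login node would submit all three jobs and print the jobid of JobC. The `|| return 1` checks stop the pipeline early if a submission is rejected, so JobC is never submitted with an empty dependency.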
The Dependency Flag
The --dependency=deplist flag is of the form:
- One or more dependencies, separated by a comma (means all dependencies must be satisfied for the current job to start)
type:job_id[:job_id][,type:job_id[:job_id]]
- One or more dependencies, separated by a ? (means any of the dependencies must be satisfied for the current job to start)
type:job_id[:job_id][?type:job_id[:job_id]]
Typically, you might just have a single jobid you want your current job to wait for. For example:
--dependency=afterok:123456
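When a deplist is assembled by a script, a quick sanity check before calling sbatch can catch typos. The following is a rough sketch only: `is_valid_deplist` is a hypothetical helper, it knows only the dependency types listed on this page, it accepts plain numeric jobids (not job array task ids), and it treats mixing `,` and `?` in one deplist as invalid, which is an assumption based on the two forms shown above. Slurm itself remains the authority on what is accepted.

```shell
#!/bin/bash
# is_valid_deplist DEPLIST
# Hypothetical helper: returns 0 if DEPLIST looks like one of the two
# documented forms, 1 otherwise. This is a loose check: for example it
# allows "+time" after any type, although only "after" uses it.
is_valid_deplist() {
    local id='[0-9]+(\+[0-9]+)?'
    local term="(singleton|after(any|corr|notok|ok)?:${id}(:${id})*)"
    local all="^${term}(,${term})*$"      # comma-separated: all must be satisfied
    local any="^${term}(\\?${term})*$"    # ?-separated: any may be satisfied
    [[ $1 =~ $all || $1 =~ $any ]]
}

# Example: only print the sbatch command if the deplist passes the check.
deplist="afterok:123456,afternotok:123789"
if is_valid_deplist "$deplist"; then
    echo "sbatch --dependency=\"$deplist\" myjobscript"
fi
```

The example only echoes the command rather than running it, so it can be tried safely before being wired into a real submission script.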
Types of Dependencies
The types of dependencies that can be specified using the --dependency=type flag:
| Dependency type | Description |
|---|---|
| after:job_id[[+time][:jobid[+time]...]] | After the specified jobs start or are cancelled and ‘time’ in minutes from job start or cancellation happens, this job can begin execution. If no ‘time’ is given then there is no delay after start or cancellation. |
| afterany:job_id[:jobid...] | This job can begin execution after the specified jobs have terminated. This is the default dependency type. |
| aftercorr:job_id[:jobid...] | A task of this job array can begin execution after the corresponding task ID in the specified job has completed successfully (ran to completion with an exit code of zero). |
| afternotok:job_id[:jobid...] | This job can begin execution after the specified jobs have terminated in some failed state (non-zero exit code, node failure, timed out, etc). This job must be submitted while the specified job is still active or within MinJobAge seconds after the specified job has ended. |
| afterok:job_id[:jobid...] | This job can begin execution after the specified jobs have successfully executed (ran to completion with an exit code of zero). This job must be submitted while the specified job is still active or within MinJobAge seconds after the specified job has ended. |
| singleton | This job can begin execution after any previously launched jobs sharing the same job name and user have terminated. In other words, only one job by that name and owned by that user can be running or suspended at any point in time. |
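The difference between afterok, afternotok and afterany can be summarised as a small decision rule on the final state of the earlier job. The sketch below is a simplified model for illustration, not how Slurm is implemented: `dependency_satisfied` is a hypothetical function, and it covers only the outcomes named in the table (successful completion, non-zero exit code, node failure and timeout), using the Slurm state names COMPLETED, FAILED, NODE_FAIL and TIMEOUT.

```shell
#!/bin/bash
# dependency_satisfied TYPE STATE
# Hypothetical model: returns 0 if a dependency of TYPE on a job that ended
# in STATE would allow the waiting job to start, 1 otherwise.
dependency_satisfied() {
    local type=$1 state=$2
    case "$type:$state" in
        # afterany: any of the modelled final states will do
        afterany:COMPLETED|afterany:FAILED|afterany:NODE_FAIL|afterany:TIMEOUT)
            return 0 ;;
        # afterok: only a successful run (exit code zero)
        afterok:COMPLETED)
            return 0 ;;
        # afternotok: only a failed run
        afternotok:FAILED|afternotok:NODE_FAIL|afternotok:TIMEOUT)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example: which dependency types would release a job waiting on a job that timed out?
for type in afterok afternotok afterany; do
    if dependency_satisfied "$type" TIMEOUT; then
        echo "$type: satisfied"
    else
        echo "$type: not satisfied"
    fi
done
```

One practical consequence of this rule is that afternotok is a natural fit for a clean-up or notification job, while afterany suits a job that must run regardless of the outcome.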
