Short Jobs and Time Limits (Slurm)

There is NO Default Job Wallclock Time

The upgraded CSF now requires that you specify the maximum wallclock time that your job will be allowed to run for. If your job is still running after this amount of time, the system will kill the job.

There is NO default value, but there is a maximum you are allowed to specify. (In the SGE batch system, the default was 7 days if you didn’t specify a wallclock time limit.)

Why the change? This is to improve the job scheduling – Slurm may be able to run your job sooner if it can fit it in before other jobs are expected to start using similar resources. If you give a more realistic wallclock time, Slurm can schedule jobs better.

You don’t need to be super accurate! If you’re not sure how long your job will take, you should err on the side of caution and give it plenty of wallclock time. The maximum permitted is 7 days in most cases (4 days for GPU jobs). Consult the Partitions page to see the per-partition limits.

Note: We cannot extend a job’s max runtime once it has been submitted to the batch system.

If you fail to specify the wallclock time, you’ll receive the error:

sbatch: error: Batch job submission failed: Requested time limit is invalid (missing or exceeds some limit)
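To catch this error before submitting, you could check that the jobscript contains a time directive. The following is a minimal sketch only: `has_time_limit` is a hypothetical helper (not a Slurm command), and it assumes the limit is set on an `#SBATCH` line using `-t` or `--time` rather than on the `sbatch` command line.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): succeeds if the given jobscript
# has an #SBATCH line that sets -t or --time.
# Assumption: the time limit is set in the jobscript, not on the command line.
has_time_limit() {
    grep -Eq '^#SBATCH[[:space:]]+(.*[[:space:]])?(-t[[:space:]]*[^[:space:]]|--time[[:space:]=])' "$1"
}

# Example use (myjob.sh is a placeholder name):
#   has_time_limit myjob.sh || echo "myjob.sh has no wallclock time limit"
```

The pattern only looks for the option at the start of an `#SBATCH` line or after whitespace, so options such as `--time-min` are not mistaken for `--time`.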

Specifying a Wallclock Time

To set the max wallclock for your job, add the following to your jobscript:

#SBATCH -t d-hh:mm:ss         # (--time=d-hh:mm:ss)

where d is the number of days, hh is the number of hours, mm is the number of minutes, and ss is the number of seconds.

Other acceptable time formats are:

#SBATCH -t minutes
#SBATCH -t minutes:seconds
#SBATCH -t hours:minutes:seconds
#SBATCH -t days-hours                   # Recommended format (e.g., 4-0 for 4 days)
#SBATCH -t days-hours:minutes
#SBATCH -t days-hours:minutes:seconds

For example, to give a limit of 10 minutes, add the following line to your batch script:

#SBATCH -t 10

An example of 6 hours:

#SBATCH -t 06:00:00

#### OR use 0 days, 6 hours:
#SBATCH -t 0-6

An example of 2 days:

#SBATCH -t 2-0
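To see how the formats above relate to each other, the sketch below converts a time string into seconds. `slurm_time_to_seconds` is a hypothetical helper written for illustration; it is not a Slurm command, and it only covers the six formats listed above.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): convert a time string in one of
# the formats listed above into a number of seconds.
slurm_time_to_seconds() {
    local spec=$1 days=0 hours=0 mins=0 secs=0 rest a b c
    if [[ $spec == *-* ]]; then
        # days-hours[:minutes[:seconds]]
        days=${spec%%-*}
        rest=${spec#*-}
        IFS=: read -r hours mins secs <<< "$rest"
    else
        IFS=: read -r a b c <<< "$spec"
        if [[ -n $c ]]; then
            hours=$a; mins=$b; secs=$c    # hours:minutes:seconds
        elif [[ -n $b ]]; then
            mins=$a; secs=$b              # minutes:seconds
        else
            mins=$a                       # minutes
        fi
    fi
    # 10# forces base 10 so that values such as "08" are not read as octal
    echo $(( 10#$days * 86400 + 10#${hours:-0} * 3600 + 10#${mins:-0} * 60 + 10#${secs:-0} ))
}
```

With this helper, `slurm_time_to_seconds 06:00:00` and `slurm_time_to_seconds 0-6` both print 21600, which shows that the two 6-hour examples request the same limit.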

Note that when the job time limit is reached, the batch system sends a soft kill signal (SIGTERM). Some applications detect this signal and shut down cleanly, for example by saving their current state and results before exiting, but this depends on your application’s capabilities. Some applications can checkpoint and then be restarted from a known state. Please consult the manual for your software for more information.

If the job hasn’t shut down 30 seconds after receiving the soft kill signal, a hard kill signal (SIGKILL) will be sent and the job will be killed immediately.
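If your application does not handle the soft kill signal itself, the jobscript can react to it. The following is a minimal sketch, assuming the SIGTERM reaches the jobscript's bash shell. `save_results` and `run_with_term_trap` are hypothetical names, and the cleanup step is a placeholder for whatever your software needs. The command is run in the background and waited for, because bash only runs a trap handler once the current foreground command has finished.

```shell
#!/bin/bash
#SBATCH -t 0-1     # 1 hour wallclock limit

# Hypothetical cleanup step: replace with whatever your application needs.
save_results() {
    echo "SIGTERM received: saving partial results"
}

# Hypothetical wrapper: run a command in the background and wait for it,
# so that bash can react to SIGTERM while the command is still running.
got_term=0
run_with_term_trap() {
    "$@" &
    local pid=$!
    # On SIGTERM: record it, run the cleanup, and pass the signal on.
    trap "got_term=1; save_results; kill -TERM $pid 2>/dev/null" TERM
    wait "$pid"
    local status=$?
    if (( got_term )); then
        wait "$pid"      # let the command finish shutting down
        status=$?
    fi
    trap - TERM
    return "$status"
}

# Usage inside a jobscript (./my_app is a placeholder):
#   run_with_term_trap ./my_app
```

Any cleanup must finish within the 30 seconds described above, before the hard kill signal arrives.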

Short Jobs

A separate short job area does NOT exist on the upgraded CSF3 at the moment. However, by giving your job a realistic (shorter) runtime limit, as detailed above, your job may be selected to run sooner than other jobs in the system. If you know one hour, for example, will be enough for your job to complete its work, add

#SBATCH -t 01:00:00

to your jobscript.

Last modified on April 3, 2025 at 9:22 am by George Leaver