Short Jobs and Time Limits (Slurm)
NO Default Maximum Wallclock Time
The upgraded CSF now requires that you specify the maximum wallclock time that your job will be allowed to run for. There is NO default value.
Why the change? This is to improve the scheduling of jobs: Slurm may be able to run your job sooner if it can fit it in before other jobs are expected to start. When you give a realistic wallclock time, Slurm has a more accurate picture of how long jobs will run for.
Of course, if you’re not sure how long your job will take, you should err on the side of caution and give it plenty of wallclock time. The maximum permitted is 7 days.
We cannot extend a job’s max runtime once it has been submitted to the batch system.
If you fail to specify the wallclock time, you’ll receive the error:
sbatch: error: Batch job submission failed: Requested time limit is invalid (missing or exceeds some limit)
Specifying a Wallclock Time
To set the max wallclock for your job, add the following to your jobscript:
#SBATCH -t hh:mm:ss # (--time=hh:mm:ss)
where hh is the number of hours, mm is the number of minutes and ss is the number of seconds.
Other acceptable time formats are:
#SBATCH -t minutes
#SBATCH -t minutes:seconds
#SBATCH -t hours:minutes:seconds
#SBATCH -t days-hours               # Recommended format
#SBATCH -t days-hours:minutes
#SBATCH -t days-hours:minutes:seconds
For example, to give a limit of 10 minutes, add the following line to your batch script:
#SBATCH -t 10
An example of 6 hours:
#SBATCH -t 06:00:00
An example of 2 days:
#SBATCH -t 2-0
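To check how a given time string will be read, the formats listed above can be converted to seconds by hand. The following is a minimal sketch only: slurm_time_to_seconds is a hypothetical helper written for illustration, not a Slurm command, and it assumes a well-formed input in one of the formats listed above. Slurm does its own parsing when you submit the job.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): convert a time string in one of
# the formats listed above into a number of seconds.
slurm_time_to_seconds() {
    printf '%s\n' "$1" | awk '
    {
        days = 0; h = 0; m = 0; s = 0
        rest = $0
        if (index(rest, "-") > 0) {
            # days-hours[:minutes[:seconds]]
            split(rest, dh, "-")
            days = dh[1]
            n = split(dh[2], f, ":")
            h = f[1]
            m = (n >= 2) ? f[2] : 0
            s = (n >= 3) ? f[3] : 0
        } else {
            n = split(rest, f, ":")
            if (n == 1)      { m = f[1] }                        # minutes
            else if (n == 2) { m = f[1]; s = f[2] }              # minutes:seconds
            else             { h = f[1]; m = f[2]; s = f[3] }    # hours:minutes:seconds
        }
        print days * 86400 + h * 3600 + m * 60 + s
    }'
}

slurm_time_to_seconds 10          # prints 600    (10 minutes)
slurm_time_to_seconds 06:00:00    # prints 21600  (6 hours)
slurm_time_to_seconds 2-0         # prints 172800 (2 days)
```

Note that a bare number is read as minutes, not seconds or hours, which is one reason the days-hours format is less easy to misread.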
Note that when the job time limit is reached, the batch system sends a soft kill signal (SIGTERM). Some applications detect this signal and shut down cleanly, for example by saving their current state and results before exiting, but this depends on your application’s capabilities. Some applications can checkpoint and then be restarted from a known state. Please consult the manual for your software for more information.
If the job hasn’t shut down 30 seconds after receiving the soft kill signal, a hard kill signal (SIGKILL) is sent and the job is killed immediately.
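If your application has no shutdown handling of its own, the jobscript may be able to catch the soft kill signal instead. The following is a minimal sketch only. It assumes that the batch shell itself receives SIGTERM when the time limit is reached, which may depend on how the system is configured. The file name checkpoint.txt, the step counter and the work loop are hypothetical placeholders for your real application and its saved state.

```shell
#!/bin/bash
#SBATCH -t 0-1                 # 1 hour limit (days-hours format)

# Hypothetical checkpoint file; the name is an assumption for illustration.
STATE_FILE="${STATE_FILE:-checkpoint.txt}"
step=0

# Record the last completed step so that a later job could resume from it.
save_state() {
    echo "$step" > "$STATE_FILE"
}

# Runs if the shell receives SIGTERM: save progress, then exit promptly,
# well within the 30 seconds allowed before SIGKILL is sent.
on_term() {
    save_state
    exit 0
}
trap on_term TERM

# Placeholder work loop standing in for the real application.
while [ "$step" -lt 3 ]; do
    step=$((step + 1))
done

save_state                     # normal completion: record the final step
```

Bash only runs a trap handler once the current foreground command has finished, so a long-running application is usually started in the background with & and followed by wait, which returns as soon as the signal arrives.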
Short Jobs
A separate short job area does NOT exist on the upgraded CSF3 at the moment. However, by giving your job a short runtime limit, as detailed above, your job may be selected to run sooner than other jobs in the system. If you know that one hour, for example, will be enough for your job to complete its work, add
#SBATCH -t 01:00:00
to your jobscript.