Short Jobs and Time Limits

Default Maximum Wallclock Time (Intel nodes)

The maximum wallclock time is currently 7 days, unless otherwise noted in the parallel environment (PE) table.

For job arrays, the 7-day limit applies to each individual task, not to the array as a whole. There is no limit on how long it takes to complete all tasks, but each task can run for at most 7 days.
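A minimal job-array sketch: the -t option asks Grid Engine for a range of tasks, and each task can read its own task number from the SGE_TASK_ID environment variable. The task range, the input file naming and ./my_app are hypothetical placeholders.

```bash
#!/bin/bash --login
#$ -cwd
#$ -t 1-100       # 100 tasks (hypothetical count); EACH task may run for up to 7 days

# Each task processes its own input file, chosen via its task number.
./my_app "input.${SGE_TASK_ID}.dat"   # hypothetical program and file names
```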

The 7-day limit also affects how long your jobs may queue. In general, the longer the permitted wallclock time, the longer you may have to wait for your job to start. A 7-day limit is longer than most HPC systems offer, and we find that it suits our users’ workloads. However, it does mean that when the system is busy you could wait up to 24 hours for some of your jobs to start. Jobs are finishing all the time on the CSF, so new jobs are continually being selected to run; please be patient if your job does not start immediately (see the related FAQ).

We cannot extend a job’s maximum runtime. See the FAQ on jobs that require more than 7 days of runtime for advice on how you might modify your code or jobscripts before submitting them.
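One common way to handle work that needs more than 7 days, assuming your code can write a checkpoint and restart from it, is to split the work into several jobs that each wait for the previous one. Grid Engine's qsub supports this with -hold_jid, and -terse makes qsub print only the job ID. This is a sketch rather than the FAQ's own advice; stage1.sh and stage2.sh are hypothetical jobscript names.

```bash
# Submit stage 1 and capture its job ID (-terse prints only the ID).
jid=$(qsub -terse stage1.sh)

# Stage 2 is held until stage 1 has finished, then restarts from stage 1's checkpoint.
qsub -hold_jid "$jid" stage2.sh
```

Grid Engine generally releases the hold once stage 1 has finished, whether or not it succeeded, so stage 2 should check that the checkpoint it needs actually exists.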

Specifying a Shorter Wallclock Time

If you know the wallclock time your job requires, we recommend stating it at submission time, as this helps the SGE scheduler make decisions. Unlike on some HPC systems, this is optional. To set a limit, add the following line to your jobscript:

#$ -l s_rt=hh:mm:ss

where hh is the number of hours, mm the number of minutes and ss the number of seconds.

For example, to set a limit of 10 minutes, add the following line to your batch script:

#$ -l s_rt=00:10:00

For a 6-hour limit:

#$ -l s_rt=06:00:00

For a 2-day limit, give the total in hours (hours are not capped at 24):

#$ -l s_rt=48:00:00
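Because the limit is written in hours, minutes and seconds, limits of a day or more have to be converted to hours, as in the 48:00:00 example. The helper below is hypothetical (not a CSF tool) and simply does that arithmetic in bash, carrying minutes over 59 into hours.

```bash
#!/bin/bash
# Hypothetical helper (not provided by the CSF): build an hh:mm:ss value for
# s_rt/h_rt from days, hours and minutes. Hours are not wrapped at 24.
# Usage: s_rt_value DAYS HOURS MINUTES
s_rt_value() {
    local days=$1 hours=$2 minutes=$3
    local total_minutes=$(( (days * 24 + hours) * 60 + minutes ))
    printf '%02d:%02d:00\n' $(( total_minutes / 60 )) $(( total_minutes % 60 ))
}

s_rt_value 0 0 10   # prints 00:10:00
s_rt_value 0 6 0    # prints 06:00:00
s_rt_value 2 0 0    # prints 48:00:00
```

Once the function is defined in your shell, it could be used on the command line as, for example, qsub -l s_rt=$(s_rt_value 2 0 0) myjobscript.sh (myjobscript.sh being a hypothetical jobscript).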

Note that when the s_rt limit is reached, the batch system sends a soft kill signal. Some applications detect this signal and shut down cleanly, for example by saving their current state and results before exiting, but this depends on your application’s capabilities.
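In Grid Engine generally, exceeding s_rt sends the job SIGUSR1, and the job is killed later once the queue's notify period has passed (see the queue_conf(5) manual page); the exact behaviour on the CSF may differ, so treat this as an untested sketch and try it on a short job first. If your application cannot catch the signal itself, a bash jobscript can trap it and run a clean-up step. Bash defers traps while a foreground command runs, so the application is started in the background and the script waits for it. save_state, the flag file and ./my_app are hypothetical placeholders. Depending on how the signal is delivered, the application may also receive SIGUSR1, whose default action is to terminate the process.

```bash
#!/bin/bash --login
#$ -cwd
#$ -l s_rt=06:00:00

# Hypothetical clean-up step: replace with whatever saves your application's state.
save_state() {
    echo "Soft runtime limit reached at $(date): saving state" >&2
    touch soft_limit_reached.flag
}

# Run the given command in the background and wait for it, so that bash can
# run the USR1 trap straight away (traps are deferred for foreground commands).
run_with_soft_limit_handler() {
    local app_pid status
    # On SIGUSR1: save state, then ask the application to stop.
    trap 'save_state; kill -TERM "$app_pid" 2>/dev/null' USR1
    "$@" &
    app_pid=$!
    wait "$app_pid"
    status=$?
    # A trapped signal makes wait return early (status > 128);
    # wait again if the application is still shutting down.
    if [ "$status" -gt 128 ] && kill -0 "$app_pid" 2>/dev/null; then
        wait "$app_pid"
        status=$?
    fi
    trap - USR1
    return "$status"
}

# Example call with a hypothetical program:
# run_with_soft_limit_handler ./my_app input.dat
```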

Use h_rt instead for a hard time limit: the batch system simply kills the job when the limit is reached.
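The hard limit takes the same hh:mm:ss format as s_rt. For example, a hypothetical 6-hour hard limit in a jobscript:

```bash
#$ -l h_rt=06:00:00   # hard limit: the job is killed outright after 6 hours
```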

A job terminated at the time limit may not save any data; this depends on the software or code you are running. Some applications can checkpoint their state and later be restarted from that point. Please consult your software’s manual for more information.

Short Jobs

Intel

A small number of Intel nodes are reserved for short test jobs and work such as post-processing. To use them, add the following option to your batch script:

#$ -l short

If submitting from the command line, omit the #$ and just include -l short in your qsub command.
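For example, to submit a hypothetically named jobscript, myjobscript.sh, to the short nodes from the command line:

```bash
qsub -l short myjobscript.sh
```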

The maximum wallclock time with this option is currently 1 hour and the maximum job size is 24 cores.

Please do not fill ‘short’ with production compute work, as this prevents people who genuinely need to test from accessing it. Do not run job arrays in the short environment: because only a small number of cores are assigned to it, a job array can stop any other user’s job from running.

Last modified on March 27, 2024 at 5:38 pm by George Leaver