qsub Options and Environment Variables

Jobscript vs Command-line

Batch job options can be specified in a qsub jobscript by placing #$ in front of the option, for example using

#$ -cwd

or on the qsub command line, for example using

qsub -cwd ... ... filename [optional args] 

Note: if filename is an executable (e.g., myapp.exe) rather than a jobscript you must use the -b y flag (please see below).

All of the possible flags are described in the manual page (man qsub). The most commonly used options are briefly described below. Note that the order in which you specify options, either in the jobscript or on the command line, does not matter.

We recommend using a jobscript so that you have a permanent record of how you ran a job. This is important for reproducibility of results.

qsub Flags

-cwd
Execute the job from the current (working) directory, i.e. the directory from which the qsub command is issued. If this option is not present, the job will be executed in the user’s home directory. The .oNNNNN and .eNNNNN stdout and stderr files created by SGE for each job will also be written to the current directory (or the home directory if this flag is not present) unless the -o and -e flags are used to override where these files are written.
-V
(Uppercase V). This ensures that any environment settings you’ve made on the login node are passed to the compute node, including the settings applied by loading software modulefiles. A copy of your current environment is taken when you run the qsub command (i.e., immediately, not when the job finally runs). Hence you can change your environment after running qsub, perhaps to set up for another job, or even log out, and your job will still see the environment that was in place when you originally ran qsub. Unlike on the CSF2, we now recommend that you load the modulefiles your job needs in your jobscript, rather than on the login node command line or in your .bashrc or .modules file. When loading modulefiles in your jobscript you should not use -V in the jobscript.
-j y
Merge the standard error stream into the standard output stream, i.e., job output and error messages are sent to the same .o file, rather than different files (usually .o and .e files).
-pe name.pe
Specify the SGE parallel-environment to which a job is sent — see the section on running parallel jobs.
-l resource
Specify a resource to modify where in the system the job is placed, for example -l mem512 to select a higher-memory node. You may specify more than one resource flag, for example -l haswell -l 's_rt=00:10:00', although not all combinations are supported. Resource flags exist for CPU architectures, job time limits, memory requirements, GPUs and interactivity. Check those pages for details, together with the parallel environment documentation, to determine whether a resource and a PE are compatible.
-P project-code
Specify the project to which your jobs will be accounted. Users should NOT use this flag unless specifically told to do so (e.g., you have been given an HPC-Pool project code). By default your jobs will be accounted to the project associated with your supervisor.
-S /bin/bash
(Uppercase S). Indicate your jobscript is written using /bin/bash shell syntax. This is not required in your jobscripts – by default the jobscript will use the shell specified on the first line via the #! marker.
-N name
(Uppercase N). Sets the job name, e.g. -N my_job_name to set job name to my_job_name. The .o and .e job output files will be named using this value — for example my_job_name.o12345 and my_job_name.e12345. If you don’t use the -N option then the job output files will use the name of the jobscript (or executable) specified on the qsub command-line. Do not use spaces in the name.
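The flags described so far can be combined in a single jobscript. The following is a minimal sketch rather than a definitive template: the job name my_job_name comes from the -N description above, while the modulefile name apps/example/1.0 and the final echo command are hypothetical placeholders for your own modulefile and application. Modulefiles are loaded inside the jobscript, so -V is not used.

```shell
#!/bin/bash --login
# Run from the directory where qsub was issued.
#$ -cwd
# Name the job; output is written to my_job_name.oNNNNN.
#$ -N my_job_name
# Merge stderr into the .o file.
#$ -j y

# Load modulefiles here rather than passing the login environment with -V.
# "apps/example/1.0" is a hypothetical modulefile name; use the real one.
# The guard only exists so the sketch also runs where 'module' is missing.
if command -v module >/dev/null 2>&1; then
    module load apps/example/1.0
fi

# JOB_NAME is set by SGE inside a job; the fallback is for testing by hand.
job_name="${JOB_NAME:-my_job_name}"
echo "Job $job_name running in $(pwd)"
```

Because the #$ lines are ordinary comments to bash, the same file can be run by hand for a quick check before it is submitted with qsub.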

-o /path/to/dir
-e /path/to/dir
alternatively:
-o /path/to/dir/stdoutfile
-e /path/to/dir/stderrfile
or to prevent any .o and .e files being generated (use with caution!)
-o /dev/null
-e /dev/null
Use either the directory form or the filename form. If a directory name is given, it specifies the path to a directory where the usual standard output stream (stdout) and standard error stream (stderr) files (JobName.oNNNNN and JobName.eNNNNN respectively) will be written. The directories must already exist before the job runs – the batch system will not create them for you. If filenames are given, they specify the files to which stdout and stderr output will be written. No JobID number will be appended – your supplied filenames will be used as-is. If these flags are not used the standard output and error stream files will be written in the directory in which the job runs (see -cwd).

For example, this will force the job’s .oNNNNN and .eNNNNN files into a directory named logs in your home directory:

#!/bin/bash --login
#$ -cwd
#$ -o ~/logs
#$ -e ~/logs
myapp.exe input.dat

You must ensure you have a directory named ~/logs (i.e. in your home directory) before submitting the job. The job will not create the directory for you. See also the -j flag for combining the .o and .e files into one file (the .o file).
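Since the batch system will not create the directory, one option is to create it on the login node just before submitting. The helper below is a sketch only: ensure_log_dir is a hypothetical function name, and myjobscript stands for your own jobscript.

```shell
# Create the log directory if it is missing. mkdir -p is a no-op when the
# directory already exists and also creates any missing parent directories.
ensure_log_dir() {
    local dir="$1"
    mkdir -p "$dir" && [ -d "$dir" ] && [ -w "$dir" ]
}

# ~/logs matches the jobscript example above; myjobscript is a placeholder.
if ensure_log_dir "$HOME/logs"; then
    echo "Log directory ready: now run 'qsub myjobscript'"
else
    echo "Could not create $HOME/logs" >&2
fi
```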

-hold_jid jobid
Specifies that this job depends on the completion of a previous job or jobs, e.g. -hold_jid jobID to submit a job which will not start until jobID has completed. jobID can be a job number (e.g., 89213) or a job name (i.e., the earlier job was named using the -N flag). Multiple jobIDs can be specified as a comma-separated list. In that case the current job will not run until all of the specified jobs have finished.
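When the list of earlier jobs is held in a shell array, the comma-separated form can be built as sketched below. The job number 89214, the job name prep_job and the jobscript name final_job.sh are hypothetical, and the qsub command is only printed so that it can be checked before being run for real.

```shell
# Hypothetical job numbers and a job name that must all finish first.
deps=(89213 89214 prep_job)

# Join the array elements with commas, as -hold_jid expects.
# Setting IFS inside $( ) keeps the change local to that subshell.
hold_list=$(IFS=,; echo "${deps[*]}")

# Print the command rather than running it.
echo "qsub -hold_jid $hold_list final_job.sh"
```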

-m bea
Causes an email to be sent when the job begins, when it ends and/or if it is aborted. You can specify any or all of the bea letters. For example, most users only want to know when a job ends or aborts so use -m ea.

Please note that you must put your email address in your jobscript or on the qsub command line (see the -M flag below) as the system does not automatically detect your University email address at the moment. We are looking into this.

-M emailaddress
(Uppercase M). Specify an email address to which -m status emails will be sent. You may supply a comma-separated list if you want to receive email at more than one address. For example:

-M my.name@manchester.ac.uk,myname@gmail.com

Please note that you must put your email address in your jobscript or on the qsub command line as the system does not automatically detect your University email address at the moment. We are looking into this.

The following options are used on the qsub command-line directly, rather than in your jobscripts:

-b y
For use on the qsub command-line only. Indicates that the filename given on the qsub command line is an executable (binary) file, not a jobscript. This allows you to specify the executable directly on the command-line rather than in a job script. By default the qsub command assumes the filename refers to a jobscript. For example, the following command line and jobscript (submitted with qsub myjobscript) are equivalent:

qsub -b y -cwd /bin/hostname

and

#!/bin/bash
#$ -cwd
/bin/hostname

It is up to the user which method they prefer. However, we recommend writing a jobscript so that you can see how the job was submitted if referring back to an old job (perhaps submitted months ago) rather than trying to remember a command line. It also allows the sysadmins to identify problems with jobs more easily.

-terse
This flag can be used when constructing pipelines – it causes the qsub command to return only the Job ID, not the user-friendly message about your job submission. This can then be captured and used in subsequent job submissions to make later jobs wait for earlier jobs. For example:

# Submit two jobs, where the second job will not run until the first job has finished:
JID=$(qsub -terse jobscript_1.sh)
qsub -hold_jid $JID jobscript_2.sh

When submitting job arrays, some extra information about the task range and increment is included in the terse output. You should remove this if capturing the job ID:

qsub -terse -t 1-100 jobscript-array1.sh
129674.1-100:1

# To capture only the jobid, use the cut command to remove the extra info:
qsub -terse -t 1-100 jobscript-array1.sh | cut -d. -f1
129674
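As an alternative to cut, bash parameter expansion can strip the task range without starting another process. This sketch works on a copy of the sample terse output shown above; the second value, 129675, is a made-up job ID used only to show that plain (non-array) output passes through unchanged.

```shell
# Sample terse output for an array job, copied from the example above.
terse_output="129674.1-100:1"

# Remove the longest suffix that starts with '.', leaving only the job ID.
job_id="${terse_output%%.*}"
echo "$job_id"

# Terse output for a non-array job has no '.', so nothing is removed.
plain_output="129675"
plain_id="${plain_output%%.*}"
echo "$plain_id"
```

The same expansion can therefore be applied to the output of every qsub -terse call in a pipeline, whether or not the job is an array.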

See the individual pages (in the menu on the left side of this page) for the PE names and resources available on this system.

SGE Environment Variables

The following environment variables are available for use in your jobscript when the job runs. They can be used to create unique names for output files, for example, by including the job id or name in the output filename.

$NSLOTS
The number of cores requested using the -pe flag or 1 if running a serial job (no -pe option specified). Use this variable if your application requires the number of cores to use on its command-line, rather than repeating the number in two places. This makes running jobs with different numbers of cores easier. For example:

#$ -pe smp.pe 4
myapp -cores $NSLOTS -input sample.dat -output results.dat
  #
  # $NSLOTS will be automatically replaced with 4 in this example

You could also use this variable in the name of an output file if doing several runs with different numbers of cores when timing your code. For example:

#$ -pe smp.pe 4
myapp -cores $NSLOTS -input sample.dat -output results.${NSLOTS}cores.dat
   #
   # The output file will be named results.4cores.dat
$NGPUS
The number of GPUs requested by a GPU job using the -l nvidia_v100=N flag. For example

#$ -l nvidia_v100=2
myGPUApp --numgpus $NGPUS

[Technical note: this is a non-standard SGE environment variable, injected by the JSV (job submission verifier)]

$NHOSTS
The number of compute nodes in use by your job. For serial jobs (1-core) and single-node SMP (multi-core) jobs this will always be 1. For multi-node jobs (e.g., those running in mpi-24-ib.pe) this will be the number of compute nodes. For example, a 48-core job will need two 24-core compute nodes, hence NHOSTS will be set to 2.
$JOB_ID
The unique job id number assigned to the job at runtime by the batch system. You can use this to generate unique filenames that won’t be overwritten by other jobs. For example:

#$ -cwd
myapp -input sample.dat -output results.$JOB_ID.dat
   #
   # Output file will be named results.37823.dat where 37823 is my unique jobid.
$JOB_NAME
The value of the -N flag if present or the name of the jobscript if that flag is not used. Note that a unique jobid is always generated even if you use the -N flag. For example:

#$ -cwd
#$ -N phase1
myapp -input sample.dat -output results.$JOB_NAME.$JOB_ID.dat
  #
  # Names the output file results.phase1.38795.dat (in this case)
$PE
The name of the parallel environment given after the -pe flag in parallel jobs. For example smp.pe or mpi-24-ib.pe. Unset in serial (1-core) jobs.
$SGE_O_WORKDIR
The full path to the directory from where you submitted the job.
$SGE_TASK_ID
$SGE_TASK_FIRST
$SGE_TASK_LAST
$SGE_TASK_STEPSIZE
See the Job Arrays documentation for environment variables related to each task.
$PE_HOSTFILE
You will not normally need to use this variable in your jobscripts. However, some applications documented on the CSF software page convert the names of the nodes on which your job will run into their own format. This variable gives the name of a file containing the names of the nodes on which your job has been scheduled to run. Do not, however, change the value of this variable yourself.
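Several of these variables can be combined to give each run a unique output filename. The sketch below assumes a hypothetical helper named make_output_name; the fallback values (test_run, 0 and 1) are placeholders that only take effect when the script is run outside the batch system, where SGE has not set the variables.

```shell
# Build an output filename from the job name, job ID and core count.
make_output_name() {
    printf 'results.%s.%s.%scores.dat\n' "$1" "$2" "$3"
}

# Inside a job SGE sets JOB_NAME, JOB_ID and NSLOTS itself. The fallbacks
# are only used when testing the script by hand on the login node.
output_file=$(make_output_name "${JOB_NAME:-test_run}" "${JOB_ID:-0}" "${NSLOTS:-1}")
echo "Results will be written to $output_file"
```

For a job named phase1 with job ID 38795 running on 4 cores, the helper produces results.phase1.38795.4cores.dat.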

Automatically Requeue a Job

A jobscript can ask the batch system to automatically requeue the job when the current job has finished. This can be used with an application that does checkpointing. This is where an application saves its current state to disk and then, when a new job starts, it can read the previous state and carry on from where it left off. This allows an app to run for more than 7 days (the max runtime on the CSF) by running one job after another, saving the state between each run.

A jobscript that exits with code 99 will automatically requeue. Add the following to your jobscript:

exit 99

The job will then automatically be placed back in the queue to run again. It will have the special Rr status when it runs, indicating it is a requeued job. The output from the job will be appended to the existing jobname.o12345 and jobname.e12345 files.

Another use of this method would be where a job checks its own results files and decides whether it needs to rerun an analysis with different parameters. Add whatever checks of the results you need to the jobscript, then use the exit 99 command to terminate the jobscript when a rerun is required.
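The decision to requeue can be sketched as a small shell function. This is an illustration under stated assumptions, not how any particular application checkpoints: run_chunk, the state file checkpoint.state and the chunk count of 3 are all hypothetical, and the "work" is simply a counter saved to disk between runs.

```shell
# Run one chunk of work and report whether more remains.
# Returns 99 (requeue) while chunks remain, 0 once all are done.
run_chunk() {
    local state_file="$1" total_chunks="$2" completed=0
    # Resume from the saved state if an earlier run left one behind.
    if [ -f "$state_file" ]; then
        completed=$(cat "$state_file")
    fi
    # Stand-in for the real work: count one more chunk as complete.
    completed=$((completed + 1))
    echo "$completed" > "$state_file"
    if [ "$completed" -lt "$total_chunks" ]; then
        return 99
    fi
    return 0
}

# In a jobscript, finish with the following two lines so that the job is
# requeued until all chunks are done (checkpoint.state and 3 are examples):
#   run_chunk checkpoint.state 3
#   exit $?
```

Make sure the condition eventually becomes false, otherwise the job will requeue itself indefinitely.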

For an example of an application that does checkpointing please see the StarCCM webpage.

Last modified on December 9, 2024 at 12:00 pm by George Leaver