qsub Options and Environment Variables
Jobscript vs Command-line
Batch job options can be specified in a qsub jobscript by placing #$ in front of the option, for example:
#$ -cwd
or on the qsub command line, for example:
qsub -cwd ... ... filename [optional args]
Note: if filename is an executable (e.g., myapp.exe) rather than a jobscript, you must use the -b y flag (please see below).
All of the possible flags are described in the manual page (man qsub). The most commonly used options are briefly described below. Note that the order in which you specify options, whether in the jobscript or on the command line, does not matter.
qsub Flags
- -cwd
- Execute the job from the current (working) directory, i.e., the directory from which the qsub command is issued. If this option is not present, the job will be executed in the user’s home directory. The .oNNNNN and .eNNNNN stdout and stderr files created by SGE for each job will also be written to the directory specified by this flag (or the home directory if not present) unless the -o and -e flags are used to override where these files are written.
- -V
- (Uppercase V). This ensures that any environment settings you’ve made on the login node are inherited by (passed to) the compute node, including the settings applied by loading software modulefiles. A copy of your current environment is taken when you run the qsub command (i.e., immediately, not when the job finally runs). Hence you can change your environment after running qsub, perhaps to set up for another job, or even log out, and your job will still see the environment that was in place when you originally ran qsub. Unlike on the CSF2, we now recommend that you load the required modulefiles in your jobscript rather than on the login node command line or in your .bashrc / .modules files. When loading modulefiles in your jobscript you should not use -V in the jobscript.
- -j y
- Merge the standard error stream into the standard output stream, i.e., job output and error messages are sent to the same .o file, rather than to different files (usually .o and .e files).
- -pe name.pe
- Specify the SGE parallel-environment to which a job is sent — see the section on running parallel jobs.
- -l resource
- Specify a resource to modify where in the system the job is placed. For example -l mem512 to select a higher-memory node. You may specify more than one resource flag, for example -l haswell -l 's_rt=00:10:00' although not all combinations are supported. Resource flags exist for CPU architectures, job time limits, memory requirements, GPUs and interactivity, so you should check those pages for details, together with the parallel environment documentation, to determine whether a resource and a PE are compatible.
- -P project-code
- Specify a project to which your jobs will be accounted. Users should NOT use this flag unless specifically told to do so (e.g., you have been given an HPC-Pool project code). By default your jobs will be accounted to the project associated with your supervisor.
- -S /bin/bash
- (Uppercase S). Indicate your jobscript is written using /bin/bash shell syntax. This is not required in your jobscripts: by default the jobscript will use the shell specified on the first line via the #! marker.
- -N name
- (Uppercase N). Sets the job name, e.g. -N my_job_name to set the job name to my_job_name. The .o and .e job output files will be named using this value, for example my_job_name.o12345 and my_job_name.e12345. If you don’t use the -N option then the job output files will use the name of the jobscript (or executable) specified on the qsub command-line. Do not use spaces in the name.
- -o /path/to/dir
-e /path/to/dir
alternatively:
-o /path/to/dir/stdoutfile
-e /path/to/dir/stderrfile
or to prevent any .o and .e files being generated (use with caution!)
-o /dev/null
-e /dev/null
- Use either the directory form or the filename form. If a directory name is given, it specifies the path to a directory where the usual standard output stream (stdout) and standard error stream (stderr) files (JobName.oNNNNN and JobName.eNNNNN respectively) will be written. The directories must already exist before the job runs; the batch system will not create them for you. If filenames are given, they specify the files to which stdout and stderr output will be written. No JobID number will be appended; your supplied filenames will be used as-is. If these flags are not used, the standard output and error stream files will be written in the directory in which the job runs (see -cwd).
For example, this will force the job’s .oNNNNN and .eNNNNN files into a directory named logs in your home directory:
#!/bin/bash --login
#$ -cwd
#$ -o ~/logs
#$ -e ~/logs
myapp.exe input.dat
You must ensure you have a directory named ~/logs (i.e. in your home directory) before submitting the job. The job will not create the directory for you. See also the -j flag for combining the .o and .e files into one file (the .o file).
- -hold_jid jobid
- Specifies that this job is conditional upon completion of a previous job or jobs, e.g. -hold_jid jobID to submit a job which will not start until jobID has completed. jobID can be a job number (e.g., 89213) or job name (i.e., the earlier job was named using the -N flag). Multiple jobIDs can be specified using a comma-separated list. In that case the current job will not run until all specified jobIDs have finished.
- -m bea
- Causes an email to be sent when the job begins, when it ends and/or if it is aborted. You can specify any or all of the bea letters. For example, most users only want to know when a job ends or aborts so use -m ea.
Please note that you must put your email address in your jobscript or on the command line submission (see below for how) as the system does not automatically detect your University email address at the moment. We are looking into this.
- -M emailaddress
- (Uppercase M). Specify an email address to which -m status emails will be sent. You may supply a comma-separated list if you want to receive email at more than one address. For example:
-M my.name@manchester.ac.uk,myname@gmail.com
Please note that you must put your email address in your jobscript or on the command line submission as the system does not automatically detect your University email address at the moment. We are looking into this.
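The -o/-e and -hold_jid descriptions above imply two small chores before submitting: the log directory has to exist, and several job IDs have to be joined with commas. The following is a minimal sketch of a helper that does both and prints the resulting qsub command for inspection rather than running it. The function names (join_job_ids, build_qsub_command), the example directory and the job IDs are hypothetical and are not part of SGE or the CSF.

```shell
#!/bin/sh
# Sketch only: helper names, directory and job IDs below are illustrative assumptions.

# join_job_ids ID...
# Join all arguments with commas, the form -hold_jid expects for multiple jobs.
# The body runs in a subshell so the IFS change does not leak to the caller.
join_job_ids() (
    IFS=,
    printf '%s\n' "$*"
)

# build_qsub_command LOGDIR JOBSCRIPT [JOBID...]
# Create LOGDIR (the batch system will not create it) and print a qsub command line.
build_qsub_command() {
    log_dir=$1
    jobscript=$2
    shift 2
    mkdir -p "$log_dir" || return 1
    if [ "$#" -gt 0 ]; then
        echo "qsub -o $log_dir -e $log_dir -hold_jid $(join_job_ids "$@") $jobscript"
    else
        echo "qsub -o $log_dir -e $log_dir $jobscript"
    fi
}

# Example: a job that waits for job number 89213 and a job named phase1.
# Prints a line of the form:
#   qsub -o <dir> -e <dir> -hold_jid 89213,phase1 jobscript_3.sh
build_qsub_command "${TMPDIR:-/tmp}/csf_logs_example" jobscript_3.sh 89213 phase1
```

Printing the command first makes it easy to check the flags before copying them into a jobscript, which remains the recommended place to record how a job was submitted.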
The following options are used on the qsub command-line directly, rather than in your jobscripts:
- -b y
- For use on the qsub command-line only. Indicates that the filename given on the qsub command line is an executable (binary) file, not a jobscript. This allows you to specify the executable directly on the command-line rather than in a jobscript. By default the qsub command assumes the filename refers to a jobscript. For example, the following command line and jobscript (submitted with qsub myjobscript) are equivalent:
qsub -b y -cwd /bin/hostname
and
#!/bin/bash
#$ -cwd
/bin/hostname
It is up to the user which method they prefer. However we recommend writing a jobscript so that you can see how the job was submitted if referring back to an old job (perhaps submitted months ago) rather than trying to remember a command-line. It also allows the sysadmins to identify more easily any problems with jobs.
- -terse
- This flag can be used when constructing pipelines: it causes the qsub command to return only the Job ID, not the user-friendly message about your job submission. This can then be captured and used in subsequent job submissions to make later jobs wait for earlier jobs. For example:
# Submit two jobs, where the second job will not run until the first job has finished:
JID=$(qsub -terse jobscript_1.sh)
qsub -hold_jid $JID jobscript_2.sh
When submitting job arrays, some extra information about the task range and increment is included in the terse output. You should remove this if capturing the job ID:
qsub -terse -t 1-100 jobscript-array1.sh
129674.1-100:1
# To capture only the jobid, use the cut command to remove the extra info:
qsub -terse -t 1-100 jobscript-array1.sh | cut -d. -f1
129674
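If the same pipeline script submits both ordinary jobs and job arrays, it can be convenient to strip the task-range suffix in one place. The sketch below uses shell parameter expansion instead of cut; the function name job_id_from_terse is hypothetical, and the sample strings are the ones shown above, so no job is actually submitted.

```shell
#!/bin/sh
# job_id_from_terse OUTPUT
# Reduce `qsub -terse` output to the bare job ID.
# Plain jobs give "129674"; job arrays give "129674.1-100:1".
job_id_from_terse() {
    # Remove everything from the first "." onwards (no change if there is no ".").
    printf '%s\n' "${1%%.*}"
}

# Sample strings only; no job is submitted here.
job_id_from_terse "129674.1-100:1"   # prints 129674
job_id_from_terse "129674"           # prints 129674
```

In a real pipeline the argument would be the captured output, for example JID=$(job_id_from_terse "$(qsub -terse -t 1-100 jobscript-array1.sh)").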
See the individual pages (in the menu on the left-hand side of this page) for the PE names and resources available on this system.
SGE Environment Variables
The following environment variables are available for use in your jobscript when the job runs. They can be used to create unique names for output files, for example, by including the job id or name in the output filename.
- $NSLOTS
- The number of cores requested using the -pe flag, or 1 if running a serial job (no -pe option specified). Use this variable if your application requires the number of cores to use on its command-line, rather than repeating the number in two places. This makes running jobs with different numbers of cores easier. For example:
#$ -pe smp.pe 4
myapp -cores $NSLOTS -input sample.dat -output results.dat
#
# $NSLOTS will be automatically replaced with 4 in this example
You could also use this variable in the name of an output file if doing several runs with a different number of cores when timing your code. For example:
#$ -pe smp.pe 4
myapp -cores $NSLOTS -input sample.dat -output results.${NSLOTS}cores.dat
#
# The output file will be named results.4cores.dat
- $NGPUS
- The number of GPUs requested by a GPU job using the -l nvidia_v100=N flag. For example:
#$ -l nvidia_v100=2
myGPUApp --numgpus $NGPUS
[Technical note: this is a non-standard SGE env var, injected by the JSV]
- $NHOSTS
- The number of compute nodes in use by your job. For serial (1-core) jobs and single-node SMP (multi-core) jobs this will always be 1. For multi-node jobs (e.g., those running in mpi-24-ib.pe) this will be the number of compute nodes. For example, a 48-core job will need two 24-core compute nodes, hence NHOSTS will be set to 2.
- $JOB_ID
- The unique job id number assigned to the job at runtime by the batch system. You can use this to generate unique filenames that won’t be overwritten by other jobs. For example:
#$ -cwd
myapp -input sample.dat -output results.$JOB_ID.dat
#
# Output file will be named results.37823.dat where 37823 is my unique jobid.
- $JOB_NAME
- The value of the -N flag if present, or the name of the jobscript if that flag is not used. Note that a unique jobid is always generated even if you use the -N flag. For example:
#$ -cwd
#$ -N phase1
myapp -input sample.dat -output results.$JOB_NAME.$JOB_ID.dat
#
# Names the output file results.phase1.38795.dat (in this case)
- $PE
- The name of the parallel environment given after the -pe flag in parallel jobs. For example smp.pe or mpi-24-ib.pe. Unset in serial (1-core) jobs.
- $SGE_O_WORKDIR
- The full path to the directory from where you submitted the job.
- $SGE_TASK_ID
$SGE_TASK_FIRST
$SGE_TASK_LAST
$SGE_TASK_STEPSIZE
- See the Job Arrays documentation for environment variables related to each task.
- $PE_HOSTFILE
- You will not normally need to use this variable in your jobscripts. However, some applications documented on the CSF software page convert the names of the nodes on which your job will run into their own format. This variable gives the name of a file containing the names of the nodes on which your job has been scheduled to run. Do not, however, change the value of this variable yourself.
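For the rare case where a jobscript does need the node list from $PE_HOSTFILE, the following is a minimal sketch of reading it. It assumes the usual SGE hostfile layout, in which each line starts with a hostname followed by the number of slots on that host; the function name hosts_from_pe_hostfile is hypothetical. The variable is only read, never modified.

```shell
#!/bin/sh
# hosts_from_pe_hostfile FILE
# Print "hostname slots" for each line of an SGE hostfile.
# Assumption: column 1 is the hostname and column 2 the slot count.
hosts_from_pe_hostfile() {
    awk '{ print $1, $2 }' "$1"
}

# Inside a running parallel job the file name is provided in $PE_HOSTFILE.
# Outside the batch system the variable is unset and nothing is printed.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    hosts_from_pe_hostfile "$PE_HOSTFILE"
fi
```

The guard around the call lets the same jobscript be syntax-checked or dry-run on a login node without failing on a missing file.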
Automatically Requeue a Job
A jobscript can ask the batch system to automatically requeue the job when the current job has finished. This can be used with an application that does checkpointing. This is where an application saves its current state to disk and then, when a new job starts, it can read the previous state and carry on from where it left off. This allows an app to run for more than 7 days (the max runtime on the CSF) by running one job after another, saving the state between each run.
A jobscript that exits with code 99 will automatically requeue. Add the following to your jobscript:
exit 99
The jobscript will then automatically be waiting in the queue to run again. It will have the special Rr status when it runs, indicating it is a requeued job. The output from the job will be appended to the existing jobname.o12345 and jobname.e12345 files.
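To make the checkpoint-and-requeue cycle concrete, here is a minimal sketch of the decision a jobscript might make at the end of each run. It stands in for a real checkpointing application by saving a step counter to a state file; the function name run_chunk, the state file and the step counts are all hypothetical, and a real application would have its own checkpoint format.

```shell
#!/bin/sh
# run_chunk STATE_FILE TOTAL_STEPS STEPS_PER_JOB
# Resume from the step count saved in STATE_FILE, advance by up to STEPS_PER_JOB,
# save the new count, and return 99 if work remains (0 when everything is done).
run_chunk() {
    state_file=$1
    total=$2
    per_job=$3

    # Read the previous checkpoint, if any.
    done_steps=0
    if [ -f "$state_file" ]; then
        done_steps=$(cat "$state_file")
    fi

    # Work out how far this run may go, never past the total.
    target=$((done_steps + per_job))
    if [ "$target" -gt "$total" ]; then
        target=$total
    fi

    # A real application would compute steps done_steps+1 .. target here.

    # Save the checkpoint for the next (requeued) run.
    echo "$target" > "$state_file"

    if [ "$target" -lt "$total" ]; then
        return 99
    fi
    return 0
}

# Possible use at the end of a jobscript (hypothetical file name and step counts):
#   run_chunk checkpoint.state 1000 250
#   exit $?
```

With these example numbers the job would requeue three times and finish normally on the fourth run.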
Another use of this method would be where a job checks its own results files and decides that it needs to rerun an analysis with different parameters. Add whatever checking of the results you need to the jobscript, then use the exit 99 command to terminate the jobscript.
For an example of an application that does checkpointing please see the StarCCM webpage.