Parallel Jobs 10 Minute Tutorial

Please note: It is assumed you have already done the Batch System 10 Minute Tutorial. If not, please do so before attempting this tutorial.

Another Tutorial: Submitting a Parallel Job to the Batch System

The following tutorial is optional and aimed at users wishing to run parallel jobs. You may wish to come back to this tutorial once you are more familiar with the CSF.

A parallel job can be used when your application software is known to support parallel processing.

These applications use more than one CPU core to improve their performance (i.e., give you the results sooner!). They can also access more memory than a serial (1-core) application and so can usually tackle larger problems (e.g., read in larger input data files, solve more equations, run larger simulations).

Many of the centrally-installed applications on the CSF support parallel processing.

Parallel applications can use multiple CPU cores within a single compute node.

Some parallel applications even support running larger parallel jobs across multiple compute nodes.

Not all software supports parallel processing. If your application does not support it then there is no point running a parallel job – the CSF will not magically make it run on multiple CPU cores.

However, if you have a lot of data files to process, or a lot of simulations to run (a parameter sweep), you may wish to run multiple copies of an application to process many different datasets at the same time, using a type of batch job known as a job array. Even if the application is not a parallel application, running many copies of it to process many datasets can give you your results sooner.
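As a sketch only (the myapp program and the data.N.in / result.N.out file names below are hypothetical, not part of this tutorial), a job array jobscript looks much like an ordinary jobscript with a #$ -t line added:

```shell
#!/bin/bash --login
#$ -cwd
#$ -t 1-10        # Run 10 tasks of this job, numbered 1 to 10

# The batch system sets $SGE_TASK_ID to a different number (1, 2, ..., 10)
# in each task, so each copy of the app can pick up a different input file.
./myapp data.${SGE_TASK_ID}.in > result.${SGE_TASK_ID}.out
```

Job arrays are covered in their own tutorial, so we will not use them here.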

You should check the documentation for your particular application to see if it supports parallel processing.

In the following tutorial we will:

  1. Run a simple matrix-multiplication application that multiplies two large square matrices of numbers together. This is a common task in many engineering applications. For the purposes of this tutorial it doesn’t matter what the task is, but it does demonstrate how to submit a parallel job to the batch system.
  2. Repeat the job with a different number of cores to see how it affects performance.
  3. Use a modulefile to access a centrally installed application.

The following steps assume you have already logged in to the CSF and have followed the Batch System 10 Minute Tutorial (which explains some of the steps in more detail).

Step 1: Create a Job Description File (a jobscript)

As in the previous tutorial, we need a simple text file (the jobscript) describing the job we wish to run. We will add some extra information to the jobscript to request more than 1 CPU core (which is the default).

Create a directory (usually referred to as a folder in Windows or MacOS) in your CSF home storage area, for our second test job, by running the following commands at the prompt:

# All of these commands are run on the CSF login node at the prompt
mkdir ~/second-job            # Create the directory (folder)
cd ~/second-job               # Go in to the directory (folder)

Now use gedit, or another editor, on the CSF login node (running text editors on the login node is permitted) to create a file with exactly the following content (see below):

# Run this command on the CSF login node at the prompt
gedit second-job.txt


Here’s the jobscript content – put this in the text file you are creating

#!/bin/bash --login
#$ -cwd
#$ -pe smp.pe 4          # Run in the 'smp.pe' parallel environment with 4 cores
                         # This is the line that isn't present in a serial (1-core) job.
                         # It is how we request more than one CPU core for the job.

# Set up to use the centrally installed tutorial application. The CSF has modulefiles for 100s of apps.
module load apps/intel-17.0/tutorial

# Inform the app how many cores we requested for our job. The app can use this many cores.
# The special $NSLOTS keyword is automatically set to the number used on the -pe line above.
export OMP_NUM_THREADS=$NSLOTS

# Run the app, which in this tutorial is named 'pmp'
pmp

Note: lines must NOT be indented in your text file – there should NOT be any spaces at the start of the lines. Copy-and-paste from this web page should work correctly in most browsers and will not copy any leading spaces.

This BASH script has the following parts:

  1. The first line, #!/bin/bash --login, means that the file you create is treated as a BASH script (scripts in Linux can use several languages, BASH is the one we use for jobscripts). The --login is needed to make the module command work inside the jobscript.
  2. The #$ -cwd line is as before – it runs the job in the directory (folder) from where you submit the job.
  3. The #$ -pe smp.pe 4 line is new – this makes the job a parallel job. It asks the batch system to reserve 4 cores (in this example) in the smp.pe parallel environment. The parallel environment is used to say what type of parallel job will be run. In this case we are running a single compute-node multi-core job. This is known as an SMP job. Other types of parallel job are available but we will not cover those here.
  4. The module load apps/intel-17.0/tutorial line is new – this will load a modulefile in to the job’s environment when it runs on a compute node. The modulefile will apply settings (possibly loading other modulefiles) needed to allow the pmp application to run. All of the centrally installed applications have modulefiles to make running the apps as easy as possible.
  5. The export OMP_NUM_THREADS=$NSLOTS line is new – this is how we inform the pmp application how many CPU cores it is allowed to use. The app does not know this automatically. We reserved 4 cores in the batch system but we must then inform the application that it can use 4 cores.

    The $NSLOTS variable is automatically set by the batch system to the number of cores requested on the #$ -pe line. So this is a convenient way of always getting the correct number of cores.

  6. The pmp line is new – pmp is the name of the parallel matrix multiplication application we are going to run.
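To see how the $NSLOTS value flows into OMP_NUM_THREADS, here is a minimal sketch you can try at the login-node prompt. Outside a job NSLOTS is not set, so we assign a stand-in value of 4; inside a real job the batch system sets it for you:

```shell
NSLOTS=4                          # Stand-in value; SGE sets this automatically inside a real job
export OMP_NUM_THREADS=$NSLOTS    # The app reads this to decide how many threads (cores) to use
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Running this prints OMP_NUM_THREADS=4, confirming the value has been passed on to the environment the app will see.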

Step 2: Copy to scratch area

We now copy the jobscript to your scratch area. Recall that we recommend running jobs in your scratch area: it is faster and permits jobs to write large temporary files without filling up your group’s home directory quota. But you must remember to copy important results back to the home area for safe keeping.

cp second-job.txt ~/scratch

We can now go in to the scratch area:

cd ~/scratch

Our scratch directory is now our current working directory. When we submit the job to the batch queue (see next step) it will run in the scratch area, outputting any results files there.
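When the job has finished, remember to copy any important results back to your home area for safe keeping. For example (the wildcard here is illustrative – it matches the .o job output files that jobs create, as shown in the later steps of this tutorial):

```shell
# Run on the login node after the job has finished
cp ~/scratch/second-job.txt.o* ~/second-job/    # Copy job output back home for safe keeping
```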

Step 3: Submit the Job to the Batch System

Assuming your jobscript is called second-job.txt, submit your jobscript (the copy that is in your scratch area) to the batch system:

qsub second-job.txt

You’ll see a message printed similar to:

Your job 195502 ("second-job.txt") has been submitted

The job id 195502 is a unique number identifying your job (obviously you will receive a different number). You may use this in other commands later.
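For example (substitute your own job id for 195502 – both commands are standard SGE tools):

```shell
qstat -j 195502    # Show detailed information about this specific job
qdel 195502        # Delete (cancel) the job if you no longer need it
```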

Step 4: Check Job Status

Use the qstat command to check the job status. You should be able to determine if it is queued-waiting (qw), running (r), in error (Eqw) or finished (qstat shows nothing).
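For illustration only (the exact column widths, dates and values will differ), the qstat output for our queued 4-core job looks something like the following – note the slots column shows the 4 cores we requested:

```
job-ID  prior    name            user      state  submit/start at      queue  slots
-----------------------------------------------------------------------------------
195502  0.00000  second-job.txt  username  qw     01/04/2024 13:16:30         4
```

Once the job starts running the state changes to r and the queue column is filled in.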

Step 5: Review Job Results/Output

The job will have created two output files: second-job.txt.o195502 and second-job.txt.e195502, which contain the output from the job. The .e file should be empty. Let’s have a look at the file sizes by doing a long listing, which shows more information about the files:

# Run the 'ls' command with a '-ltr' flag added for a long listing 
# with the most recently updated files listed at the bottom of the listing.
ls -ltr
-rw------- 1 username xy01       345 Jan  4 13:16 second-job.txt
-rw-r--r-- 1 username xy01         0 Jan  4 13:16 second-job.txt.e195502
-rw-r--r-- 1 username xy01       337 Jan  4 13:18 second-job.txt.o195502
  #                    #          #    #########    #               #
  #                    #          #        #        #               # Your job id number will
  #                    #          #        #        #               # be different to this.
  # File permissions   #          #        #        # Filenames
                       #          #        #
                       #          #        # Date and time of last update
                       #          #        # (i.e. when something was written to the file)
                       #          #
                       #          # Filesize in bytes. Notice the 0 sized .e file.
                       #          # This is usually a good sign - no errors reported.
                       #
                       # The group you are in. It usually indicates
                       # your faculty or supervisor.

Examine the contents of the second-job.txt.o195502 file – any output printed by the pmp app will have been captured in this file:

cat second-job.txt.o195502
                      #
                      # Use the jobid number for your job!

You will see the number of cores used by pmp reported, followed by the 2D matrix size used in the tests, followed by timing information for five runs of the matrix calculation.

Step 6: Repeat the job with a different number of cores

To show the effect of using more cores with the pmp application, edit your jobscript:

gedit second-job.txt

then change the number of cores

#$ -pe smp.pe 8         # Use 8 cores instead of 4 previously

Save the file and resubmit it to the batch system:

qsub second-job.txt
Your job 195503 ("second-job.txt") has been submitted

When the job has completed (check using qstat) have a look at the timing information for this second run of the job. It should show that the five runs of the calculation were done in approximately half the time taken by the 4-core job:

cat second-job.txt.o195503
                      #
                      # Use the jobid number for your job!

If you wish to run the pmp application with a different number of cores (up to 32 cores are permitted in the smp.pe parallel environment) then edit the jobscript again and resubmit the job.
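Rather than editing the jobscript each time, you can also override the #$ -pe line from the qsub command line – options given to qsub take precedence over the corresponding #$ lines embedded in the script. A sketch that submits the job with several different core counts:

```shell
# Each '-pe' given on the qsub command line overrides the
# '#$ -pe smp.pe 4' line inside the jobscript, so the file
# itself does not need to be edited between submissions.
for ncores in 2 4 8 16 32
do
    qsub -pe smp.pe $ncores second-job.txt
done
```

Each submission produces its own .o output file (named with its own job id), so the timing results for the different core counts can be compared afterwards.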

Summary

You have now been able to run a parallel job. It was a single-node multi-core job which used multiple CPU cores within a single compute node. The application supports this type of parallel processing and we could verify that it ran more quickly with more cores. We used a modulefile to give us access to the centrally installed pmp application.

More on Using the Batch System (parallel jobs, GPUs, high-mem)

The batch system, SGE, has a great deal more functionality than described above. Other features, including larger parallel jobs, GPU jobs and high-memory jobs, are fully documented (with example job scripts) in the CSF SGE documentation.

Finally, each centrally installed application has its own application webpage where you will find examples of how to submit a job for that specific piece of software and any other information relevant to running it in batch such as extra settings that may be required for it to work.

Last modified on September 27, 2023 at 12:25 pm by George Leaver