High Throughput Computing using Condor

Submitting your first job to Condor

Introduction

This page is intended as a tutorial introducing you to submitting Condor jobs. Before you begin, it is suggested that you work through:

In this tutorial we are going to run a very simple job on the Condor pool. The job, called myscript.sh, is written in the bash scripting language. It is a simple text file containing the commands we wish to run:

#!/bin/bash
/sbin/ifconfig

All this script does is run the Linux ifconfig command, which reports network card information for the machine on which the job ran. ifconfig is a standard system command, and the script calls it by its full path (/sbin/ifconfig). More realistically, you could be running a chemistry app or a Python data processing script, for example; see the software list to get an idea of the apps installed on Condor.
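If you would like the job's output to state plainly which machine ran it, the script can be extended a little. The sketch below is an illustration rather than one of the tutorial files: the function name report_host is hypothetical, and the fallback to the ip command assumes that some compute nodes may not have /sbin/ifconfig installed (newer Linux distributions often provide ip instead).

```shell
#!/bin/bash
# Hypothetical extended version of myscript.sh (not one of the tutorial files).
# Prints the name of the machine that ran the job, then its network details.
report_host() {
    echo "Running on: $(uname -n)"
    echo "Started at: $(date)"
    if [ -x /sbin/ifconfig ]; then
        /sbin/ifconfig            # same command as the original script
    elif command -v ip >/dev/null 2>&1; then
        ip addr                   # assumed fallback where ifconfig is absent
    else
        echo "neither ifconfig nor ip is available" >&2
    fi
}

report_host
```

The first line of out.txt would then name the compute node, which makes it easy to compare against the machine recorded in log.txt.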

Note: If you’ve compiled your own app (e.g., downloaded source code from GitHub and compiled it on the submitter node), the program your job runs will be located in the current working directory (on the submitter node, probably your Condor home directory) rather than in a standard system location such as /sbin. In this case you must place ./ in front of the command so that the shell knows to look for the script/program in the current directory, for example:

#!/bin/bash
./myapp -e variable1 -g variable2

In order to run our myscript.sh job on the Condor pool, we also need to supply a Condor submit file, such as the following, submit.txt. The submit file is how you specify various settings for the job, for example the names of the output/log files and how much memory you need:

universe = vanilla
executable = myscript.sh
log = log.txt
Output = out.txt
Error = err.txt
notification = error
Request_Memory = 1024
Requirements = (OpSys == "LINUX" && Arch == "X86_64")
when_to_transfer_output = on_exit
queue

A key Condor concept to remember is that your job file (e.g., myscript.sh) and any input files it requires will be copied from the submitter node to the compute node / PC where the job will run. Condor does this copying for you. Likewise, any result files and log files generated by your job can be copied back to the submitter node. The submit file tells Condor the names of these various files so that it can do the copying.
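As a sketch of how a submit file can name extra input files to be copied, the shell function below writes a variant of submit.txt that also lists input files to send to the compute node. The function name write_submit_file and the input file names used with it are hypothetical. should_transfer_files and transfer_input_files are standard HTCondor submit commands, but the tutorial's own submit.txt does not use them, so whether you need them may depend on how our pool is configured.

```shell
#!/bin/bash
# Hypothetical helper: write a submit file that also transfers input files.
# Usage: write_submit_file EXECUTABLE "INPUT1, INPUT2" [SUBMIT_FILE]
write_submit_file() {
    local exe="$1" inputs="$2" out="${3:-submit.txt}"
    # Unquoted EOF so that $exe and $inputs are substituted into the file
    cat > "$out" <<EOF
universe = vanilla
executable = $exe
log = log.txt
Output = out.txt
Error = err.txt
notification = error
Request_Memory = 1024
Requirements = (OpSys == "LINUX" && Arch == "X86_64")
should_transfer_files = YES
transfer_input_files = $inputs
when_to_transfer_output = on_exit
queue
EOF
}
```

For example, write_submit_file myapp "input.dat" would describe a job in which both the program myapp and its (hypothetical) input file input.dat are copied to the compute node before the job starts.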

Instructions

  • Download the job file, myscript.sh, and the Condor submission file, submit.txt, to your local machine (e.g., your PC / laptop) and place them both in a folder called myfirstjob.
  • Transfer the myfirstjob folder from your PC/laptop over to your home directory on the Condor submitter node (Instructions for Transferring files to and from Condor)
  • Log in to the Condor submit node (Instructions for Connecting to Condor)
  • At the command line, move into the myfirstjob folder with the command
    cd myfirstjob
    
  • Submit the job to the pool using the command
    condor_submit submit.txt
    
  • Under normal conditions, a job as small as this will execute almost immediately. So, after a few seconds, list the contents of the folder with the ls command. You should see something like the following:
    [zzaabbcc@submitter myfirstjob]$ ls
    err.txt  log.txt  myscript.sh  out.txt  submit.txt
    

    Note the three new files err.txt, log.txt and out.txt; their names are the ones specified in the submit file (submit.txt).

  • If these files have not been created, then it is likely that the Condor pool is busy and your job has been queued. To see all jobs currently queued under your username, execute the following command (using your own username in place of yourusername):
    condor_q yourusername
    

    which will show something similar to:

    -- Schedd: submitter.itservices.manchester.ac.uk : <10.99.203.110:22854?...
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
    64478.0  zzaabbcc        4/24 14:31   0+00:00:00 I  0   0.0  myscript.sh
    
    1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
    

    Jobs that are still waiting in the queue are labelled as “idle” and have an I in the ST (state) column.
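If you prefer a one-word state per job rather than the letter in the ST column, condor_q can print each job's numeric JobStatus attribute using its -af (autoformat) option. The small function below, whose name job_status_name is hypothetical, translates those numbers into HTCondor's state names; the pipeline in the comment is a sketch of how it might be combined with condor_q on the submit node.

```shell
#!/bin/bash
# Hypothetical helper: map HTCondor's numeric JobStatus to a readable name.
job_status_name() {
    case "$1" in
        1) echo "Idle" ;;
        2) echo "Running" ;;
        3) echo "Removed" ;;
        4) echo "Completed" ;;
        5) echo "Held" ;;
        6) echo "Transferring Output" ;;
        7) echo "Suspended" ;;
        *) echo "Unknown" ;;
    esac
}

# On the submit node (needs HTCondor), list your jobs with their state:
#   condor_q yourusername -af ClusterId ProcId JobStatus |
#   while read -r cluster proc status; do
#       echo "$cluster.$proc $(job_status_name "$status")"
#   done
```

For the queued job shown above, this would print a line pairing job 64478.0 with Idle, matching the I in the ST column.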

Details of the output files

  • log.txt: a log of the events in the job's lifetime inside Condor. For example, it tells you where the job was submitted from, which machine(s) it executed on, and so on. Its filename is given by the log parameter in submit.txt.
  • out.txt: if you ran your job directly on a machine rather than through Condor, its output would be printed to the screen. When you run it on Condor, that output is written to this file instead. Its filename is given by the Output parameter in submit.txt.
  • err.txt: any error messages written by the job (the file may be empty). Its filename is given by the Error parameter in submit.txt.
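Because log.txt records each event in the job's lifetime, it can also tell you whether the job has finished and what exit code it returned. The sketch below assumes the standard HTCondor user-log wording, in which a finished job has a "Job terminated" event followed by a line such as "(1) Normal termination (return value 0)"; the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a job's HTCondor user log (log.txt).

# Succeeds once the log contains a "Job terminated" event.
job_finished() {
    grep -q 'Job terminated' "${1:-log.txt}" 2>/dev/null
}

# Prints the exit code from the last "Normal termination" line, if any.
job_exit_code() {
    sed -n 's/.*Normal termination (return value \([0-9][0-9]*\)).*/\1/p' "${1:-log.txt}" | tail -n 1
}

# Example use in the myfirstjob folder:
#   if job_finished log.txt; then
#       echo "exit code: $(job_exit_code log.txt)"
#       [ -s err.txt ] && echo "err.txt is not empty; check it for errors"
#   fi
```

An exit code of 0 together with an empty err.txt is a good sign that the job ran as intended.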

Bursting into the cloud

As well as running HTCondor jobs on-site, we can now “burst” jobs into the cloud, currently using Amazon Web Services (AWS). This is useful if you find your jobs are waiting in the queue for a long time. For a more detailed description of cloud bursting, please see here.

How to enable HTCondor Bursting

Currently all our Virtual Machines (VMs) spin up in the AWS region us-east-2 (Ohio). If your code uses or generates any sensitive data, you should not use this service.

To allow your HTCondor jobs to use this facility, add the following line to your HTCondor submit file:

+MayUseAWS=True

The submit.txt file above would then look like this:

universe = vanilla
executable = myscript.sh
log = log.txt
Output = out.txt
Error = err.txt
notification = error
Request_Memory = 1024
Requirements = (OpSys == "LINUX" && Arch == "X86_64")
when_to_transfer_output = on_exit
+MayUseAWS=True
queue
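If you already have a submit file, the line can also be added from the command line. The function below is a hypothetical helper, not part of HTCondor: it inserts +MayUseAWS=True immediately before the first queue statement (commands placed after queue do not affect the jobs that queue statement submits) and leaves the file unchanged if it already contains a MayUseAWS line.

```shell
#!/bin/bash
# Hypothetical helper: add +MayUseAWS=True before the first queue statement.
# Usage: enable_aws_bursting [SUBMIT_FILE]   (defaults to submit.txt)
enable_aws_bursting() {
    local file="${1:-submit.txt}" tmp
    # Leave the file alone if bursting is already requested
    if grep -qi '^[[:space:]]*[+]MayUseAWS' "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Submit commands are case-insensitive, so match "queue" in any case
    awk 'tolower($1) == "queue" && !done { print "+MayUseAWS=True"; done = 1 }
         { print }' "$file" > "$tmp" && cat "$tmp" > "$file"
    # Copying back with cat (rather than mv) keeps the file's permissions
    rm -f "$tmp"
}
```

Running enable_aws_bursting submit.txt on the first submit.txt in this tutorial should produce the AWS version shown above.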

Last modified on June 22, 2021 at 10:31 am by George Leaver