Submitting your first job to Condor
Introduction
This page is intended as a tutorial to introduce you to submitting Condor jobs. Before you begin, it is suggested that you work through:
In this tutorial we are going to run a very simple job on the Condor pool. The job, called myscript.sh, is written in the bash scripting language. It is a simple text file containing the commands we wish to run, and is as follows:
#!/bin/bash
/sbin/ifconfig
All this script does is use the Linux ifconfig command to return network card information from the machine on which it runs. Because ifconfig is a standard system command, the script calls it by its full path, /sbin/ifconfig. More realistically, you could be running a chemistry app or a Python data-processing script, for example; see the software list to get an idea of the apps installed on Condor.
Note: If you’ve compiled your own app (e.g., downloaded source code from GitHub and compiled it on the submitter node), the program your job runs will most likely be in your current working directory (probably a folder in your Condor home directory) rather than in a system directory. In this case you need to put ./ in front of the command so that the shell looks for the program in the job's working directory, for example:
#!/bin/bash
./myapp -e variable1 -g variable2
To run our myscript.sh job on the Condor pool we also need to supply a Condor submit file, such as the following, submit.txt. The submit file is how you specify various settings about the job (e.g., names of output/log files, how much memory you need and so on):
universe = vanilla
executable = myscript.sh
log = log.txt
Output = out.txt
Error = err.txt
notification = error
Request_Memory = 1024
Requirements = (OpSys == "LINUX" && Arch == "X86_64")
when_to_transfer_output = on_exit
queue
A key Condor concept to remember is that your job file (e.g., myscript.sh) and any input files it requires will be copied from the submitter node to the compute node / PC where the job will run. Condor does this for you. Likewise, any result files and log files generated by your job can be copied back to the submitter node. The submit file tells Condor the names of these various files so that it can do the copying.
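If you would rather create the two files directly on the submitter node than download and transfer them, the shell sketch below writes out the same myscript.sh and submit.txt shown above using here-documents. It assumes you run it from your home directory after logging in to the submitter node; the chmod step is an optional extra that lets you also try the script locally with ./myscript.sh.

```shell
# Create the tutorial files in a folder called myfirstjob
# (assumes you are in your home directory on the submitter node).
mkdir -p myfirstjob

# The job itself: print network card information for the machine it runs on.
cat > myfirstjob/myscript.sh <<'EOF'
#!/bin/bash
/sbin/ifconfig
EOF

# Optional: make the script executable so you can test it locally.
chmod +x myfirstjob/myscript.sh

# The Condor submit file, one command per line (same settings as above).
cat > myfirstjob/submit.txt <<'EOF'
universe = vanilla
executable = myscript.sh
log = log.txt
Output = out.txt
Error = err.txt
notification = error
Request_Memory = 1024
Requirements = (OpSys == "LINUX" && Arch == "X86_64")
when_to_transfer_output = on_exit
queue
EOF

# Show what was created.
ls myfirstjob
```

After this, carry on from the step that moves into the myfirstjob folder.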
Instructions
- Download the job file, myscript.sh, and the Condor submit file, submit.txt, to your local machine (e.g., your PC / laptop) and place them both in a folder called myfirstjob.
- Transfer the myfirstjob folder from your PC/laptop over to your home directory on the Condor submitter node (Instructions for Transferring files to and from Condor)
- Log in to the Condor submit node (Instructions for Connecting to Condor)
- At the command line, move into the myfirstjob folder with the command
cd myfirstjob
- Submit the job to the pool using the command
condor_submit submit.txt
- Under normal conditions, a job as small as this will execute almost immediately. So, after a few seconds, take a look at the output files by running the ls command. You should see something like the following:
[zzaabbcc@submitter myfirstjob]$ ls
err.txt  log.txt  myscript.sh  out.txt  submit.txt
Note the extra files err.txt, log.txt and out.txt; these are the names specified in the submit file (submit.txt).
- If these files have not been created, it is likely that the Condor pool is busy and your job has been queued. To see all jobs currently queued under your username, run the following command (using your own username in place of yourusername):
condor_q yourusername
which will show something similar to:
-- Schedd: submitter.itservices.manchester.ac.uk : <10.99.203.110:22854?...
 ID      OWNER      SUBMITTED     RUN_TIME ST PRI SIZE CMD
64478.0  zzaabbcc   4/24 14:31   0+00:00:00 I  0   0.0  myscript.sh

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Jobs that are still waiting in the queue are labelled as “idle” and have an I in the ST (state) column.
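The file check in the steps above can be wrapped in a small shell function if you find yourself repeating it. This is a minimal sketch, not an official Condor tool: the function name check_job_files is hypothetical, it assumes you run it from inside the myfirstjob folder, and it only looks for the three file names set in submit.txt without querying Condor itself. Condor may create log.txt as soon as the job is submitted, so out.txt and err.txt are the better signs that the job has finished.

```shell
# Hypothetical helper: report which of the tutorial's output files exist yet.
# Run it from inside the myfirstjob folder.
check_job_files() {
    local missing=0
    for f in log.txt out.txt err.txt; do
        if [ -e "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    if [ "$missing" -eq 1 ]; then
        # $USER is your login name, i.e. the username condor_q expects.
        echo "Job may still be queued or running; check with: condor_q $USER"
    fi
    # Non-zero status while any file is still missing.
    return "$missing"
}
```

For example, after condor_submit submit.txt, run check_job_files every so often; it prints found or missing for each file and returns a non-zero status while any are missing.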
Details of the output files
- log.txt: This contains a log of the events in the job's lifetime inside Condor. For example, it will tell you where the job was submitted from, which machine(s) it executed on and so on. Its filename is given by the log parameter in submit.txt.
- out.txt: If you ran your job directly on a machine rather than using Condor, it would probably send a lot of output to the screen. When you run it on Condor, that output is sent to this file instead. Its filename is given by the Output parameter in submit.txt.
- err.txt: This file contains error messages (if any). Its filename is given by the Error parameter in submit.txt.
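A quick way to act on these descriptions once the job has finished is to look at err.txt first and only then at out.txt. The function below is a minimal sketch of that habit; its name, show_job_result, is hypothetical, and it assumes the Output and Error file names from submit.txt (out.txt and err.txt) unless you pass others as arguments. Some programs write harmless warnings to the error stream, so a non-empty err.txt is a prompt to look, not proof of failure.

```shell
# Hypothetical helper: show a finished job's output, flagging any error messages.
# Usage: show_job_result [output-file] [error-file]
show_job_result() {
    local out=${1:-out.txt} err=${2:-err.txt}
    # -s is true only if the file exists and is not empty.
    if [ -s "$err" ]; then
        echo "== $err is not empty; the job may have hit a problem:"
        cat "$err"
        return 1
    fi
    if [ ! -e "$out" ]; then
        echo "== $out not found yet; the job may still be queued or running"
        return 2
    fi
    echo "== no error messages in $err; contents of $out:"
    cat "$out"
}
```

For this tutorial, running show_job_result inside myfirstjob should print the ifconfig output collected in out.txt.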
Bursting into the cloud
As well as running HTCondor jobs on-site, we can now “burst” jobs into the cloud, currently using Amazon Web Services (AWS). This is useful if you find your jobs are waiting in the queue for a long time. For a more detailed description of cloud bursting, please see here.
How to enable HTCondor Bursting
Currently all our Virtual Machines (VMs) spin up in the AWS region us-east-2 (Ohio). If your code uses or generates any sensitive data, you should not use this service.
To allow your HTCondor jobs to use this facility, add the following line to your HTCondor submit file:
+MayUseAWS=True
The submit.txt file above would now look like this:
universe = vanilla
executable = myscript.sh
log = log.txt
Output = out.txt
Error = err.txt
notification = error
Request_Memory = 1024
Requirements = (OpSys == "LINUX" && Arch == "X86_64")
when_to_transfer_output = on_exit
+MayUseAWS=True
queue
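If you already have several submit files, you could add the line with a small shell function rather than by hand. This is a minimal sketch under a few assumptions: the function name add_aws_flag is hypothetical, it inserts +MayUseAWS=True just before the first line that starts with queue (as in the example above), and it leaves the file unchanged if the line is already present.

```shell
# Hypothetical helper: insert +MayUseAWS=True before the first "queue" line
# of a Condor submit file, unless the file already contains it.
add_aws_flag() {
    local f=${1:-submit.txt}
    if [ ! -f "$f" ]; then
        echo "no such file: $f" >&2
        return 1
    fi
    if grep -q '^+MayUseAWS' "$f"; then
        echo "$f already allows AWS bursting"
        return 0
    fi
    # Write to a temporary file, then replace the original only if awk succeeded.
    awk '/^[ \t]*queue/ && !done { print "+MayUseAWS=True"; done = 1 } { print }' \
        "$f" > "$f.tmp" && mv "$f.tmp" "$f"
}
```

For example, add_aws_flag submit.txt edits submit.txt in place; running it a second time leaves the file unchanged. If your submit file has no line starting with queue, nothing is inserted.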