


An Introduction to Condor

Dr Simon Hood and Dr Jonathan Boyle

Research Computing Services


What is RCS?

Research Computing Services Support?


Research Computing Services, RCS

  • Specialist part of IT Services.

Contact Details
What is Research Computing?
  • Computing to support research! Examples:
    • running complex simulations;
    • performing vast parameter searches.


Research Computing Examples

High Throughput Computing (HTC)
  • Large amounts of comp. power over a "long" time:
    • Running long jobs!
    • Running the same experiment many (1000s) times, with different inputs.
High Performance Computing (HPC)
  • Large amounts of comp. power over a "short" time:
    • many CPUs simultaneously to run complex models quicker;
    • many compute nodes' RAM simultaneously to handle very big jobs.
Data Analysis and Visualization
  • Getting the information out of the vast quantities of data.


How Can RCS Help You? Free Stuff

Free Stuff!

Provision of resources:
  • Horace, Man1, Man2, Mace01, Redqueen. . .
  • Condor pools; NW-Grid, NGS.
Administration of HPC/HTC Clusters:
  • Administer and support University, NW-Grid, NGS and some school and research group HPC clusters.
Support and Training:
  • Documentation — Web and Wiki.
  • Courses!
  • Usage of HPC/HTC (inc. Condor) clusters.
  • Application support.


How Can RCS Help You? In-Depth Support

In-depth support and collaborations

Free dedicated short-term help
  • Advice on parallelisation of code, or
  • advanced use of HTC (inc. Condor).
More in-depth help and collaborations
  • Optimising code/models: scoping, estimate, coding — dedicated resources may require funding.
  • Example: one year's dedicated effort extracting maximum performance. Named resource/researchers on RCUK/EU etc. grants.


Other Related Courses

Introduction to Condor:
  • CPU-cycle scavenging and HPC cluster backfill;
  • Web pages;
Introduction to LaTeX:
Other Courses:
  • Introduction to OpenMP
  • Introduction to MPI
  • Fortran 95
  • Matlab
  • Image-Based Modelling

details and on-line booking. . .


This Course

Today's Course

  • Three speakers
    • Simon Hood (RCS) 10:00 – 12:00 approx.;
    • Jonathan Boyle (RCS) and Ian Cottam (EPS)


This Course: Part One

Simon (AM)

  • what Condor is;
  • how to use it — simple cases;
  • what Condor is good at and what it's not;
  • what EPS- and RCS-backed Condor facilities are available to you.

. . .cont'd. . .


This Course: Parts Two and Three

Jonathan (PM)

  • Using Matlab with Condor
  • Job control using Dagman
  • Job control and monitoring with BASH scripts

Ian Cottam

  • Condor and Dropbox


What is Condor?

From Wikipedia

  • Condor is a high-throughput computing software framework for coarse-grained distributed parallelization of computationally-intensive tasks.
    • ?


What is Condor, in English?

It can farm out computational work to idle desktop computers.
Runs on everything
  • Linux, Unix, Mac OS X, FreeBSD, and (even) MS Windows.
It can work as a traditional batch system
  • It can manage workload (jobs) on a dedicated cluster of computers (Beowulf) in place of SGE/LSF/PBS. . .
  • Can seamlessly integrate dedicated and other resources, e.g., Beowulfs, and (idle) teaching clusters and/or office desktop machines.
All types of jobs
  • Can schedule serial and parallel jobs.
On traditional HPC clusters. . .


Condor Philosophy



What is it good for?

Condor is Complementary to Traditional Batch Systems

  • Good for backfill and using "spare" CPU cycles.
  • Therefore, good for running jobs that can fill gaps flexibly.
  • So, jobs which individually do not require great resources, e.g., RAM or diskspace:
    • can run "anywhere";
    • can be checkpointed and migrated easily — requires re-linking.
  • Large numbers of small jobs, e.g., parameter sweeps, are ideal.

Sometimes better to use Condor, sometimes SGE. . .
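
A parameter sweep of the kind described above might be set up roughly as follows. This is a minimal sketch, not an example taken from the course: the program name simulate, the file names and the job count of 100 are all assumptions. Each queued job gets its own index through the $(Process) macro, which the (hypothetical) program would map to an input value.

```shell
# Write a submit description file for a 100-job parameter sweep.
# "simulate" and all file names are hypothetical placeholders.
cat > sweep.sub <<'EOF'
universe   = vanilla
executable = simulate
arguments  = $(Process)
output     = sweep.$(Process).out
error      = sweep.$(Process).err
log        = sweep.log
queue 100
EOF

# Submit only if Condor is installed on this machine.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit sweep.sub
else
    echo "condor_submit not found; wrote sweep.sub only"
fi
```

With "queue 100", $(Process) runs from 0 to 99, so each job writes its own output and error file while all jobs share one log.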


Traditional Condor Pools


  • Use otherwise wasted compute cycles from non-dedicated resources:
    • individuals' office desktops;
    • teaching/public clusters.
  • Converts unused desktops into a distributed high-throughput computing (HTC) facility.
  • Minimal effect on desktop users:
    • Condor jobs start only after zero keyboard/mouse input for, say, 15 minutes;
    • within seconds of keyboard/mouse input, Condor jobs are suspended.
  • All machines in the pool can submit jobs; all will likely run jobs; symmetrical, peer-to-peer topology.


Features of Condor

  • Condor machines are members of a pool.
  • Members can be compute nodes, submit nodes, or both — traditionally both.
  • Each pool has exactly one "head node" — the collector/negotiator.
  • Condor manages both resources (machines) and resource requests (jobs)
  • Transparent checkpoint/restart
    • and process migration (for some jobs)
  • Manages large numbers of (small) jobs well.


Using Condor: Overview

How do I get computation done with Condor?

  • Ensure your job is batch-ready — requires no user input, no GUI — just as for SGE/LSF/PBS. . .
  • Choose a universe — much more later.
  • Create a small text file which defines the Condor job (cf. qsub script).
  • Submit the job!
  • Monitor progress: output, error and log files.
  • Sit back with a nice mug of tea and enjoy the free CPU cycles.
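
The steps above can be sketched as a short shell session. This is an illustration under stated assumptions rather than the demo used in the course: hello.sh, hello.sub and the argument 42 are made-up names and values, and the vanilla universe is chosen only because it needs no special preparation.

```shell
# Step 1: a batch-ready job. No prompts, no GUI; all input comes from arguments.
cat > hello.sh <<'EOF'
#!/bin/sh
echo "Hello from $(hostname), argument: $1"
EOF
chmod +x hello.sh

# Steps 2 and 3: choose a universe and describe the job (cf. a qsub script).
cat > hello.sub <<'EOF'
universe   = vanilla
executable = hello.sh
arguments  = 42
output     = hello.out
error      = hello.err
log        = hello.log
queue
EOF

# Steps 4 and 5: submit and monitor, if Condor is available here.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit hello.sub
    condor_q
else
    echo "Condor not found; try the job locally with: ./hello.sh 42"
fi
```

Once the job has run, its standard output appears in hello.out, its standard error in hello.err, and Condor's own record of the job's progress in hello.log.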


So let's see it!

Demo holding page. . .


Command Summary

condor_status
Display status of pool: number and type of machines; status of machines (owner/busy/idle); more. . .
condor_submit
Queue jobs for execution under Condor.
condor_q [-global]
Display information about jobs in the Condor job queue; defaults to the local queue.
condor_rm
Remove jobs from the Condor queue.
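
A typical session using the commands summarised here might look like the sketch below. The helper function show is not part of Condor; it is a hypothetical wrapper added so that the example also runs on a machine without Condor installed. The submit file job.sub and the job ID 123.0 are placeholders.

```shell
# Run a Condor command if it is installed, otherwise just print it.
# "show" is a hypothetical helper, not a Condor tool.
show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

show condor_status            # machines in the pool and their state
show condor_submit job.sub    # queue the job described in job.sub (placeholder file)
show condor_q                 # jobs in the local queue
show condor_q -global         # jobs in all queues in the pool
show condor_rm 123.0          # remove job 123.0 (placeholder job ID)
```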


Command Help

condor_<command> -h|-help
    # ...lists all command-line args... 


Running a Job: Overview


In this module we look at the complete job cycle:

  • Make it batch-ready
  • Choose a Universe
  • Create a submit file
  • Submit the job
  • Monitor your job's status


Universes and Job Examples

In this module. . .

  • detail the most commonly used universes in Condor
    • Vanilla, Standard. . .
  • give example Condor submission scripts for each.
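
As a rough preview of how submission scripts differ between the two universes named above, the sketch below writes one file for each. It assumes a Condor version that still provides the standard universe; myprog and myprog.c are hypothetical names. The only change in the submit file is the universe line, but a standard-universe executable must first be re-linked with condor_compile.

```shell
# Vanilla universe: any batch-ready program, no checkpointing.
cat > vanilla.sub <<'EOF'
universe   = vanilla
executable = myprog
output     = myprog.out
error      = myprog.err
log        = myprog.log
queue
EOF

# Standard universe: same job description, different universe line.
sed 's/^universe   = vanilla$/universe   = standard/' vanilla.sub > standard.sub

# Re-link only if Condor and the (hypothetical) source file are present.
if command -v condor_compile >/dev/null 2>&1 && [ -f myprog.c ]; then
    condor_compile gcc -o myprog myprog.c
fi
```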


Data and File Transfer Summary

In this module. . .

  • Summarise use of remote IO and shared filesystems in Condor.
  • Outline how to explicitly transfer required input and output files.
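
Explicit file transfer, for pools without a shared filesystem, is requested in the submit file. The sketch below shows one plausible form; the program name analyse and the input file input.dat are assumptions made for illustration.

```shell
# Submit file asking Condor to copy input.dat to the execute machine
# and to bring output files back when the job exits.
# "analyse" and "input.dat" are hypothetical names.
cat > transfer.sub <<'EOF'
universe                = vanilla
executable              = analyse
arguments               = input.dat
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat
output                  = analyse.out
error                   = analyse.err
log                     = analyse.log
queue
EOF
```

On a pool where all nodes share a filesystem, the three transfer lines could be left out and the job would read input.dat in place.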


Class Ads

In this module. . .

  • What class ads are
    • workstation resource ads
    • job ads
  • Class ad matching
  • Debugging via -better-analyze
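
To make the matching idea concrete, here is a hedged sketch of a job ad that constrains which machines it will accept. The executable name crunch and the 1024 MB memory threshold are assumptions; OpSys, Arch and Memory are attributes that machines commonly advertise in their resource ads.

```shell
# Job ad side: the job only matches machines whose resource ad satisfies
# the requirements expression; rank prefers machines with more memory.
# "crunch" and the 1024 MB limit are hypothetical.
cat > picky.sub <<'EOF'
universe     = vanilla
executable   = crunch
requirements = (OpSys == "LINUX") && (Arch == "X86_64") && (Memory >= 1024)
rank         = Memory
output       = crunch.out
error        = crunch.err
log          = crunch.log
queue
EOF

if command -v condor_q >/dev/null 2>&1; then
    condor_submit picky.sub
    # If the job stays idle, ask Condor why no machine matches.
    condor_q -better-analyze
else
    echo "Condor not found; wrote picky.sub only"
fi
```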


RCS Pools, Backfilling Dedicated HPC Systems

In this section we look at:

  • how Condor can "backfill" traditional HPC clusters;
  • Condor facilities offered by RCS.


Condor and Grid Computing

In this module:

  • we define what we mean by grid;
  • outline (only) how Condor can help with grid computing.


Installing Condor

In this module we outline:

  • where to get the software from;
  • how to set up a Linux machine to join a Condor pool;
  • and how to set up a Condor pool from scratch.


Networking, Topology and Firewalls

In this module:

  • [Placeholder]


Condor and Matlab
Or simply use nodes with a shared filesystem?

In this module:

  • how Condor can "backfill" traditional HPC clusters;
  • Condor facilities offered by RCS.