Cloud Bursting with AWS

We have configured the Condor Pool to “burst” into the Cloud, specifically Amazon Web Services (AWS), when the queues for the Condor Pool are full. Jobs that opt in may then run on the Cloud instead of waiting for a free slot on the Condor Pool.

Currently, all our Virtual Machines (VMs) spin up in the AWS region us-east-2 (Ohio). If your code uses or generates any sensitive data, you should not use this service.

How to enable HTCondor Bursting

To allow your HTCondor jobs to use this facility, add the following line to your HTCondor submission script:

+MayUseAWS=True
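
For context, a minimal sketch of a complete submission script with this line in place might look like the following. Only the +MayUseAWS line comes from this page. The executable name, file names, and job count are hypothetical placeholders, and the explicit file-transfer settings assume that the cloud VMs do not share a filesystem with the submit machine.

```
# Hypothetical submission script; names and counts are placeholders.
universe                = vanilla
executable              = my_job.sh
arguments               = $(Process)
output                  = my_job.$(Cluster).$(Process).out
error                   = my_job.$(Cluster).$(Process).err
log                     = my_job.$(Cluster).log

# Assumption: cloud VMs have no shared filesystem, so transfer files explicitly.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

# Opt in to cloud bursting (the line described above).
+MayUseAWS = True

queue 10
```

Jobs submitted without the +MayUseAWS line are not affected and continue to wait for the Condor Pool as before.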

Detailed Description

If you have enabled cloud bursting for your HTCondor jobs, any tasks which have been idle for 20 minutes will be considered eligible for bursting into AWS. Virtual Machines (VMs) will start to spin up in batches of 50 until either you have no further jobs in the queue or you are using 400 CPUs. Your jobs will run in the VMs in much the same way as they would on-site. When the jobs finish, their output files will be transferred back to the submitter.

VM Details

We currently make use of m5.large instances, which have 2 CPUs and 8 GB RAM. Dynamic Slots are enabled, so both CPUs can be used even if your code is serial, for example by two single-CPU jobs running on the same VM.
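
As a rough illustration of how Dynamic Slots can divide one VM, the resource requests of a serial job might look like the fragment below. The values are assumptions chosen for illustration, not site requirements: 3072 MB is simply a figure that lets two such jobs fit within the 8 GB of one m5.large while leaving some memory for the operating system.

```
# Illustrative resource requests for a serial job (values are assumptions).
request_cpus   = 1
# Memory in MB; two jobs at 3072 MB each fit within one 8 GB VM.
request_memory = 3072
```

A job that requests more than 2 CPUs or more than the memory available on one VM would not be expected to match these instances.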

Instance Details

We are using AWS Spot Instances which, in simple terms, are the spare capacity in AWS at any given time. This is a very cost-effective way of using public Cloud platforms, although there is a very small chance that AWS will reclaim the resources if they are no longer spare. You need not worry about this, however: the HTCondor queue will take care of resubmitting evicted jobs.

Last modified on July 23, 2019 at 10:51 am by Chris Paul