Memory and Disk Space
Background
Now that we use dynamic slots almost exclusively, you need to be more aware of memory issues. The recommendations herein are mandatory.
With dynamic (partitionable) slots you can request any amount of memory up to the total the client PC has. You must not do this by using the keyword Memory in your Requirements line. Instead, add a separate line like so:
# mmmm below is in megabytes
Request_Memory = mmmm
What if I get it wrong?
It is sometimes difficult to know what to ask for. Ask for too much and you could restrict the number of simultaneous jobs; ask for too little and, if the Condor node exhausts all its memory, your job will be evicted. Check the log files for ImageSize after some test runs to get an idea. Note that ImageSize is reported in kilobytes, not megabytes.
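As a rough illustration of turning an observed ImageSize into a Request_Memory value, the sketch below converts kilobytes to megabytes and adds some headroom. It assumes ImageSize is given in kilobytes (as HTCondor documents the attribute); the function name suggest_request_memory and the 20% default headroom are illustrative assumptions, not site policy.

```shell
#!/bin/sh
# Hypothetical helper: suggest a Request_Memory value (MB) from an
# observed ImageSize (KB). The 20% default headroom is an assumption.
suggest_request_memory() {
    image_size_kb=$1
    headroom_percent=${2:-20}
    # Add the headroom, then convert KB to MB, rounding up.
    padded_kb=$(( image_size_kb * (100 + headroom_percent) / 100 ))
    echo $(( (padded_kb + 1023) / 1024 ))
}

suggest_request_memory 1500000   # prints 1758
```

The result can then be used as the mmmm value on the Request_Memory line. Rounding up avoids requesting slightly less than the job was actually seen to use.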
If you underestimate your memory requirements, you may be lucky and Condor will accommodate you. However, what can and does happen is that your job gets evicted (for one of many reasons) and is re-queued. When looking for a new match, Condor now knows the ImageSize, and if this is more than you requested the job will stay stuck at Idle.
How do I get my stuck jobs to run?
There are two ways:
- Remove the jobs, fix the Request_Memory line, and resubmit.
- Command line users can use the condor_qedit command (either directly or via our helper script set_memory) to fix the memory request and immediately run the jobs (assuming free slots), as described below.
To use our helper script set_memory enter the following:
set_memory MemoryValue JobID
replacing MemoryValue with the new value for requested memory (in MB) and JobID with either the job number, the cluster number, or your user ID if you want to apply the change to all your queued jobs.
Alternatively, use condor_qedit as follows:
condor_qedit -n submitter.itservices.manchester.ac.uk Job.ID RequestMemory 64
Substitute as appropriate for Job.ID. The above asks for 64MB: change it to what you really need. If you don’t have a good idea of ImageSize from, say, log files, you can try:
condor_q -global Job.ID -long | grep "^ImageSize "
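To compare the reported ImageSize with the current memory request in one step, a small filter along the following lines could be used. This is only a sketch: the function name check_memory_request is hypothetical, and it assumes the input describes a single job, that attributes appear as "Name = value" lines, that ImageSize is in kilobytes, and that RequestMemory is a plain number in megabytes.

```shell
#!/bin/sh
# Hypothetical helper: read "condor_q -long" style output on stdin and
# report whether the observed ImageSize (KB) exceeds RequestMemory (MB).
check_memory_request() {
    awk -F' = ' '
        $1 == "ImageSize"     { image_kb = $2 + 0 }
        $1 == "RequestMemory" { request_mb = $2 + 0 }
        END {
            # Convert KB to MB, rounding up.
            image_mb = int((image_kb + 1023) / 1024)
            if (image_mb > request_mb)
                printf "ImageSize %d MB exceeds RequestMemory %d MB\n", image_mb, request_mb
            else
                printf "ImageSize %d MB fits within RequestMemory %d MB\n", image_mb, request_mb
        }'
}
```

It would be fed from the same query, without the grep, for example: condor_q -global Job.ID -long | check_memory_request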
Disk space
The same reasoning applies to disk space. To make sure you have enough, add a line like this:
# you have to give kkkkkkk in kilobytes!
Request_Disk = kkkkkkk
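Because memory is requested in megabytes but disk in kilobytes, it is easy to slip on the units. The sketch below converts a disk requirement in gigabytes to kilobytes; the function name gb_to_request_disk_kb is hypothetical and it assumes binary units (1 GB = 1024 × 1024 KB).

```shell
#!/bin/sh
# Hypothetical helper: convert gigabytes to the kilobytes expected by
# Request_Disk, assuming 1 GB = 1024 * 1024 KB.
gb_to_request_disk_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

echo "Request_Disk = $(gb_to_request_disk_kb 2)"   # prints Request_Disk = 2097152
```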