Upgraded CSF3 Info Hub
Background – 14th May 2025
We are in the process of upgrading the CSF3 from the SGE batch system to the Slurm batch system.
We are also introducing new login nodes and a new scratch filesystem.
Since 7th April, CPU and GPU hardware has been moving from SGE to Slurm in groups, so the amount of compute resource in SGE has been shrinking while the amount in Slurm has been growing.
The team have done a lot of testing of the new setup, but it is impossible for us to test every piece of software or user scenario in advance. If you have complex pipelines, e.g. in Python, you may have to do some re-installation.
Please DO check the documentation, and try things out, before logging a support ticket.
How To…
- How to login (this also takes you through looking at your scratch area and submitting a first job).
- How to modify your SGE jobscript to use Slurm (a sketch of the kind of change involved follows this list).
- How to copy files from Old Scratch to New Scratch (see the copying example below).
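As a rough illustration of the jobscript change (the parallel environment, partition and module names below are placeholders rather than CSF3 values; the pages above list the real ones), an SGE jobscript such as:

```bash
#!/bin/bash --login
#$ -cwd                  # run the job from the submission directory
#$ -pe smp.pe 8          # 8 cores (PE name shown is a placeholder)
#$ -l h_rt=04:00:00      # 4-hour wallclock limit

module load apps/myapp   # hypothetical module name
./my_program
```

might become the following under Slurm, submitted with sbatch instead of qsub:

```bash
#!/bin/bash --login
#SBATCH -p multicore     # partition name shown is a placeholder; see the partition information page
#SBATCH -n 8             # 8 cores
#SBATCH -t 04:00:00      # 4-hour wallclock limit
# Slurm starts the job in the submission directory by default, so no -cwd equivalent is needed

module load apps/myapp   # hypothetical module name
./my_program
```

For copying data between the filesystems, a tool such as rsync can be re-run safely if a transfer is interrupted. The paths below are placeholders for the real Old Scratch and New Scratch locations given in the documentation:

```bash
# Placeholder paths: substitute the real Old Scratch and New Scratch directories
rsync -av /scratch-old/$USER/myproject/ /scratch/$USER/myproject/
```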
Known Issues and Workarounds
We will update the Known Issues and Workarounds page if problems arise. Please check that page before submitting a support ticket.
FAQ
I’m a brand new user and I’m not sure what to do
New users will initially access the “original” CSF3 running the SGE batch system. We recommend that you use the Getting Started section of our website and work through the batch tutorial, which will teach you the basics of using a system like the CSF3 and of the SGE batch system in particular.
Once you have gained some confidence, try the upgraded system.
Is the SGE version of the CSF still running / available?
Yes! But not after 08:30 on Wed 21st May 2025, when:
- All jobs that are running or queued in SGE will be deleted.
- No further job submissions to SGE will be possible.
- All compute nodes, including GPUs, remaining in SGE will be moved to Slurm.
- These resources are scheduled to come back on-line in Slurm during week commencing 26th May.
- The only compute node type not currently available in Slurm is the very high memory, 128GB-per-core (mem4000) node.
- There is only one node of this type, and access to it will continue to be restricted.
- If you have access under SGE you will have access again once it is ready in Slurm.
How much resource is in SGE and how much in Slurm?
You can find out what is available in Slurm by consulting the partition information (which also shows the jobscript flags you can use).
You can also see a summary of SGE and Slurm resources on the current configuration page (this contains no information about how to submit work to either system; it is just the system stats).
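If you prefer the command line, the standard Slurm commands below give a quick view of the partitions; the partition names and limits you will see are CSF3-specific and are explained on the pages above:

```bash
# One-line summary per partition: availability, time limit and node counts
sinfo -s

# Per-partition time limit, node count and node state
sinfo -o "%P %l %D %t"
```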
Running ‘squeue’ shows a very long start time. Will it really take that long for my job to start?
The estimated start time shown when you run the ‘squeue’ command should not be considered accurate. This is because it is based on the Wallclock time requested by all of the jobs that are already queued or running, for all users. Most jobs don’t need their entire requested Wallclock time; they finish successfully before the time limit, and some jobs may stop or crash prematurely if mistakes have been made in the jobscript, code or input data. For these reasons your job might start much sooner than the estimated time shown.
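You can ask Slurm for its current estimate directly; as explained above, treat the times shown as a worst-case guess rather than a promise:

```bash
# Show the scheduler's current estimated start times for your pending jobs
squeue --start -u $USER
```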
We encourage all users to submit jobs with reasonably accurate Wallclock times. If everybody submits jobs using the maximum permitted 7-day wallclock time, Slurm will not be able to calculate accurate start times for future jobs because it must assume that all jobs already in the system will need the full 7 days.
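For example, if your job normally finishes comfortably within two days, requesting something close to that rather than the 7-day maximum gives the scheduler better information and may allow the job to backfill into gaps sooner:

```bash
#SBATCH --time=2-00:00:00   # 2 days rather than the 7-day maximum, if that is realistically enough
```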