Upgraded CSF3 Info Hub

The upgraded CSF is the CSF

As of Tuesday 27th May 2025, the old CSF (SGE) cluster is no longer available. The csf3.itservices.manchester.ac.uk address now logs you in to the CSF (Slurm) cluster.

We are in the process of updating our applications documentation to provide Slurm jobscript examples (replacing the SGE jobscripts), but there are hundreds of pages to do. If a jobscript on a webpage uses #$ then it is an SGE jobscript, and you should consult the docs below on how to modify such jobscripts to use Slurm.
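As a rough illustration of the #$ check described above, the following Python sketch classifies a jobscript by its directive lines. It assumes only that SGE directives begin a line with #$ and Slurm directives begin a line with #SBATCH. The function name classify_jobscript and the two sample scripts are hypothetical, and the sketch does not convert anything; it only tells you which kind of jobscript you are looking at.

```python
import re

# Assumption: SGE directives start a line with "#$",
# Slurm directives start a line with "#SBATCH".
SGE_DIRECTIVE = re.compile(r"^#\$")
SLURM_DIRECTIVE = re.compile(r"^#SBATCH\b")


def classify_jobscript(text):
    """Return 'sge', 'slurm', 'mixed' or 'none' based on directive lines."""
    lines = text.splitlines()
    has_sge = any(SGE_DIRECTIVE.match(line) for line in lines)
    has_slurm = any(SLURM_DIRECTIVE.match(line) for line in lines)
    if has_sge and has_slurm:
        return "mixed"
    if has_sge:
        return "sge"
    if has_slurm:
        return "slurm"
    return "none"


# Hypothetical sample jobscripts, for illustration only.
old_script = "#!/bin/bash --login\n#$ -cwd\n./my_app\n"
new_script = "#!/bin/bash --login\n#SBATCH -n 4\n./my_app\n"

print(classify_jobscript(old_script))
print(classify_jobscript(new_script))
```

A result of 'sge' or 'mixed' suggests the jobscript still needs modifying before it is submitted to the Slurm cluster.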

The team have done a lot of testing of the new setup, but it is impossible for us to test every piece of software or user scenario in advance. If you have complex pipelines, e.g. in Python, you may have to do some re-installation.

We want to thank everyone who has transitioned to the upgraded Slurm system in the last 2 months and provided us with questions and feedback.

Please DO check the documentation, and try things out, before logging a support ticket.

How To…

  • How to log in (this also takes you through looking at your scratch and submitting a first job).
  • How to modify your SGE jobscript to use Slurm.
  • How to copy files from Old Scratch to New Scratch
  • You are REQUIRED to decide which files you want to retain from your OLD SCRATCH, then COPY THEM to your NEW SCRATCH. WE WILL NOT BULK TRANSFER YOUR OLD SCRATCH FILES TO THE NEW SCRATCH. AND NOR SHOULD YOU! Please copy ONLY WHAT YOU NEED (spring-clean your scratch!)
    IF YOU DO NOTHING, YOUR OLD SCRATCH FILES WILL (EVENTUALLY) BE LOST FOREVER WHEN WE SWITCH OFF THE OLD SCRATCH HARDWARE!!! From 08:30 Wed 4th June old scratch will no longer be accessible.

Known Issues and Workarounds

We will update the Known Issues and Workarounds page if problems arise. Please check that page before submitting a support ticket.

FAQ

Where has SGE gone, why is everything talking about Slurm?

SGE has served us well, but Slurm offers features and further developments that we need in order to continue the CSF service.

As of Wednesday 21st May 2025, all jobs in SGE have been terminated and no further jobs will be accepted. All compute hardware has moved to the Slurm cluster.

You should be using the Slurm cluster from now on!

I’m a brand new user and I’m not sure what to do

We recommend that you use the Getting Started section of our website, and in particular work through the batch tutorial as this will teach you the basics of using a system like the CSF3, and the Slurm batch system in particular.

Is the SGE version of the CSF still running / available?

No, it is no longer possible to log in to this cluster.

What is the login address of the Slurm cluster?

As of 27th May 2025, you can use the familiar address:

csf3.itservices.manchester.ac.uk

This will land you on one of three identical login nodes.

How much resource is in SGE and how much in Slurm?

As of 08:30 Tue 27th May, it is no longer possible to log in to the CSF (SGE) system, and there are no compute nodes in that system.

You can find out what is available in Slurm by consulting the partition information (this also shows the jobscript flags you can use).

Detailed information about the CSF3 hardware can be found in the current configuration page (this contains no information about how to submit work to either system; it only lists the system stats).

Running ‘squeue’ shows a very long Start Time. Will it really take that long for my job to start?

The estimated start time shown when you run the ‘squeue’ command should not be considered accurate.

This is because it is based on the wallclock time requested by all of the jobs that are already queued and running for all users. Most jobs don’t need the entire wallclock time they request: they finish successfully before the time limit, and some jobs may stop or crash prematurely if mistakes have been made in the jobscript, code, input data etc. For these reasons your job might start much sooner than the estimated time shown.

Also, at times you might see the estimated start time of your job being extended. This can happen for various reasons, one of which is to enable users who have not submitted or run any jobs recently, or for a long time, to get their jobs running earlier than other users who have been actively running many jobs, or some jobs with many cores.
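To see why the estimate tends to be pessimistic, here is a toy Python model. It is not Slurm's actual scheduling algorithm: it assumes a single resource that runs the jobs ahead of yours strictly one after another, and all job figures are invented for illustration. The estimate adds up the requested wallclock times, while the real wait adds up how long each job actually ran.

```python
def start_time_hours(jobs_ahead, key):
    """Hours until a new job starts if each job ahead runs for job[key] hours."""
    return sum(job[key] for job in jobs_ahead)


# Hypothetical jobs queued ahead of yours (hours).
jobs_ahead = [
    {"requested": 168, "actual": 5},   # asked for the 7-day maximum, finished in 5 hours
    {"requested": 24, "actual": 20},   # a reasonably accurate request
    {"requested": 48, "actual": 1},    # crashed early because of a jobscript mistake
]

estimated = start_time_hours(jobs_ahead, "requested")
actual = start_time_hours(jobs_ahead, "actual")

print(f"Estimated wait: {estimated} hours")
print(f"Actual wait:    {actual} hours")
```

In this made-up queue the estimate is 240 hours while the real wait is 26 hours, almost entirely because one job requested far more time than it used.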

We encourage all users to submit jobs with reasonably accurate wallclock times. If everybody submits jobs using the maximum permitted 7-day wallclock time, Slurm will not be able to calculate accurate start times for future jobs because it must assume that all jobs already in the system will need the full 7 days.
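One way to arrive at a reasonable request is to take the runtime you measured for a similar job and add a safety margin. The Python sketch below does this and prints the result in the days-hours:minutes:seconds form that Slurm's time limit option accepts. The helper name suggest_time_limit and the 25% margin are assumptions for illustration, not site policy; the cap is the 7-day maximum mentioned above.

```python
import math

MAX_WALLCLOCK_HOURS = 7 * 24  # the 7-day maximum permitted wallclock time


def suggest_time_limit(measured_hours, margin=0.25):
    """Return 'D-HH:MM:SS': measured runtime plus a margin, capped at 7 days.

    The 25% default margin is an illustrative assumption.
    """
    hours = min(measured_hours * (1 + margin), MAX_WALLCLOCK_HOURS)
    total_minutes = math.ceil(hours * 60)  # round up to a whole minute
    days, remainder = divmod(total_minutes, 24 * 60)
    hh, mm = divmod(remainder, 60)
    return f"{days}-{hh:02d}:{mm:02d}:00"


# A job that previously took about 10 hours.
print("#SBATCH -t " + suggest_time_limit(10))
```

For a job that took about 10 hours this suggests a limit of 12 and a half hours, which leaves headroom without claiming the full 7 days.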

Last modified on June 2, 2025 at 8:23 am by Pen Richardson