Upgraded CSF3 Info Hub

Background – April 7th 2025

We are in the process of upgrading the CSF3 from the SGE batch system to the SLURM batch system.

We are also introducing new login nodes and a new scratch filesystem.

Over the next few weeks, existing CPU and GPU hardware will move from SGE to SLURM in groups. As a result, the amount of compute resource in SGE will shrink and the amount in SLURM will grow, until all compute resources are in SLURM.

The team have done a lot of testing of the new setup, but it is impossible for us to test every piece of software or user scenario in advance. If you have installed complex pipelines (e.g., in Python), you may need to reinstall some of them.

Please DO check the documentation, and try things out, before logging a support ticket.

How To…

Known Issues and Workarounds

We will update the Known Issues and Workarounds page if problems arise. Please check that page before submitting a support ticket.

FAQ

I’m a brand new user and I’m not sure what to do

New users will initially access the “original” CSF3 running the SGE batch system. We recommend that you use the Getting Started section of our website and, in particular, work through the batch tutorial, as this will teach you the basics of using a system like the CSF3 and the SGE batch system in particular.

Once you have gained some confidence, try the upgraded system.

Is the SGE version of the CSF still running / available?

Yes! You can continue to use the SGE version of the CSF – e.g., to complete your project work. But it won’t be around forever. You should try your apps on the upgraded CSF sooner rather than later.

How much resource is in SGE and how much in Slurm?

You can find out what is available in SLURM by consulting the partition information (which also shows the jobscript flags you can use).

You can also see a summary of SGE and SLURM resources on the current configuration page. That page contains no information about how to submit work to either system; it just shows the system statistics.

Running ‘squeue’ shows a very long Start Time. Will it really take that long for my job to start?

The estimated start time shown by the ‘squeue’ command should not be considered accurate. It is calculated from the walltime requested by all the jobs, from all users, that are already queued and running in the system. Most jobs don’t need the entire walltime they request, and some jobs stop or crash early for various reasons. For these reasons, your job might start much sooner than the estimated time shown.
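The toy Python model below illustrates why an estimate built from requested walltimes tends to be pessimistic. It is only a sketch under a strong simplifying assumption: a single compute slot that runs the queued jobs one after another, which is far simpler than how Slurm actually schedules work (priorities, backfill, many nodes). The `Job` class, both functions and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    requested_hours: float  # walltime requested in the jobscript
    actual_hours: float     # how long the job really runs (unknown to the scheduler in advance)


def estimated_start(queue_ahead: list[Job]) -> float:
    """Predicted start: assumes every job ahead of yours uses its full requested walltime."""
    return sum(job.requested_hours for job in queue_ahead)


def actual_start(queue_ahead: list[Job]) -> float:
    """Real start: each job ahead finishes when it really ends, or is stopped at its walltime limit."""
    return sum(min(job.actual_hours, job.requested_hours) for job in queue_ahead)


# Hypothetical queue of three jobs ahead of yours on the single slot.
EXAMPLE_QUEUE = [
    Job(requested_hours=48.0, actual_hours=10.0),  # finished well within its walltime
    Job(requested_hours=24.0, actual_hours=24.0),  # used all of its walltime
    Job(requested_hours=12.0, actual_hours=0.5),   # crashed early
]

if __name__ == "__main__":
    print(f"Estimated start: {estimated_start(EXAMPLE_QUEUE)} h")  # Estimated start: 84.0 h
    print(f"Actual start:    {actual_start(EXAMPLE_QUEUE)} h")     # Actual start:    34.5 h
```

In this made-up case the job starts after 34.5 hours rather than the 84 hours predicted, simply because the jobs ahead of it did not use all the walltime they asked for.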

Last modified on April 25, 2025 at 1:24 pm by Abhijit Ghosh