Services News

Important news and announcements — including information about outages — are given immediately below for all RI Team services. Such information will also be disseminated via user email lists, where appropriate.

Details of system changes and updates about ongoing issues for each RI Team service are posted further down the page, in the appropriate section.

RI Support – how to get help

Please note that Research Infrastructure Team members are working in a hybrid manner, with a mixture of time in the office and at home.

For information on how to submit an enquiry about, or a request for help with, CSF, iCSF, RDS, Condor and related services, please see our Help page.

Accessing our services from Off Campus

Please see our dedicated guide.

Current RI Services’ Status

CSF3 login2 scratch

July 26th 10:25 – RESOLVED

You can resume logging in to csf3.itservices.manchester.ac.uk; if you land on login2, it is now OK. Please note that connecting by ssh to a specific login node is normally not encouraged: the above address automatically determines which node to connect to, which helps balance the load across them.

July 26th 08:30
The scratch filesystem is currently unavailable on login2. We recommend logging in to login1-csf3.itservices.manchester.ac.uk until this is resolved. Please monitor this page for updates.

July 25th 17:50
The scratch filesystem is currently running slowly on login2. We recommend logging in to login1-csf3.itservices.manchester.ac.uk until this is resolved.

Research Data Storage (Isilon) Maintenance – 27th July 2024

There will be a planned upgrade of the RDS/Isilon storage software on Saturday 27th July 2024.

This maintenance impacts only your connections to the storage system – your files and data are NOT at risk.

The service should be considered at risk for the day.

Expected impact to services and users:

  • Compute platforms:
    • Login sessions and jobs running on the central HPC platforms (e.g., CSF3/4, iCSF) may pause but will resume once the maintenance has taken place.
    • If using these platforms you may wish to check on your jobs *after* the maintenance period, but no further action should be required.
  • Condor:
    • The condor service uses this storage for home directories.
    • Login sessions on submitter may ‘pause’ for a short time, but should recover and continue to run normally.
    • Transfer of files from submitter to jobs and back again may be interrupted.
  • PCs:
    • Desktops, laptops & lab-equipment PCs that map additional ‘network drives’ similar to:
      \\nasr.man.ac.uk\facrss$\snapped\replicated\myshare

      may need to remap the drive. Further instructions are available at: Accessing Storage FAQ.

  • RDS-SSH service:
    • File transfers may ‘pause’ or timeout. Login sessions may ‘pause’ for a short time, but should recover and continue to run normally.
    • CIFS shares similar to:
      \\nasr.man.ac.uk\facrss$\snapped\replicated\myshare

      on RDS-SSH may need to be remounted – users can do this themselves by running mount-my-cifs once logged in to the rds-ssh server.

  • Research Virtual Machine Service:
    • Login sessions and services running on VMs may be interrupted.
    • You may need to check your VM *after* the maintenance and restart some of your VM services.

Please note that this does not affect P-Drives or Shared Areas.

If you have any questions please let us know by emailing: its-ri-team@manchester.ac.uk

Any relevant updates on the maintenance will be posted above.

CSF3 Maintenance Summer 2024

Essential maintenance is taking place on CSF3 over the summer. Some aspects of the work will involve temporary removal of nodes to permit updates. Some nodes will be retired. New nodes will be installed.

Reduced Capacity

The maintenance work means that capacity of the system will vary during this period.

However, we aim to always keep some nodes of each type available (e.g., there will always be some v100 GPU nodes, some A100 GPU nodes, some high-memory nodes, and so on).

We will be actively monitoring the queues and will try to ensure that everyone can still run some work by adjusting global limits as necessary.

If you use qrsh, please note that resources for interactive work are always quite limited (regardless of maintenance) and the system prioritises batch jobs. You are strongly encouraged to determine how you can do your interactive work as batch jobs instead. This is a far more efficient way of working: once your batch jobs are in the queue, they can be scheduled to run at any time, 24 hours a day.

Hence, you should continue to submit jobs as normal. You may find that queue times are longer, but your jobs will be scheduled to run and will eventually complete. The sooner you put your jobs in the queue the sooner they can be selected to run.

You do not need to ask us whether a particular resource is available. Please just put your jobs in the queue as normal. Remember: if your jobs are not in the queue, the system will never run them!
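For anyone moving interactive qrsh work into the batch queue, the following is a minimal sketch of wrapping a command in a jobscript. It assumes the batch system accepts Grid Engine style "#$" directives (suggested by the use of qrsh, but not confirmed here). The function name build_jobscript, the job name and the example command are hypothetical; resource requests for specific node types are deliberately left out, so please check the CSF3 documentation for those.

```python
import shlex


def build_jobscript(command, job_name="my_batch_job"):
    """Return the text of a minimal jobscript that runs `command`.

    `command` is a list of arguments, e.g. ["python", "analysis.py"].
    The "#$" lines are Grid Engine style directives (an assumption about
    the scheduler): -cwd runs the job from the submission directory and
    -N sets the job name.
    """
    if not command:
        raise ValueError("command must contain at least one argument")
    lines = [
        "#!/bin/bash",
        "#$ -cwd",
        f"#$ -N {job_name}",
        # Quote each argument so spaces in file names survive the shell.
        shlex.join(command),
        "",
    ]
    return "\n".join(lines)


# Hypothetical example: the command you would otherwise type interactively.
script = build_jobscript(["python", "analysis.py", "--input", "my data.csv"])
print(script)
```

The printed text could be saved to a file (for example jobscript.sh) and submitted with qsub, assuming that is the submission command on the system you are using.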

Details of specific node types / resources are given below:

CSF3 – 24th July 2024 – mem256 and mem1024 nodes retired.

See the CSF3 High Memory documentation for alternative high-memory resources. Please note that anybody can now access the mem1500 and mem2000 nodes – you do NOT need to request access to them.

HPC Pool – downtime for maintenance 24th July (COMPLETED)

25th July: HPC Pool is now available again.

Issues with logging in to Research Infrastructure Services

Can’t Log In? “Account Locked” Message?

Following the recent University-wide password reset, a number of users have been unable to log in to the CSF, receiving an “account locked” message.

This is NOT a problem with the CSF. It is caused by your central IT user account being locked.

A common cause is that a mobile app or laptop/desktop app is still using your old password. For example, if you didn’t log out of an email app on your phone before changing your password, it could still be using your old password. You’ll need to sign out of the app and sign in again.

If you saved your password in an app such as MobaXterm, VSCode or WinSCP, then it may still be trying to log in with your old password. You’ll need to remove the password from these apps (we recommend that you don’t save passwords in apps).

You will need to contact the IT Help Desk to have your IT account unlocked – Research IT CANNOT do this for you.



Jump to: CSF Updates | iCSF Updates | RVMS Updates


CSF Updates

Permanent notice: Scratch files over THREE months old being deleted

Scratch is not intended for long-term storage of files. Files not accessed for THREE months or more are deleted. All users should regularly tidy up their scratch files to avoid data loss. Full details about the policy.
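To help with tidying up, the following is a minimal sketch that lists files whose last access time is older than a cutoff. It assumes "THREE months" can be approximated as 90 days and that the filesystem's recorded access time matches what the deletion policy uses; neither is confirmed here. The function name stale_files is hypothetical.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 24 * 60 * 60


def stale_files(root, days=90, now=None):
    """Return a sorted list of files under `root` not accessed for `days` days.

    `days=90` is an approximation of three months (an assumption).
    `now` is a Unix timestamp; it defaults to the current time.
    """
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # lstat() reads the entry itself and does not follow symlinks.
                atime = path.lstat().st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

Calling stale_files on your own scratch directory returns the paths that look at risk, so you can copy anything you still need to longer-term storage.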

If you have any questions about this policy, please contact us via the HPC Help form.



iCSF Updates:


None.


RVMS Updates:

None.



Last modified on July 26, 2024 at 9:27 am by Pen Richardson