Scratch Cleanup Policy

Updated scratch policy (1st July 2021): Any old, unused files – which means files that you, or your jobs, have NOT read or written to within the last 3 months – will be deleted.

The previous policy was to delete only files that had not been written to within the last three months. The new policy means that if, for example, your jobs are only reading old data-sets, then those data-sets will NOT be deleted. This new policy will delete fewer files than the previous policy.

This policy is enforced: files in your scratch area that have not been read or written/updated within the last 3 months will be deleted automatically by the system.

Please ensure you keep a copy of important files (e.g., your results!) in a location other than your scratch area (~/scratch). Your home folder is suitable for small/medium files. Groups requiring long term storage for large amounts of data, or lots of files, should ask their PI/Supervisor to request an allocation of space on the Research Data Storage service (RDS) and for it to be made available on the CSF.

There is NO backup of scratch. If you, or the scratch tidy, delete files then they are gone forever!

Why is there a scratch clean-up policy?

When the original CSF was first commissioned, the October 2011 CSF User Group accepted that there is a need for an automated clean-up of files in /scratch. This policy still holds – your scratch area will be cleaned-up automatically!

  • The scratch area must be shared by everyone with access to the cluster.
  • There are no quotas on your usage because quotas reduce performance.
  • 1PB may seem a lot of space, but with a large number of users running 1000s of jobs it can easily fill up.
  • If the scratch filesystem fills up, most CSF jobs will fail!
  • The policy strongly encourages you to copy important results to more resilient storage! If the scratch filesystem (e.g., the disks) fails, there are NO backups!

What does the scratch clean-up policy mean for me?

Please ensure you are aware of the following:

  • Files which have not been read or written to in the last three months will be deleted automatically.
  • Please note: Users are still expected to regularly tidy up their own scratch area – don’t just wait for the scratch tidy to do your storage housekeeping. Delete unwanted files and copy important files to home or additional research data storage.
  • Unfortunately, the size of the filesystem and University email limits make it impractical to send warning messages.
  • There is NO backup of scratch. If you, or the system, deletes files then they are gone forever!
  • In the rare event of scratch hardware failure, we cannot recover scratch files. So by having a clean-up policy we are strongly encouraging you to ensure you have copied your important files to backed-up storage (e.g., your home storage area or additional research data storage).
  • Please note: users found deliberately circumventing the scratch clean-up policy, thereby using scratch for long-term storage, will have their CSF account suspended.

Please read the brief summary of the three CSF storage areas available to you. It is important you understand where your files are stored and the different properties of that storage.

What is my current scratch usage?

Run the following command on the login node to find out:

scrusage

Example output

Your scratch usage: 450.7GB, 84,053 files
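
scrusage reports totals only. To see which individual files are approaching the three-month limit, something like the following sketch could be used. It assumes that "read" and "written" correspond to the access and modification times reported by find, which this page does not confirm, so the scratch tidy may select files differently. The function name list_stale_files and the 80-day threshold are made up for illustration.

```shell
#!/bin/bash
# Sketch only: list files that have been neither read nor written for more
# than a given number of days. The function name and the 80-day default are
# assumptions for illustration; they are not part of the CSF tooling.
list_stale_files() {
    local dir="$1"
    local days="${2:-80}"
    # -atime tests the last access (read) time, -mtime the last modification
    # (write) time; "+N" means more than N whole days ago. Both must match.
    find "$dir" -type f -atime +"$days" -mtime +"$days"
}

# Example (run on the login node):
#   list_stale_files ~/scratch 80
```

The files listed are candidates to copy to backed-up storage or to delete yourself. find only inspects file metadata here, so running it does not count as reading the files.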

Timestamps when extracting tar and zip archives

If you have tar or zip archives (for example, downloaded source code, downloaded datasets/databases or archives of files from an old project), they may contain files with date stamps older than 3 months. Extracting the files from the archive will usually result in the original date stamp being applied to the files. These files may then be removed without warning because they are considered too old.

To extract the files with today’s date applied to them, which gives them the full 3 months before the scratch tidy can delete them, use the following commands:

# Extract a gzip compressed tar file, writing files with today's date (add m flag)
tar xzmf my_archive.tar.gz

# Extract a zip archive, writing files with today's date (add -D flag)
unzip -D my_archive.zip
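
The effect of the m flag can be checked with a small throwaway experiment. The sketch below builds a temporary archive containing one file dated 200 days ago, extracts it with and without m, and then lists any extracted file whose modification time is more than 90 days old. All file and directory names are invented for the demo, and it assumes GNU touch and tar, as found on typical Linux systems.

```shell
#!/bin/bash
# Demo only: compare tar extraction with and without the 'm' flag.
# Every name below is hypothetical and lives in a temporary directory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/src" "$demo_dir/keep_dates" "$demo_dir/new_dates"

# Create a file whose timestamps are 200 days old, then archive it
echo "example data" > "$demo_dir/src/old_data.txt"
touch -d "200 days ago" "$demo_dir/src/old_data.txt"
tar czf "$demo_dir/demo.tar.gz" -C "$demo_dir/src" old_data.txt

# Extract without 'm' (archived date restored) and with 'm' (today's date)
tar xzf  "$demo_dir/demo.tar.gz" -C "$demo_dir/keep_dates"
tar xzmf "$demo_dir/demo.tar.gz" -C "$demo_dir/new_dates"

# List extracted files last modified more than 90 days ago
old_files=$(find "$demo_dir/keep_dates" "$demo_dir/new_dates" -type f -mtime +90)
echo "$old_files"

# Tidy up the temporary directory
rm -rf "$demo_dir"
```

Only the copy under keep_dates should be listed, because the copy extracted with m carries the extraction date instead of the archived one.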

Timestamps when transferring files from your PC to your scratch area

If you upload files from your PC/laptop to your scratch area, the following rsync command can be used on Linux PCs/laptops, Macs and in MobaXterm on Windows. It will ensure that the files are written in scratch with today’s date stamp. This will give the files the full 3 months before the scratch tidy can delete them:

rsync -rlvz myinput.dat username@csf4.itservices.manchester.ac.uk:~/scratch/

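The -rl flags replace the more usual -a flag when doing this type of transfer. The -a flag would also preserve the original modification times of the files (it includes -t), so old files would arrive in scratch already looking old.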
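
The difference between the two sets of flags can be seen without touching the CSF by copying between two local directories. This is a sketch only: the directory names are hypothetical, it assumes rsync and GNU touch are installed on your machine, and it checks modification times only.

```shell
#!/bin/bash
# Demo only: compare 'rsync -a' with 'rsync -rl' using local directories.
# Every name below is hypothetical and lives in a temporary directory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/pc" "$demo_dir/with_a" "$demo_dir/with_rl"

# Create an input file whose timestamps are 200 days old
echo "example input" > "$demo_dir/pc/myinput.dat"
touch -d "200 days ago" "$demo_dir/pc/myinput.dat"

# -a preserves the original modification time; -rl does not
rsync -a  "$demo_dir/pc/" "$demo_dir/with_a/"
rsync -rl "$demo_dir/pc/" "$demo_dir/with_rl/"

# List copied files last modified more than 90 days ago
old_files=$(find "$demo_dir/with_a" "$demo_dir/with_rl" -type f -mtime +90)
echo "$old_files"

# Tidy up the temporary directory
rm -rf "$demo_dir"
```

Only the copy made with -a should be listed; the copy made with -rl is stamped with the time of the transfer.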

Additional Research Data Storage

If you need to keep files outside of your home directory for more than three months, please contact its-ri-team@manchester.ac.uk or ask your PI to request an allocation of space on the Research Data Storage service (RDS).

Last modified on January 15, 2024 at 5:39 pm by George Leaver