The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
More Details and Policies
Home directories
This is the directory you are placed in when you first log in and it is where you should keep your most important files. Temporary files for running jobs should go in scratch (see below for more about that filesystem).
Users should not run jobs from their home directory; scratch is provided for this. Jobs which run in home are at risk of failure if the limited home space runs out. Scratch is also a high-performance filesystem designed for running jobs, whereas the home filesystem can become strained by jobs running in it.
Isilon Home-Directories (i.e., /mnt/iusers01/xy01)
Home directories:
- All home directories are on Isilon-based storage. These are named /mnt/iusers01/ab01/username (where ab01 refers to your usergroup, e.g., mace01 or ct01, and username is your login name).
- Linux automatically sets the variable $HOME to whatever your home directory path is.
Quotas and Disk Space Usage
There are no quotas operating on the home filesystems (in the traditional Unix/Linux sense). However, each group directory (e.g., /mnt/iusers01/mace01) has a directory limit placed on it. To see this limit and the amount of space used/available use the command df -h $HOME, for example:
[username@login1 ~]$ echo $HOME
/mnt/iusers01/zz01/username
[username@login1 ~]$ df -h $HOME
Filesystem                                                                                          Size  Used Avail Use% Mounted on
nas.isilonr.manchester.ac.uk:/ifs/nas/reynolds/research/NONFAC/NFS/snapped/replicated/csf-users01  800G  611G  190G  77% /mnt/iusers01
This indicates that the group zz01 has 800 GB of storage allocated to it, of which 611 GB is used and 190 GB is available. (In this example rounding of the reported figures has led to the apparent arithmetic error.)
To find out how much of the allocation you are using, from your home directory, run:
[username@login1 ~]$ du -sh
To see which of your files or directories in the current directory are biggest:
[username@login1 ~]$ du -sh *
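If the current directory contains many entries, one way to rank them by size (reported in kilobytes, smallest first) is:
[username@login1 ~]$ du -sk * | sort -n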
Backups
These filesystems are part of the IT Services Research Data Service (RDS) and are snapshotted and replicated, i.e., the storage can be considered resilient and files which are accidentally deleted or corrupted can be recovered (for up to 28 days).
Scratch space
This is a globally visible temporary file system which has a much higher capacity than the home filesystems and is effective at storing large files required for batch jobs or produced by them.
Scratch is a high performance Lustre filesystem.
Every user has a symlink from their home directory to their scratch directory – $HOME/scratch. You can access the directory simply by typing cd scratch and creating your job-related files in your usual way. You can also transfer files directly into scratch from your desktop using your usual method (scp, winscp, etc.).
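For example, from a Linux desktop you could copy a file straight into scratch with scp (a sketch only; replace the hostname placeholder with the CSF login address you normally connect to, and mydata.tar.gz with your own file):
scp mydata.tar.gz username@<csf-login-address>:scratch/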
If you need to move or manipulate large data files it is recommended that you use the absolute path /scratch/$USER for better performance.
The simplest way to submit a batch job from scratch is to set up the required directory and files under scratch and then issue your qsub from within that directory.
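For example, a minimal sketch (the directory and file names here are illustrative only):
[username@login1 ~]$ mkdir ~/scratch/myjob
[username@login1 ~]$ cp myjobscript.sh input.dat ~/scratch/myjob/
[username@login1 ~]$ cd ~/scratch/myjob
[username@login1 myjob]$ qsub myjobscript.sh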
Please ensure you tidy up your files in scratch regularly, e.g. soon after jobs have run. The space must be shared by everyone. Failure to tidy up will result in the clean-up policy detailed below being applied to your account.
The Lustre filesystem can handle many millions of files spread throughout the filesystem, but each individual directory is protected by a single metadata lock. Hence when many thousands of files are stored in the same directory, you get contention for that single lock. We usually suggest that users keep fewer than 10,000 files in a single directory, and ideally far fewer, e.g., 1,000.
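If you are unsure how many entries a directory contains, a quick check is (the directory name is illustrative):
[username@login1 ~]$ ls ~/scratch/myresults | wc -l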
This filesystem is not backed up. It is too large and intended for temporary files only. Please do not keep thousands/millions of files or large quantities of data here for long periods – note the scratch policy below.
Use the du command to determine your scratch usage.
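For example:
[username@login1 ~]$ du -sh /scratch/$USER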
Scratch clean-up policy
This policy is enforced – please ensure you keep important files in a location other than scratch. Groups requiring long-term storage of large amounts of data or lots of files should ask their PI to request an allocation of space on the Research Data Storage service (RDS) and for it to be made available on the CSF.
The October 2011 CSF User Group accepted that there is a need for an automated clean-up of user files in /scratch because:
- this area must be shared by everyone with access to the cluster
- 168TB may seem a lot of space, but with a large number of users it can easily fill up
- if the filesystem fills up it can cause jobs to fail
- implementation of quotas is possible, but would hamper performance
It was agreed that files which were three months old would be automatically deleted. Users are expected to regularly monitor their usage of scratch, carefully consider which files need to be retained, and delete those which are no longer required. Unfortunately, the size of the filesystem and University email limits make it impractical to send warning messages.
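One way to see which of your scratch files are older than roughly three months (90 days) is with standard find options, for example:
[username@login1 ~]$ find /scratch/$USER -type f -mtime +90 -ls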
Users may use Linux utilities to retain files beyond the 3 month limit, but this will be monitored, and those deemed to be retaining files for excessive periods, or using more than what might be considered a fair amount of disk space, will be contacted and asked to tidy up their data.
Please remember that scratch is not backed up, so long term usage of this area for storage is highly risky. It should also be noted that if you untar a tarball of files, e.g. some source code or an archive of files from an old project, which contains files with date stamps older than 3 months, these may be removed without warning.
If you need to keep files outside of your home directory for more than three months, please contact its-ri-team@manchester.ac.uk or ask your PI to request an allocation of space on the Research Data Storage service (RDS).
Local /tmp
Each compute node has approximately 400GB of space in /tmp. This disk is not globally accessible to other batch nodes or the login node so batch jobs which make significant use of the local disk will need to copy data out of this area at the end of the job.
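A minimal sketch of this pattern in a jobscript (all of the directory, file and program names below are illustrative and should be adapted to your own job):
TMPJOB=/tmp/$USER-myjob                    # hypothetical working area on the node-local disk
mkdir -p $TMPJOB
cp ~/scratch/myjob/input.dat $TMPJOB/      # stage input data onto the local disk
cd $TMPJOB
./my_program input.dat > output.dat        # hypothetical application
cp output.dat ~/scratch/myjob/             # copy results back before the job ends
rm -rf $TMPJOB                             # tidy up the node-local disk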
Efficient use of disk space
Home filestore is limited. To make good use of your group's disk space please consider using gzip or bzip2 to compress files and so make less use of the disk.
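For example (the filename is illustrative):
[username@login1 ~]$ gzip results.dat        # creates results.dat.gz
[username@login1 ~]$ gunzip results.dat.gz   # restores the original file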
Use the du command to determine your usage.
Group-Specific Filesystems on the Research Data Storage service
Some groups have been allocated space on the central Research Data Storage service (RDS), also referred to as Isilon. The naming of the RDS filesystem when mounted on the CSF reflects what type of resilience, if any, that filesystem has.
These filesystems are not recommended for running jobs from – you should continue to use the main lustre based scratch for that.
Data Areas (e.g. xy01-data01)
A copy of all files and directories is made every hour and these copies are kept for 24 hours; a copy is also taken once a day and these are kept for 28 days. These copies are known as snapshots.
If a file is deleted by accident or corrupted, it can be recovered by using these snapshots.
If data turnover is high, snapshots will take up a significant proportion of a share and hence reduce the available space for current files.
Very small files take up a disproportionate amount of space on the RDS; it is therefore strongly recommended that you tarball these together.
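For example, to bundle a directory of small files into a single compressed archive (the names are illustrative):
[username@login1 ~]$ tar -czf small_files.tar.gz small_files_dir/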
Home Areas (e.g. xy01-home01)
The same as ‘Data Areas’ with resilience added to the storage by copying it nightly to a second set of storage hardware (in a different physical location). This is known as replication.
Not necessarily used as home directories. Some, all or none of the users from a group may be using a ‘local’ CSF filesystem as home space (labelled usersxx).
Usually used as an area for storing very important files.
Your PI will determine how they wish to use this space.
Checking available space
Use the df command to see your group's filesystem. For example:
[username@login2 ~]$ df -h | grep -E 'Used|xy01-data01'
Filesystem    Size  Used Avail Use% Mounted on
               16T  5.7T   11T  36% /mnt/xy01-data01
And the du command to determine your usage of it.
Backups and file recovery
Isilon RDS filesystems including home directories
The following backup policy is in place on Isilon home and RDS areas:
- Snapshots are taken every hour and kept for 24 hours
- A full backup is taken each day and kept for 35 days
You can recover files yourself. Every directory on Isilon has a ‘hidden’ directory called .snapshot. Move to that directory and you will see directories for various dates and times. Choose the one closest to before you deleted the file, and then you should be able to find a copy of the file you have deleted. For example, you have deleted a directory called ‘codes’ from the top level of your home directory:
cd .snapshot
ls
cd Reynolds_NONFAC_NFS_24hr_2014-01-23_16-30-00
ls
cp -Prx codes/ ~
A big advantage of Isilon is that even if you only created a file an hour or two before you deleted it, there is a high chance of getting it back.
If the deletion occurred longer than 35 days ago, but you perhaps did not notice it had happened, then it may not be possible to recover the file.
Scratch file recovery
There is no backup so it is not possible to recover files you accidentally delete from here.