The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
Filestore
Filestore (Home Directories and Scratch)
There are several filesystems on the CSF within which users may keep their files. We give a brief introduction to the two most commonly used here; most new users will use only their home and scratch directories. A more detailed description of the filesystems is given elsewhere in this documentation.
Home
Each user is allocated a home directory. Home directories:
- Are the default location for your files and are your initial location on the CSF when you log in. Windows users: think of this as like your My Documents folder.
- Are on resilient (backed-up) storage (deleted files may be recovered for up to 28 days).
- Are relatively small — typically a filesystem of 250GB or 500GB capacity is shared between a few dozen users (of the same research group). Some larger groups will have a bigger home space, but this is often shared by 100+ users.
- Should not be used to run batch jobs from (use scratch instead). This is important – please continue reading for more information.
- The following commands, run on the login node command-line, will each take you to your home directory from whichever directory (folder) you are currently in. These are handy if you become lost:
cd
cd ~
cd $HOME
- To get the full path (location) of which folder you are currently in, run:
pwd
If requesting help or submitting a question about a job, for example, it is useful to supply this path name for the folder that contains the job files as part of your query.
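Because home areas are relatively small and shared with the rest of your research group, it can help to check how much space is in use before adding large files. The following is a minimal sketch using the standard `df` and `du` commands; it is not a CSF-specific tool, and any quota applied to your home area may be reported differently.

```shell
# Capacity, used and free space of the filesystem that holds your home directory.
# This filesystem is shared, so the figures cover all users of it, not just you.
df -h "$HOME"

# Total size of the directory you are currently in, including its sub-directories.
# Run this from any folder you want to measure.
du -sh .
```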
Scratch
Each user is also allocated a directory within the scratch filesystem. The best way to consider this is as your main work-space. Scratch is:
- Recommended as the place to run batch jobs from. This is important – please continue reading for more information.
- Of much greater capacity than home-directories, so that more storage may be filled by each user.
- Of higher performance than home-directories, i.e., is more suitable for work which involves reading and/or writing large files.
- Suitable for only short-term storage of files (no backups are made) — ANY FILE OVER THREE MONTHS OLD MAY BE DELETED WITHOUT WARNING by the system. Users should ensure they have a copy of important files (input data, jobscripts, results) within their home area or other Research Data Storage area they may have access to. Files may also be downloaded from the CSF. Please see full details about the scratch policy.
- Not on resilient hardware (corrupted or deleted files are gone forever and if the hardware fails, all files are gone with no possibility of recovery!)
- The symlink (shortcut) in your home directory named scratch will take you to your scratch directory. For example, on the command-line use:
cd ~/scratch
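Since old files in scratch may be deleted without warning, it can be useful to list the files that are approaching that age so you can copy them elsewhere first. The sketch below is an illustration only: `list_old_files` is a hypothetical helper name, 90 days is used as an approximation of three months, and it assumes file age is judged by last modification time, which the scratch policy may define differently.

```shell
# list_old_files DIR [DAYS]
# Print regular files under DIR that were last modified more than DAYS days ago.
# (Hypothetical helper; DAYS defaults to 90 as a rough stand-in for three months.)
list_old_files() {
    dir="$1"
    days="${2:-90}"
    # -H makes find follow DIR itself if it is a symlink, as ~/scratch is.
    find -H "$dir" -type f -mtime +"$days"
}

# Check the scratch shortcut, if it exists on this system.
if [ -d "$HOME/scratch" ]; then
    list_old_files "$HOME/scratch" 90
fi
```

Any file listed should be copied to your home area or other Research Data Storage if you still need it.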
Additional Research Data Storage (RDS)
Some research groups may have additional research data storage (RDS) on the UoM central RDS system. This is visible on the CSF. If you have this storage:
- RDS is typically allocated to your research group, possibly for a particular research project.
- It must be requested by your PI/supervisor.
- The storage is backed-up and replicated just like your home directory.
- The storage will have a quota. All users are expected to keep their usage fair. The actual amount you can use depends on how many users have access to this area.
- Typically each member of the group has a directory in the RDS area named after their username.
- You may have an additional shortcut in your home directory named rds or data, for example. This will point to your directory within the RDS storage area.
- We can set up a shared directory that all group members can read and write to. This is useful if you all want to use the same data and software – it saves everyone having their own identical copies.
Running Jobs
We ask that jobs be run from within your scratch directory. This has two benefits for your jobs:
- Some jobs create large result files (possibly larger than you were expecting) or large temporary files. If these are written directly to the home area it may fill up (home areas are relatively small and have a quota applied). This would cause your job to fail, along with any other users’ jobs that are running in the same home filesystem! By running in scratch, which is huge and does not have a quota applied, you have much more room for your jobs. Once the job is finished you should copy only the results you need back to the home area.
- Scratch storage is faster than home storage. If your job reads large input files and/or writes large results (or temporary files) then running in scratch will reduce the overall time the job takes to complete.
To run a job from scratch, ensure the jobscript is in your scratch filesystem. Go into the scratch directory and run the qsub command from there. Ensure your jobscript contains the #$ -cwd flag to make it run from the current directory. For example:
# Go to the directory where my jobscript is in scratch
cd ~/scratch/experiments/water3
qsub my_simulation.sh
where the jobscript contains something like:
#!/bin/bash
#$ -cwd   # Job will run in the current directory
#$ -V     # Job will inherit the current environment settings
some_app.exe ./my_water_mol.dat
You should ensure you copy important results files back to your home or other Research Data Storage area if you have access to such storage.
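Copying results back could be done along the following lines. This is a sketch rather than a prescribed method: `copy_results` is a hypothetical helper name, and the `results` sub-directory and the `~/results/water3` destination are assumptions made for illustration (only `~/scratch/experiments/water3` appears in the example above).

```shell
# copy_results SRC DEST
# Copy everything inside SRC (e.g. a results folder in scratch) into DEST
# (e.g. a folder in your backed-up home area), creating DEST if needed.
copy_results() {
    src="$1"
    dest="$2"
    mkdir -p "$dest"
    # -r copies sub-directories too; -p preserves timestamps and permissions.
    # "$src"/. means "the contents of src", including hidden files.
    cp -rp "$src"/. "$dest"/
}

# Hypothetical paths: adjust these to match your own job directories.
if [ -d "$HOME/scratch/experiments/water3/results" ]; then
    copy_results "$HOME/scratch/experiments/water3/results" "$HOME/results/water3"
fi
```

Copy only the files you need to keep, since home areas are small and shared with other users.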
Managing your files
We have a detailed FAQ which covers many aspects of managing your files.
We also have information about how to transfer files to and from the system.