Home, Scratch, RDS
Storage Areas
There are several filesystems (storage areas) on the CSF for your files. We give a brief introduction to the three most commonly used. They are home, scratch and additional research data storage (RDS).
Each filesystem has its own performance characteristics, backup policy and retention period, and is usually provided by a distinct physical storage system. These properties are tuned for the intended use of the storage.
Most new users will use only their home and scratch directories initially.
Home
Each user is allocated a home directory. Your home directory:
- Is the default location for your files and is your initial location on the CSF when you log in. Windows users: think of this as like your My Documents folder.
- Is on resilient (backed-up) storage – deleted files may be recovered for up to 35 days. This uses the central Research Data Storage system (aka Isilon). If you have access to other Research Infrastructure platforms (e.g., the iCSF, Zrek or rds-ssh) then you will see exactly the same home directory on those systems as on the CSF!
- Is relatively small — typically a filesystem of 250GB or 500GB capacity is shared between a few dozen users of the same research group. Some larger groups will have a bigger home space, but this is often shared by 100+ users. Please don’t use up all of your group’s allocated home storage – your colleagues won’t be happy.
- Should not be used to run batch jobs from (use scratch instead). This is important – please continue reading for more information.
- Should be used to keep a copy of important files (because the storage is backed up). This includes jobscripts, source code (e.g. if you write your own software), downloaded open-source applications, small input data files, small results files. If you have large input data / results files please look at the research data storage (RDS) description below.
- Any of the following commands, run on the login node, will take you to your home directory from whichever directory (folder) you are currently in. These are handy if you become lost:
    # These all return you to your home directory from any location
    cd
    cd ~
    cd $HOME
- To get the full path (location) of the folder you are currently in, run the pwd command:
    # Running 'pwd' after logging in shows your home folder path.
    # You can run 'pwd' in any folder to see the full path name (useful when requesting help.)
    [mabcxyz1@hlogin1 [csf3] ~]$ pwd
    /mnt/iusers01/xy01/mabcxyz1
                  #    #
                  #    # The last part of the home folder path is always
                  #    # named after your username. This folder is private
                  #    # to you - only you can access it.
                  #
                  # Part of the path indicates your group code.
                  # A quota is applied to this group folder, to be shared amongst
                  # all users in your group. If you use too much space you may
                  # prevent your colleagues from saving their files!!
When requesting help or submitting a question about a job, it is useful to supply the full path name of the folder that contains the job files.
- The CSF command-prompt gives you a hint about which directory you are currently in:
    [mabcxyz1@hlogin1 [csf3] ~]$ cd mydata/sample1/
                             #   #
                             #   # 2. Use the 'cd' command to change directory
                             #
                             # 1. Current location: ~ means your home directory
    [mabcxyz1@hlogin1 [csf3] sample1]$
                             #
                             # 3. The prompt now shows the last component
                             #    of your new location
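Because the home quota is shared with the rest of your group, it can be useful to check how much space your own files occupy. The following is a minimal sketch using the standard du command. The function name dir_usage_kb is hypothetical (it is not a CSF command), and it reports only the size of the directory you give it, not the group's total usage or quota.

```shell
# Hypothetical helper (not a CSF command): print the disk usage of a
# directory tree in kilobytes. Defaults to your home directory.
# Note: du does not follow symlinks, so ~/scratch is not counted.
dir_usage_kb() {
    local dir="${1:-$HOME}"
    # -s: one summary line, -k: sizes in 1 KB units
    du -sk "$dir" | cut -f1
}
```

For example, running dir_usage_kb with no arguments reports the size of your home directory. On a large home directory this may take a while because every file has to be visited.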
Scratch
Each user is also allocated a directory within the scratch filesystem. The best way to consider this is as your main work-space from where you can run jobs. Scratch is:
- Recommended as the place to run batch jobs from. This is important – please continue reading for more information.
- Of much greater capacity than home-directories, so that more storage may be used, temporarily, by each user.
- Faster than home directories, i.e., is more suitable for work which involves reading and/or writing large files (hence your jobs will complete sooner). The storage is local to the CSF only (for performance) and cannot be seen on other Research Infrastructure platforms.
- Only suitable for short-term storage of files. No backups are made — ANY FILE NOT USED (read or written to) FOR AT LEAST THREE MONTHS MAY BE DELETED WITHOUT WARNING by the system. Users should ensure they have a copy of important files (input data, jobscripts, results) within their home area or other Research Data Storage area they may have access to. Files may also be downloaded from the CSF.
- Not on resilient hardware – corrupted or deleted files are gone forever and if the hardware fails, all files are gone with no possibility of recovery! This is why you should copy important files (e.g., results files) back to your home area.
- The symlink (shortcut) in your home directory named scratch will take you to your scratch directory. Any of the following commands can be used to access your scratch area:
    cd ~/scratch            # Reminder: ~ is the linux shorthand for 'your home directory'
    cd /scratch/username    # This is itself a shortcut and where the above shortcut points to
- To see where your scratch shortcut actually points to:
    [username@hlogin1 [csf3] ~]$ ls -l scratch
    lrwxrwxrwx 1 username zz01 17 Aug 30 2018 scratch -> /scratch/username
                                              #          #
                                              #          # Location the shortcut points to
                                              #
                                              # Name of shortcut
- To see how much scratch space you are using and how many files you have stored there, run the following command on the login node:
    scrusage

    # Example output
    Your scratch usage: 450.7GB, 84,053 files
                        #
                        # This user is using 450 GB of scratch space (via 84,053 files).
                        # This is a lot. They should delete some files!
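Since files left unused in scratch for three months may be deleted, it can help to list which of your files are approaching that age. The following is a rough sketch using the standard find command. The function name list_stale_files is hypothetical, the 90-day default is only an approximation of "three months", and the exact test the system applies when deleting files is an assumption here (last access time and last modification time are both checked).

```shell
# Hypothetical helper (not a CSF command): list regular files under a
# directory that have been neither read nor written for more than N days.
# Assumption: 90 days approximates the three-month scratch policy.
list_stale_files() {
    local dir="$1"
    local days="${2:-90}"
    # -atime +N: last accessed more than N days ago
    # -mtime +N: last modified more than N days ago
    find "$dir" -type f -atime +"$days" -mtime +"$days" -print
}
```

For example, list_stale_files /scratch/username 60 would show files that have been idle for more than 60 days (replace username with your own), giving you time to copy anything important to your home or RDS area.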
Additional Research Data Storage (RDS)
Some research groups may have additional research data storage (RDS) on the University’s central RDS system. This is visible on the CSF. If you have this storage:
- RDS is typically allocated to your research group, possibly for a particular research project.
- It must be requested by your Principal Investigator (PI) / supervisor.
- The storage is backed-up and replicated just like your home directory. It uses the central Research Data Storage system (aka Isilon) and so the storage can also be made available on other Research Infrastructure systems.
- The storage will have a quota. All users are expected to keep their usage fair. The actual amount you can use depends on how many users have access to this area, but it will usually be much larger than your home directory. Hence you can store large data files and results in this additional storage.
- Typically each member of the group has a directory in the RDS area named after their username.
- You may have an additional shortcut in your home directory named ~/rds or ~/data, for example. This will point to your directory within the RDS storage area. We can create one for you or you can create one yourself. For example, if you have an rds shortcut, you can access your extra storage using:
    cd ~/rds
- If you don’t have a shortcut (but you have some additional Research Data Storage) then you can create a shortcut yourself:
    ln -s /mnt/path/to/rds/username ~/rds
          #                         # #
          #                         # # You can choose the name of your shortcut - here we use "rds"
          #                         #
          #                         # The ~/ means the shortcut will be in your home directory.
          #
          # You'll need to know the full path to your extra RDS folder
- To see where your rds shortcut actually points to:
    [username@hlogin1 [csf3] ~]$ ls -l rds
    lrwxrwxrwx 1 username zz01 17 Aug 30 2018 rds -> /mnt/fac01-rds/bloggs-bigproj/username
                                              #      #
                                              #      # Location that the shortcut points to
                                              #
                                              # Name of shortcut
- We can set up a shared directory that all group members can read and write to. This is useful if you all want to use the same data and software – it saves everyone having their own identical copies.
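When creating an RDS shortcut yourself, a small guard avoids overwriting a file, folder or shortcut that already has the chosen name. The following is a minimal sketch. The function name make_shortcut is hypothetical, and the paths in the usage note are placeholders, as in the ln example earlier in this section.

```shell
# Hypothetical helper (not a CSF command): create a symlink (shortcut)
# only if nothing already exists with that name.
make_shortcut() {
    local target="$1"   # full path to your RDS folder
    local link="$2"     # name of the shortcut, e.g. ~/rds
    # -e: something exists at that name; -L: a (possibly broken) symlink exists
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "Not creating $link: something with that name already exists" >&2
        return 1
    fi
    ln -s "$target" "$link"
}
```

For example, make_shortcut /mnt/path/to/rds/username ~/rds creates the shortcut once, and a second call leaves the existing shortcut untouched and reports an error instead.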
Running Jobs
We ask that jobs be run from within your scratch directory. This has two benefits for your jobs:
- Some jobs create large result files (possibly larger than you were expecting) or large temporary files (that you may not even be aware of). If these are written directly to the home area it may fill up (home areas are relatively small and have a quota applied). This would cause your job to fail and also any other users’ jobs that are running in the same home filesystem! By running in scratch, which is huge and does not have a quota applied, you have much more room for your jobs.
- Scratch storage is faster than home storage. If your job reads large input files and/or writes large results (or temporary files) then running in scratch will reduce the overall time the job takes to complete.
Please note: Once the job is finished you should copy the results you need back to your home area or additional RDS area. Files in the scratch area that have not been used for three months can be deleted by the system, and there is NO backup. DO NOT use scratch as long-term storage for your important files!
To run a job from scratch, ensure the jobscript is in your scratch filesystem. Go into the scratch directory and run the qsub command from there. Ensure your jobscript contains the #$ -cwd flag to make it run from the current directory. For example:
    # Go to the directory where my jobscript is in scratch - this becomes your current dir
    cd ~/scratch/experiments/water3

    # Now submit the job
    qsub my_simulation.sh
where the jobscript contains something like:
    #!/bin/bash --login
    #$ -cwd                 # Job will run in the current directory

    module load category/app/1.2.3

    # Run my app. It will read myinput.dat from the current directory
    some_app.exe myinput.dat
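Before submitting, it can be worth confirming that your current directory really is inside scratch, because the job will run wherever you call qsub from. The following is a minimal sketch. The function name in_scratch is hypothetical, and it assumes the ~/scratch shortcut described earlier exists. It resolves shortcuts with pwd -P so that ~/scratch and /scratch/username are treated as the same place.

```shell
# Hypothetical helper (not a CSF command): succeed if a directory lies
# inside the scratch area, after resolving any symlinks (shortcuts).
#   $1: directory to check (default: current directory)
#   $2: scratch location   (default: ~/scratch, assumed to exist)
in_scratch() {
    local here scratch_root
    here="$(cd -- "${1:-.}" >/dev/null && pwd -P)" || return 1
    scratch_root="$(cd -- "${2:-$HOME/scratch}" >/dev/null && pwd -P)" || return 1
    # Compare with a trailing slash so /scratch/user does not match /scratch/user2
    case "$here/" in
        "$scratch_root"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, running in_scratch && qsub my_simulation.sh submits the job only when the current directory is inside your scratch area.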
You should ensure you copy important results files back to your home or other Research Data Storage area if you have access to such storage.
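Copying results back can be as simple as a recursive copy once the job has finished. The following is a minimal sketch using the standard cp command. The function name save_results and the example paths in the usage note are hypothetical, so substitute your own folders.

```shell
# Hypothetical helper (not a CSF command): copy the contents of a results
# folder in scratch to a backed-up location (home or RDS).
save_results() {
    local src="$1"    # folder in scratch holding the results
    local dest="$2"   # destination folder in home or RDS
    # Create the destination if needed, then copy everything (including
    # hidden files), preserving timestamps and permissions.
    mkdir -p "$dest" && cp -a "$src"/. "$dest"/
}
```

For example, save_results ~/scratch/experiments/water3 ~/rds/water3 would copy the whole job folder to an RDS shortcut, assuming you have one. The originals in scratch are left in place, so delete them yourself once you have checked the copy.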
Managing your files
We have a detailed FAQ which covers many aspects of managing your files.
We also have information about how to transfer files to and from the system.