User FAQ

If you have a question not covered in the sections below please contact us via its-ri-team@manchester.ac.uk providing as much information as possible about the query.

For questions about contributing to the CSF please see the Contributor FAQ.

 

FAQs

Getting an account

  1. Can I have an account?
  2. I’ve been asked to get an account on the CSF. What do I do?
  3. I’ve been given a free at point of use (fatpou01) account. Why?

Logging in

  1. I’ve forgotten my password, have tried to log in several times and now I can’t even get ssh or PuTTY to connect to the CSF. What should I do?
  2. I’m running GlobalProtect while off campus but can’t log in to the CSF. What should I do?
  3. It doesn’t seem to accept my password. Why?
  4. I now get a “connection refused” error. Why?
  5. I’ve been locked out of my account. Why?
  6. File transfers are not working in MobaXterm. How can I fix this?

Running Jobs, Jobscripts, Modulefiles

  1. I’m new to the CSF and batch computing – is there a quick tutorial?
  2. Can I quickly run my code or application on the login node – it shouldn’t take long so writing a jobscript seems like a lot of effort?
  3. My job appears to be stuck in the queue and is not running. Why?
  4. My job can’t read my input files. Why?
  5. Should I load modulefiles first then submit a job or load the modulefiles in the jobscript?
  6. I get /bin/sh and module errors in my batch jobs, what should I do?
  7. What is the maximum runtime for a job?
  8. Can I get an email when a job starts, finishes or aborts due to an error?
  9. Can I submit another job from within a jobscript?
  10. Can I make one job wait for a previous job to finish?
  11. I’m getting a display error. I don’t know how to fix it.
  12. Why can I not use watch to monitor qstat?
  13. My job has almost reached the runtime limit, please can you extend the running time on it?
  14. My job is running much slower than I expect, why?
  15. I get host is not a submit host when running qsub. Why?
  16. I get module: command not found and my job fails. Why?
  17. My queued jobs have 0.00000 priority, is there something wrong?
  18. My job failed with a /bin/bash: --login: invalid option error. Why?

Software and Applications

  1. How do I check if an app / piece of software is installed?
  2. The software I was using on CSF2 is not installed on CSF3. What should I do?

Compiling Software

  1. What does the error forrtl: severe (40): recursive I/O operation, unit -1, file unknown mean when compiling my Fortran code?
  2. ifort gives me an error: cannot find -lm. How do I fix that?

Windows Users

  1. I’m used to Windows, not Linux. How do I access and use the CSF?

Files and Filesystems

The questions below are just a few of the common ones we get asked. We have a longer and more detailed FAQ on this topic.

  1. How do I download something from an external site or access repos such as github from the CSF?
  2. My job can’t read my input files. Why?
  3. I’ve deleted a file. Can you get it back for me?
  4. I’ve got 1000s of files in scratch I want to download. What’s the best way?
  5. How can I free up some space in my home or scratch area?
  6. Some of my scratch files have been deleted! Where have they gone?
  7. I have downloaded a .zip of several datasets but the scratch clean-up keeps deleting them. What can I do?
  8. How do I find out my current scratch usage (space and number of files)?
  9. File transfers are not working in MobaXterm. How can I fix this?

Requesting Further Help

  1. I need some help, who do I contact?
  2. How can I check on current system status, how any maintenance is progressing or check if a problem has been noticed?
  3. I have a Support Centre ticket but it’s been closed! Why?

Answers

Getting an account

Can I have an account?

Yes! Please contact us with the info requested on our User Accounts page.

I’ve been asked to get an account on the CSF. What do I do?

Please contact us with the info requested on our User Accounts page.

I’ve been given a free at point of use (fatpou01) account. Why?

In short, this means your PI, research group, school/department or faculty has not contributed financially to the system, so your account has been set up in the “free at point of use” contribution. This contribution was funded by the University, but it comes with some job-size restrictions.

Users that fall within a “contribution” other than F@POU, can run much larger jobs and more of them. We encourage contributions to the system.

Logging in

I’ve forgotten my password, have tried to log in several times and now I can’t even get ssh or PuTTY to connect to the CSF. What should I do?

The CSF uses your central IT password – the same as used for University email, My Manchester, Blackboard and many other systems. Please see the getting started notes for more information.

I’m running GlobalProtect while off campus but can’t log in to the CSF. What should I do?

The CSF does NOT block access from GlobalProtect. To access the CSF from off campus (e.g., at home) you must be running GlobalProtect. However, we may need to add an extra setting to your account. We’ve already added it to what we consider active accounts. But if you’ve not used the CSF for over a year, we may have missed your account. If you are sure that you are signed in to GlobalProtect, please contact us.

Before the University began using GlobalProtect, there may have been a problem with some Internet Service Providers’ DNS and its ability to resolve the CSF address csf3.itservices.manchester.ac.uk to the corresponding IP number(s). Some ISPs cannot do this correctly (because the CSF uses an internal 10.99 IP address for security, which some ISPs do not allow domestic customers to access.) However, this is now unlikely to be an issue once you have signed in to GlobalProtect. But see below to check.

To test if your home ISP will work correctly with GlobalProtect

Sign in to GlobalProtect then run the following command in a terminal window (e.g., the Terminal app on a Mac, a shell window on Linux or a cmd prompt on Windows):

# Run this command in a Terminal window (Mac/Linux) or a CMD prompt window (MS Windows)
nslookup csf3.itservices.manchester.ac.uk
  #
  # If your ISP allows GlobalProtect you should see IP numbers:
  # 10.99.203.... 

If it does NOT report the IP numbers of the CSF login nodes then your ISP cannot resolve the name correctly. It does not matter whether GlobalProtect is running or not (try running the above command with and without GlobalProtect – you will get the same result).
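The check above can also be scripted. The following is a minimal sketch rather than an official tool: the function name resolves_to_csf is made up for illustration, and it assumes (from the example output above) that the CSF login node addresses begin with 10.99.

```shell
# Hypothetical helper: succeed if the given nslookup output contains an
# IP address beginning 10.99. (the internal range mentioned above).
resolves_to_csf() {
    printf '%s\n' "$1" | grep -Eq '(^|[^0-9.])10\.99\.[0-9]+\.[0-9]+'
}

# Usage, after signing in to GlobalProtect (needs the nslookup command):
#   if resolves_to_csf "$(nslookup csf3.itservices.manchester.ac.uk)"; then
#       echo "The CSF name resolves correctly"
#   else
#       echo "The CSF name does not resolve - see the advice below"
#   fi
```

The function only inspects text passed to it, so it can be tried on saved nslookup output as well.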

What to do if your ISP has this problem

If you are having this problem you could try to change the DNS server used by your WiFi router. Using the public Google DNS server 8.8.8.8 will fix the problem in most cases. However we cannot provide any help doing this because of the large number of WiFi routers available – please consult the documentation that came with your router.

Alternatively, we can set you up on our ssh gateway which provides an alternative method of getting on campus, which does not have the above DNS problem. It still requires you to be running GlobalProtect though.

It doesn’t seem to accept my password. Why?

If you are using MobaXterm then it will NOT show any characters as you type your password. But MobaXterm is “listening” to what you type. So please type carefully – every keypress is important! Are you certain you have typed your password correctly?

Your keyboard setup may be generating different characters to what you think you are typing. For example the @ key might actually be typing a ” (quote) character. One way to check is to open notepad on your laptop then check that nobody can see your screen. Now type your password in to notepad. This will show the characters that the keyboard is generating. Is your password displaying correctly? Close notepad and DO NOT SAVE what you have typed. That was just a test to check that your keyboard is giving the correct characters.

Many of our users successfully use MobaXterm on Windows PCs. If your password is not being accepted, please double check that the hostname you are connecting to is correct (csf3.itservices.manchester.ac.uk) and your username is correct. Then type your password correctly. In most cases, problems arise because there are spelling mistakes in the hostname, username and password!

I now get a “connection refused” error. Why?

Visiting academics working at UoM – please ensure you are using your UoM credentials if signing in to EduRoam. You will not be able to access the CSF if you use your “home” institution credentials.

All users – if you type your password incorrectly too many times while logging in to the CSF, the IP address you are logging in from will be blocked by the CSF for up to one hour. You’ll see a message similar to:

ssh: connect to host csf3.itservices.manchester.ac.uk port 22: Connection refused

You will have to try again later!

This often happens when people are using MobaXterm. It will NOT show any characters as you type your password. But MobaXterm is “listening” to what you type. So please type carefully – every keypress is important! Are you certain you have typed your password correctly? You may have made too many attempts with an incorrect password.

If your login app (e.g., MobaXterm) offers to save your password, we recommend you DO NOT save your password in the app. If you ever change your UoM password (via https://iam.manchester.ac.uk) then your app will be using an incorrect password.

I’ve been locked out of my account. Why?

Following the recent University-wide password reset, a number of users have been unable to login to the CSF, receiving an “account locked” message.

This is NOT a problem with the CSF. It is caused by your central IT user account being locked.

A common cause is that a mobile app or laptop/desktop app is still using your old password. For example, if you didn’t log out of an email app on your phone before changing your password, it could still be using your old password. You’ll need to sign out of the app and sign in again.

If you saved your password in an app such as MobaXterm, VSCode or WinSCP, then it may still be trying to login with your old password. You’ll need to remove the password from these apps (we recommend that you don’t save passwords in apps.)

You will need to contact the IT Help Desk to have your IT account unlocked – Research IT CANNOT do this for you.

See also the previous answer.

File transfers are not working in MobaXterm. How can I fix this?

File transfers appear to freeze, staying at [0/0] complete, when using the MobaXterm file browser in at least MobaXterm v23.6 and v24.0.

This occurs when specifying your username in the CSF3 session config – which we ask people to create when first setting up MobaXterm. If you’ve been using MobaXterm for a while, but have recently upgraded, you will likely have your username in the session config because that used to work without problems.

The solution is to edit the CSF3 session config and remove your username (leaving the field blank.) You can do this by UNticking the “Specify username” box. You will then be asked for the username during login – simply type your username at that time.

For a full description with screenshots, please see the Logging in from Windows instructions.

Running Jobs, Jobscripts, Modulefiles

I’m new to the CSF and batch computing – is there a quick tutorial?

Yes, please have a go at our 10 minute tutorial on running a simple job in the batch system.

There is also an Intro to CSF training course run at various times in the year which you may wish to attend (new users and those that wish to refresh their memory of the CSF are welcome).

Can I quickly run my code or application on the login node – it shouldn’t take long so writing a jobscript seems like a lot of effort?

The general answer is no, please do not run on the login node.

Here’s a quick one-liner you can use on the login node to run your code in the batch system without writing a jobscript:

qsub -b y -cwd -V ./my_code.exe optional-args

where optional-args are any flags you normally pass to your application.

As you can see, running in batch is very simple – just a few extra words on the command-line. To help understand what is being done here:

  • The -b y flag tells qsub that the file at the end is a binary (executable) rather than the more usual jobscript file.
  • The -cwd runs your job in the current directory – your app will probably read files from and write files to this directory.
  • The -V (uppercase V) causes the job to inherit the current environment when it runs. This includes any settings made by modulefiles and any other environment variables that are set. The environment is copied immediately (when you run qsub, not when the job runs) so you can even log out of the CSF and the job will still see the current environment settings.
  • The ./my_code.exe runs your program assuming it is located in the current directory. If it is a system-wide application (installed by the sys admins) then you should load the application’s modulefile and then use simply the_app.exe (instead of ./the_app.exe), but obviously using the correct name for the application.

There’s a lot more you can do with the batch system so please read through our comprehensive documentation.

My job appears to be stuck in the queue and is not running. Why?

Sometimes when you type qstat you will see your job says Eqw against it. This indicates there is an error with the jobscript. There are four common causes of this issue.

However, if your job is simply sat waiting in the queue (the status is qw) then it is usually because the system is busy. Job scheduling is complicated – there are many rules that determine when your job runs (for example, how much of a share in the system your group has, how much work you have already run this month, how much work other people from your group are currently running – some limits apply to the sum of all people in your group for certain groups).

The configuration of the CSF batch system is very complex. Not all jobs use the same nodes and some jobs can only run on certain nodes. Thus while there may be a lot of jobs queued in the system many of them are waiting for different resources and thus your job is not necessarily waiting behind all the others that have been submitted before it. Some parts of the CSF adjust the number of nodes available as demand changes.

It is almost impossible to say how long a job will be waiting, but given that a job can run for up to 7 days, and there are hundreds of people using the system, a wait of up to 24 hours is actually quite reasonable. It should be noted that if you submit a number of jobs not all of them will start within 24 hours, and that if you already have jobs running (e.g. submitted earlier or on a different day) then newly submitted jobs may have to wait until the ones you have running complete. We do try to ensure that all users who have submitted work have something running. The only way to guarantee short queue times is to reduce the maximum job runtime and most users prefer the longer job runtimes.

Please note that when the system is busy it can take longer than 24 hours for large jobs (72+ cores) to start.

Specifying a particular node type (e.g. -l haswell) restricts where on the CSF your job can run and thus you may wait longer for that node type to become available. If you request 29-32 cores in smp.pe then your job can only use skylake (i.e. it is restricted as if you specified an architecture). Note: some software requires you to be specific – check the appropriate software documentation before submitting your work.

Are you asking for higher memory nodes (mem512, mem1500 or mem2000)? If so, there are not many of these nodes available and on occasion they can get busy, which results in an increased wait time for them. We also have strict limits on the number of cores a user can run on them. This applies to all users regardless of group share, but we do try to keep the limits flexible in line with demand. See our more detailed advice about queue times on the high memory nodes.

“Free at the point of use” users are restricted to a maximum of 32 cores in use at any one time (providing resources are available, note that priority for resources goes to contributing groups). Therefore if you have a 12 core job running and a 22 core job waiting the 12 core job will need to finish before the 22 core can start. The limit applies across the whole cluster, so if you have serial jobs running then parallel jobs may have to wait as well, for example if 2 serial jobs are running a 32 core job will queue until both the serial jobs finish. To find out if you are in this category enter the command groups on the CSF login node and if it says fatpou01 then you are a ‘Free at the point of use’ user.
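As a small sketch of the groups check described above, the helper below tests a group list for fatpou01. The function name in_fatpou is hypothetical; only the groups command and the fatpou01 group name come from this FAQ.

```shell
# Hypothetical helper: succeed if the given group list contains the
# fatpou01 group as a whole word.
in_fatpou() {
    printf '%s\n' "$1" | grep -qw 'fatpou01'
}

# Usage on the CSF login node:
#   if in_fatpou "$(groups)"; then
#       echo "Free at the point of use: the 32 core limit applies"
#   fi
```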

Some users make the mistake of not submitting jobs because they think the CSF looks busy. The CSF is always busy (we don’t want it sat around idle) so the best strategy is to submit your jobs as soon as you can. The CSF can’t schedule your jobs if you don’t have them in the queue! Deleting your job and submitting at another time is also a bad idea as the job priority increases the longer it waits and if you submit a new job it will have a low priority.

We check the balance of and demands on the batch system regularly to try and ensure that wait times are not rising significantly. If you have no jobs running and the jobs you have submitted have waited longer than 24 hours then please email its-ri-team@manchester.ac.uk and we will check if there is an issue with your jobs.

My job can’t read my input files. Why?

If your input files are in the directory from where you are issuing the qsub command then you must ensure your jobscript contains the line:

#$ -cwd              

or your qsub command line has the flag, e.g. qsub -cwd. When the job runs it will run in the current directory. Without the cwd flag the job will run in your home directory even if you issue qsub from another directory. The xxxx.oNNNNN and .eNNNNN files will be created in the directory from where the job runs. Hence if the .o and .e files appear in your home directory unexpectedly then you’ve forgotten the cwd option.
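If you are unsure whether a jobscript already has the option, a quick check along the following lines may help. This is only a sketch: the function name jobscript_has_cwd and the filename myjob.sh are made up for illustration, and it looks only at #$ lines in the file, so it cannot see a -cwd flag given on the qsub command line.

```shell
# Hypothetical helper: succeed if the jobscript contains a '#$' directive
# line that includes the -cwd option.
jobscript_has_cwd() {
    grep -Eq '^#\$.*[[:space:]]-cwd([[:space:]]|$)' "$1"
}

# Usage:
#   jobscript_has_cwd myjob.sh || echo "myjob.sh will run in your home directory"
```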

Should I load modulefiles first then submit a job or load the modulefiles in the jobscript?

Both are valid – there are advantages to both.

If loading the modulefiles first (on the login node), submit the jobscript containing the option:

#$ -V          # Inherit the current environment (e.g., modulefile settings)

The advantage of this method is that it is easy to make mistakes loading modulefiles (e.g., misspelling the names, ensuring all required modulefiles are loaded in the correct order). By making these mistakes (and correcting them) on the login node before submitting the job, you are certain that your environment is correct when the job runs.

If you try to load modulefiles from a jobscript you have no way of knowing if everything is correct until the job runs. If you’ve made a mistake and the job fails you’ll need to correct the jobscript and resubmit it to the queue. So you’ll have to wait all over again for your job to be scheduled by the system!

Note that if you use #$ -V in your jobscript then the batch system copies your environment settings immediately, when you run qsub, not when the job runs. This means that after submitting the job you are free to change your environment (load other modules) or even log out of the CSF and your job will still see the environment settings you set up when you submitted the job.

However the advantage of loading the modulefiles from the jobscript is that you have a complete record of which modulefiles were used for the job. If you need to rerun the job (in six months time for example) you may not be able to remember which modulefiles were used if you loaded them on the login node before submitting the job.

To load a modulefile in your jobscript do the following:

  1. Ensure the first line of the jobscript is #!/bin/bash --login (or -l for short)
  2. Remove the #$ -V flag from your jobscript
  3. Add the module load some/module/to/load/1.2.3 you would normally run on the command-line.

Remember, if your jobscript fails to set up your environment correctly your job will probably fail. You’ll need to edit your jobscript and resubmit it. So you can actually practice loading the modulefiles on the login node first. By removing the #$ -V line from your jobscript the job will ignore any settings you have made on the login node.
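Putting the three steps together, a minimal jobscript might look like the sketch below. The filename myjob.sh and the program my_code.exe are placeholders, and some/module/to/load/1.2.3 stands in for a real modulefile name. The sketch writes the jobscript with a here-document purely so that it can be copied and pasted in one go; creating the file in a text editor works just as well.

```shell
# Write a minimal jobscript to myjob.sh (placeholder filename).
cat > myjob.sh <<'EOF'
#!/bin/bash --login
#$ -cwd                 # Run the job from the directory where qsub was run
# Note: there is no -V line, so settings made on the login node are ignored

# Load the modulefile inside the job (replace with a real modulefile name)
module load some/module/to/load/1.2.3

# Run the application (placeholder name)
./my_code.exe
EOF

# Then submit it from the login node:
#   qsub myjob.sh
```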

I get /bin/sh and module errors in my batch jobs, what should I do?

If you have loaded the modulefiles required for your job on the login node before submitting then errors such as:

/bin/sh: module: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `module'

will not affect the running or results of the job. They may be an indication that you have specified modulefiles you frequently use in your .bashrc or .bash_profile files and the batch system/compute is unable, and most likely does not need to be able, to process them. We recommend that you use your .modules environment file instead, but it may not totally eliminate the errors/warnings in jobs.

If you are trying to load modulefiles in the job script then the error may indicate that you have not done it correctly. We have advice about this on our modules page.

What is the maximum runtime for a job?

7 days is the default maximum time a job can run for before the system kills it. However, some parallel environments (PEs) have shorter runtime limits. See the PE table for details.

Can I get an email when a job starts, finishes or aborts due to an error?

Yes! – Use the #$ -m options flag in your jobscript as described in the batch script options page.
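As a hedged illustration, the same flags can also be passed on the qsub command line. In the sketch below the function name qsub_with_mail is hypothetical; the letters b, e and a are the usual Grid Engine codes for begin, end and abort, but the batch script options page remains the authoritative list for the CSF.

```shell
# Hypothetical wrapper: submit a jobscript and request emails when the
# job begins (b), ends (e) or aborts (a).
qsub_with_mail() {
    # $1 = address to notify; remaining arguments are passed on to qsub
    local address="$1"
    shift
    qsub -m bea -M "$address" "$@"
}

# Usage (myjob.sh is a placeholder jobscript name):
#   qsub_with_mail "$USER" myjob.sh
```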

Can I submit another job from within a jobscript?

No, this is not possible. You can however use a job dependency to make one job wait for another job to finish. You must run qsub twice to submit two separate jobs but with the job dependency flags the second job will not run until the first has completed.

Can I make one job wait for a previous job to finish?

Yes, you can use a job dependency to make one job wait for another job to finish. You must run qsub twice to submit two separate jobs but with the job dependency flags the second job will not run until the first has completed.
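A minimal sketch of the two qsub calls is shown below. The function name submit_in_order and the jobscript names are made up for illustration, and the sketch assumes the standard Grid Engine -terse flag (which makes qsub print only the job ID) is available; -hold_jid is the job dependency flag. It is intended for ordinary jobs rather than job arrays.

```shell
# Hypothetical helper: submit two jobscripts so that the second one waits
# for the first one to finish.
submit_in_order() {
    local first_id
    # -terse makes qsub print only the ID of the job it has just submitted
    first_id=$(qsub -terse "$1") || return 1
    # -hold_jid keeps the second job queued until the first has finished
    qsub -hold_jid "$first_id" "$2"
}

# Usage on the login node:
#   submit_in_order first_job.sh second_job.sh
```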

I’m getting a display error. I don’t know how to fix it.

This can occur when using tools such as gedit. Make sure that you have connected to the CSF with X11 enabled – see the GUI-based work documentation.

If you are sure that you have the above aspect correct, but you are getting an error like this:

(gedit:28355): Gtk-WARNING **: cannot open display: localhost:12.0

then the most likely cause is a problem with a small hidden file (.Xauthority) in your CSF home directory related to X windows. You should delete the file:

rm .Xauthority

(note the dot at the start of the filename is important), log out of the CSF and back in again. When you log in a new .Xauthority file will be created and you should now be able to use gedit.

Why can I not use watch to monitor qstat?

The watch cmd command repeatedly runs another command cmd for you, every 2 seconds by default. Repeatedly running qstat stresses the batch system so we ask that users do not do this. You may manually run qstat to check the status of your jobs. But there is very little value in doing so repeatedly. If you want to be informed when your job starts and/or finishes, you can ask the system to automatically send you an email – see the section on Batch script options which lists the extra flags you can add to a jobscript to get automatic emails from the job.

We have recently disabled the use of the watch qstat command.

My job has almost reached the runtime limit, please can you extend the running time on it?

Unfortunately, we cannot extend the time limit for running or queued jobs. If your job needs more than 7 days you will need to try one of the following options:

  • Can your job use more cores? If so will it complete faster?
  • Can you split your job into smaller chunks and run each of those as its own job (each will then get 7 days)?
  • Does your software have checkpoint restart capabilities? This is where every few hours the job writes its state to a file and then the job can be restarted from a known good point using that file.
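To illustrate the checkpoint restart idea in general terms, the sketch below records each completed step in a small state file so that a resubmitted job carries on from where the previous one stopped. It is a generic illustration only: the function name, the state file and the notion of numbered steps are all invented for the example, and real applications have their own checkpoint mechanisms, which you should use where they exist.

```shell
# Generic illustration: run steps 1..total, recording each completed step
# in a state file so a later run resumes after the last completed step.
run_with_checkpoint() {
    local state_file="$1" total="$2" last=0 step
    # Resume from the recorded step if a checkpoint file already exists
    if [ -f "$state_file" ]; then
        last=$(cat "$state_file")
    fi
    for ((step = last + 1; step <= total; step++)); do
        echo "running step $step"       # placeholder for the real work
        echo "$step" > "$state_file"    # checkpoint after each completed step
    done
}

# Usage inside a jobscript (names are placeholders):
#   run_with_checkpoint my_run.state 100
```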

Note: The runtime limit on GPU nodes is 4 days. The HPC Pool also has a 4 day limit.

My job is running much slower than I expect, why?

We often get asked this question in relation to serial work on the CSF. The individual cores in the CSF may be less powerful (i.e. have a lower GHz) than your PC on which you may have run your work before. However, the CSF has a number of advantages over your PC:

  • You can run more jobs on the CSF at the same time than on your PC.
  • Offloading work to the CSF means your PC will be more responsive to other tasks e.g. email, web browsing, writing a paper.
  • If you switch off your PC any work running on it would be lost. Equally this would happen if your PC suffered a hardware fault. That may also take some time to get fixed. The CSF rarely suffers from such issues and if a compute node running your job does fail you can quickly and easily start the job again on another compute node.

Finally, you may want to investigate whether the software you are using can use more than one core. If it can, your jobs may be able to run faster.

I get host is not a submit host when running qsub. Why?

An error message similar to:

Unable to run job: denied: host "node403.pri.csf3.alces.network" is not a submit host

means that you are trying to run qsub on a compute node instead of the login node. You are on the compute node because you have an interactive session running (you ran qrsh -l short). But you cannot submit jobs from here. You must be on the login node to submit jobs.

You should exit your interactive session by running exit or log in to the CSF using another shell window (e.g., another MobaXterm window or another Terminal window on MacOS or the nyx virtual desktop). You can log in to the CSF multiple times if you need more than one window on the login node.

I get module: command not found and my job fails. Why?

This happens when the first line of the jobscript does not contain the required --login flag. Your jobscript should begin with the line:

#!/bin/bash --login

if you are going to load modulefiles inside the jobscript (which we do recommend).
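A quick way to check the first line of an existing jobscript is sketched below. The function name jobscript_has_login_shell and the filename myjob.sh are hypothetical; the check simply compares the first line against the two accepted forms (--login, or -l for short).

```shell
# Hypothetical helper: succeed if the first line of the jobscript asks
# bash to act as a login shell.
jobscript_has_login_shell() {
    local first_line
    first_line=$(head -n 1 "$1")
    case "$first_line" in
        '#!/bin/bash --login'|'#!/bin/bash -l') return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   jobscript_has_login_shell myjob.sh || echo "Fix the first line of myjob.sh"
```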

My queued jobs have 0.00000 priority, is there something wrong?

There are a number of reasons this might be the case.

  1. You are running a job array and have used -tc to limit the number of tasks (max_aj_instances) that should run at any one time. This is very sensible for some jobs, for example jobs where the I/O is very intensive and potentially disruptive.
  2. You have set a job dependency using -hold_jid.
  3. The cluster is very full. However, even if that is the case then usually if you have no jobs running then some of your queued jobs should start within 24 hours.
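For illustration, a job array limited with -tc might look like the sketch below. The range 1-100, the limit of 10 and the input.N.dat naming scheme are invented for the example. SGE_TASK_ID is the variable the batch system sets for each task; the fallback value of 1 is there only so that the script can be tried by hand outside the batch system.

```shell
#!/bin/bash --login
#$ -cwd
#$ -t 1-100      # Job array with 100 tasks (example range)
#$ -tc 10        # Run at most 10 tasks at any one time (example limit)

# The batch system sets SGE_TASK_ID for each task; default to 1 when
# this script is run by hand outside the batch system.
TASK_ID="${SGE_TASK_ID:-1}"
INPUT_FILE="input.${TASK_ID}.dat"    # hypothetical naming scheme
echo "Task ${TASK_ID} would process ${INPUT_FILE}"
```

While the limit is in force, tasks that are waiting for a slot remain queued alongside the running ones.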

My job failed with a “/bin/bash: --login: invalid option” error. Why?

If you have the following error message in your jobscript.eNNNNNNN file:

/bin/bash: --login: invalid option
Usage:      /bin/bash [GNU long option] [option] ...
      /bin/bash [GNU long option] [option] script-file ...
GNU long options:
      --debug
      --debugger
      --dump-po-strings
      --dump-strings
      --help
...

Then the first line of your jobscript is wrong. Please ensure it is exactly:

#!/bin/bash --login

Cut-n-paste the above or type it carefully in your jobscript!

If the first line is correct, please run the following command on the login node to ensure your jobscript text file has “unix” line endings, not “MS Windows” line endings:

dos2unix jobscript

where jobscript is the name of your jobscript file. Please note: you must NOT run dos2unix on any other file – only on your jobscript.
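If you would like to confirm that the line endings are the problem before converting the file, a check along these lines can be used. It is a sketch: the function name has_windows_line_endings and the filename myjob.sh are made up, and it simply looks for carriage-return characters, which are what “MS Windows” line endings add.

```shell
# Hypothetical helper: succeed if the file contains carriage returns,
# i.e. it has "MS Windows" line endings.
has_windows_line_endings() {
    grep -q $'\r' "$1"
}

# Usage on the login node:
#   has_windows_line_endings myjob.sh && dos2unix myjob.sh
```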

Software and Applications

How do I check if an app / piece of software is installed?

Type (some part of) the name in the Search box above the menu on the left hand side on this page. Alternatively, have a look at the list of applications we have documented. Finally you could also log in to the CSF and run

module search appname

to check for the modulefile.

The software I was using on CSF2 is not installed on CSF3. What should I do?

First check to see if a newer version is installed on the CSF (see question above). We may have upgraded the version on CSF3. If the software has not been installed, please request it via its-ri-team@manchester.ac.uk and we will look in to it. Please note, we do not intend to install old versions of software on the CSF3. When moving to CSF3 please take the opportunity to upgrade your work to using a newer version of the software if possible.

Compiling Software

What does the error forrtl: severe (40): recursive I/O operation, unit -1, file unknown mean when compiling my Fortran code?

We have seen this with v12 of the Intel compiler when several OpenMP threads attempt to write to standard out concurrently. We have specifically noticed this when linking FORTRAN from C/C++. Ensure that only one thread in your OpenMP code is writing to standard out.

ifort gives me an error: cannot find -lm. How do I fix that?

Please see our Intel Compiler notes for the fix.

Windows Users

I’m used to Windows, not Linux. How do I access and use the CSF?

Please see the guide for Windows users.

Files and Filesystems

The questions below are just a few of the common ones we get asked. We have a longer and more detailed FAQ on this topic.

How do I download something from an external site or access repos such as github from the CSF?

Updated June 2023: The University proxy was previously used to provide external access (e.g., to access GIT/SVN repos, download data from websites, install python and R packages.) As of June 2023 this is no longer available. Hence the proxy modulefiles will do nothing (except report that they are no longer needed.)

To access the outside world for data downloads, SVN/GIT repo access etc, please use an interactive session on a compute node, or submit a batch job, to run your commands as normal. You DO NOT need to load the proxy modulefiles.

Please note: The login nodes DO NOT have external access. You should use a backend compute node via an interactive session or batch job.
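As a sketch of the batch job route, the helper below submits a download as a one-line batch job using the same qsub -b y -cwd form shown elsewhere in this FAQ. The function name download_in_batch and the URL are placeholders, and the sketch assumes the wget command is available on the compute nodes. For interactive work, start an interactive session first and run your usual download or git commands there.

```shell
# Hypothetical helper: download a URL on a compute node rather than on
# the login node (which has no external access).
download_in_batch() {
    # -b y : the command is a binary, not a jobscript
    # -cwd : run in (and save the download to) the current directory
    qsub -b y -cwd wget "$1"
}

# Usage (placeholder URL):
#   download_in_batch https://example.org/dataset.zip
```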

Please also see our git and github.com advice.

I’ve deleted a file. Can you get it back for me?

Now that all home directories (and other research-group-owned data areas) are on the Isilon central storage you can recover the files yourself. See the filesystems page for full details.

Scratch is not backed up so files cannot be recovered from there. This is one reason why you should not use scratch for long-term storage – it is for temporary storage only while the job is running.

I’ve got 1000s of files in scratch I want to download. What’s the best way?

Downloading a large number of individual files can be very time consuming and will place a lot of strain on the login node.

First consider whether you need to download the files at all. If they are important result files you should consider keeping them in your home area which is on secure backed-up Isilon storage. Your research group may also have additional Isilon areas for specific research projects or data areas. Downloading to a PC that isn’t backed up could result in data loss if the local PC’s disks fail. If you don’t have enough space in your Isilon area then consider compressing the files (with zip or gzip).

If you still want to download a copy then a better option would be to zip up the files in to a single compressed archive. Zip files are common on Windows / MacOS so if you want to transfer the files to a local Windows / MacOS computer you can create the zip file on the CSF and then download it. Alternatively, if your local PC is running Linux you can create a tar.gz file on the CSF and download that. We advise running the zip app as a batch job to prevent the login node from being overloaded. Here’s how:

# In this example we assume the files to be downloaded are in the folder:
~/scratch/my_data/experiment1/

# Go to the parent of the required location in scratch. For example:
cd ~/scratch/my_data/

# zip up all the files from a sub-directory named 'experiment1'
qsub -b y -cwd -m e -M $USER zip -r my_stuff.zip experiment1

# Or, to create a .tar.gz file for use on Linux PCs/Laptops:
qsub -b y -cwd -m e -M $USER tar czf my_stuff.tar.gz experiment1

Either of the above commands will submit a batch job (without writing a jobscript), run it from the current directory in the short environment, and email you when it has finished. The job will zip up and compress all the files in the experiment1 sub-directory of the scratch/my_data/ directory (change the names to suit your own directory structure). When the job has finished you’ll have a file named my_stuff.zip (or my_stuff.tar.gz) in your ~/scratch/my_data/ directory which you can then download using WinSCP, scp or another favourite file transfer program from your PC. Alternatively copy the archive to your home area.
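Before downloading (or deleting the originals) it can be worth checking that the archive really contains every file. The following is a minimal sketch of one way to do that for the tar.gz case, not an official CSF procedure. It builds a small throwaway directory (the demo path and file names are hypothetical stand-ins for ~/scratch/my_data/experiment1) so that it can be tried safely, then compares the number of regular files on disk with the number of file entries in the archive.

```shell
#!/bin/bash
# Hypothetical demo tree standing in for ~/scratch/my_data/experiment1
demo=$(mktemp -d)
mkdir -p "$demo/experiment1/run1"
echo "result A" > "$demo/experiment1/a.dat"
echo "result B" > "$demo/experiment1/run1/b.dat"
cd "$demo" || exit 1

# Same tar command that the batch job runs
tar czf my_stuff.tar.gz experiment1

# Count regular files on disk and file entries in the archive.
# In the tar listing, entries ending in "/" are directories, so skip them.
on_disk=$(find experiment1 -type f | wc -l)
in_archive=$(tar tzf my_stuff.tar.gz | grep -vc '/$')

if [ "$on_disk" -eq "$in_archive" ]; then
    echo "OK: archive holds all $on_disk files"
else
    echo "MISMATCH: $on_disk on disk, $in_archive in archive"
fi
```

On real data you would skip the demo set-up and run only the two counting commands from the directory that holds the archive. Note that symbolic links appear in the tar listing but are not counted by find -type f, so a directory containing links will report a mismatch with this simple check.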

How can I free up some space in my home or scratch area?

The obvious answer is to delete unwanted files (use the rm command or your preferred graphical file browser such as that in MobaXterm). However, deleting results and data files is not always possible. But there are ways to reduce your usage:

  1. Compress your files. Many applications write out plain text results files and other log files. These can be huge. Do you need the log file? If not, delete it. The results files, however, will compress well using gzip myresult.dat (which will create a new smaller file named myresult.dat.gz). You can still read the file using zless myresult.dat.gz or uncompress it using gunzip myresult.dat.gz
  2. Delete unwanted job .oNNNNNN and .eNNNNNN output files. Every job will produce an output file capturing what would have been printed to screen when your application ran. The files can contain normal output (the .o file) and error messages (the .e file). Each file will have the unique job number at the end of the name. If you run a lot of jobs (1000s – and many users do!) you will soon have 1000s of files. We’ve seen some directories with millions of these output files! Individually each file is often small but they soon accumulate. They also take up more space on the filesystem than you think (the minimum block size of the storage system is used even if your file is smaller). Please delete unwanted job output files. The following command can be used:
    rm -f *.[oe][0-9]* 
    
  3. Keep your job directories tidy. Deleting files from jobs you ran months ago is never an exciting task – you may have 1000s of output files. Nobody likes looking through old files to see if they are still needed. Deleting unwanted files when the job finishes is the best way to keep your storage areas tidy. You can even put the delete commands (rm) in your jobscript to clean up any junk at the end of a job.
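The tidy-up steps in the list above can be sketched as follows. This is only an illustration under stated assumptions: the job directory and file names are hypothetical and created in a temporary location so nothing real is touched. It uses find rather than rm with a wildcard, which is a substitution on our part: find lets you preview what will be removed and copes with very large numbers of files without hitting the shell's "Argument list too long" limit.

```shell
#!/bin/bash
# Hypothetical demo directory standing in for a real job directory
jobdir=$(mktemp -d)
cd "$jobdir" || exit 1
touch myjob.sh.o123456 myjob.sh.e123456 keep_me.dat
seq 1 10000 > myresult.dat

# 1. Dry run: list the job output files that would be removed
find . -maxdepth 1 -type f -name '*.[oe][0-9]*' -print

# 2. Delete them (current directory only, not sub-directories)
find . -maxdepth 1 -type f -name '*.[oe][0-9]*' -delete

# 3. Compress a plain-text results file; gzip replaces it with myresult.dat.gz
gzip myresult.dat
ls -l
```

The same find and gzip lines could be placed at the end of a jobscript so each job cleans up after itself. Always run the -print version first to check that the pattern does not match files you want to keep.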

Some of my scratch files have been deleted! Where have they gone?

It could be the automatic scratch clean-up policy that has deleted your files:

Please note: the scratch filesystem automatic clean-up policy is active. If you have scratch files that have been unused for 3 months or more they may be deleted. Please read the Scratch Cleanup Policy for more information.

I have downloaded a .zip of several datasets but the scratch clean-up keeps deleting them. What can I do?

When you unzip (or tar) the archive of datasets, the files in the archive will be created with their original timestamps. These could be months or years in the past. The scratch-tidy will then see these old files and delete them.

The solution is to ask unzip (or tar) to extract the files and apply today’s date to them. See here for the extra flag/switch you must add to unzip or tar -xf to extract the files correctly.
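As an illustration of the idea (the linked page remains the authoritative source for the flag to use on the CSF), the sketch below assumes GNU tar, whose -m (--touch) option skips restoring the archived modification times so extracted files are stamped with the time of extraction. Info-ZIP unzip documents a -DD option for the same purpose. The archive and file names here are hypothetical and are created in a temporary directory.

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1

# Build a hypothetical archive containing a file dated January 2015
mkdir datasets
echo "sample" > datasets/old.csv
touch -t 201501010000 datasets/old.csv
tar czf datasets.tar.gz datasets
rm -r datasets

# Extract with -m so each file gets the extraction time as its
# modification time instead of the 2015 timestamp stored in the archive
tar -xmzf datasets.tar.gz

# List files modified within the last 24 hours; old.csv should appear
recent=$(find datasets -type f -mtime -1)
echo "$recent"
```

Without the -m flag the find command would print nothing, because old.csv would keep its 2015 timestamp and look like a long-unused file.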

How do I find out my current scratch usage (space and number of files)?

Run the following command on the login node to find out:

scrusage

Example output

Your scratch usage: 450.7GB, 84,053 files

File transfers are not working in MobaXterm. How can I fix this?

Please see the answer to Q1.5.

Requesting Further Help

I need some help, who do I contact?

In the first instance please try searching this site – we have lots of documentation covering Jobs, Storage and specific Application Software. The search box in the top left-hand corner will search all pages on this site. If that still doesn’t answer your query, please email its-ri-team@manchester.ac.uk providing as much detail as possible – please include the jobid number, the folder where your job is running, any error messages produced by your job, where you are logging in from (home, on-campus?) and the type of OS you are using (Windows, MacOS, …?). We will create a ticket for you in the Support Centre systems and send further messages / questions via that system – look out for the emails related to your ticket.

How can I check on current system status, how any maintenance is progressing or check if a problem has been noticed?

Please see our Services News page. We will post updates about any on-going maintenance, any major problems with the system and also problems with other IT Services platforms that might affect your access to the CSF (e.g., network problems).

I have a Support Centre ticket but it’s been closed! Why?

You may receive an email saying that the ticket has been closed with “Reason: Cancelled by user via Self Service portal.” This means that the ticket was being progressed but we had sent you a request for more information about something. If you fail to respond within about two weeks the ticket is automatically closed by the ticket system (not by us!) If you believe further work still needs to take place, please email its-ri-team@manchester.ac.uk with the original ticket number and the information requested from you in the original ticket.

Last modified on October 23, 2024 at 1:14 pm by Pen Richardson