The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date; please read the CSF3 documentation instead.
Contributor FAQ
Questions answered
Below is a list of common questions regarding the Computational Shared Facility (CSF) which are typically asked by those who wish to join the service. Please get in touch if your query is not answered here, or if you would like to arrange a meeting to discuss your requirements.
Email the Research Infrastructure team: its-ri-team@manchester.ac.uk
- What is the CSF?
- Who uses it?
- What can I buy?
- How much does it cost?
- Are there any extra costs or hidden charges?
- What are the limits on the life of hardware?
- What filestore is available?
- How do I make a contribution?
- Can I try before I buy?
- What happens to my contribution?
- What are the benefits of this over having my own system?
- I have some hardware already, can I add it to the CSF?
- Are there any restrictions on what I can run?
- Who runs the service?
- I’ve never used a system like this before, where do I start?
- Does it run Microsoft Windows?
- I have a visitor to my research group, can they use the CSF?
What is the CSF?
The Computational Shared Facility is a collection of compute servers joined together to provide a large resource which allows researchers to run computationally intensive calculations, simulations, and problem analyses. Such systems are commonly known as High Performance or High Throughput Computing (HPC/HTC) clusters.
The cluster is built on a shared model which is expandable using research funding. The University has provided the initial investment in the infrastructure required to build and house the system and research groups add compute servers. There is some very limited free-at-point-of-use access for non-contributing groups (job size and throughput is very limited). By contributing, a research group can run much larger jobs and has more throughput of jobs. Only members of groups that have contributed may use a group’s share of the system.
The Research Infrastructure Team within IT Services looks after all aspects of the cluster including the operating system (Linux), hardware, batch system, user management, filestore, much of the application software, and first and second line support. IT Services staff costs are funded by the University, not as part of any contribution.
The primary role of the system is to run batch (i.e., non-interactive) jobs in the traditional ‘HPC’ way. A limited number of jobs may also be run interactively, for instance by using an application’s graphical interface, but we usually recommend a different platform, the iCSF (aka Incline), for such work.
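For illustration, a batch job is normally described by a short job script which is handed to the batch system; the scheduler queues the job and runs it on a compute node when the resources it needs become free. The sketch below assumes an SGE-style scheduler (submission via qsub) and uses a hypothetical application and module name; the CSF user documentation gives the exact directives and software modules to use.

```bash
#!/bin/bash
# Minimal sketch of a batch job script.
# Assumes an SGE-style scheduler; "myapp" and the module name are
# hypothetical placeholders - see the CSF user documentation for the
# real directives and available software modules.
#$ -cwd                        # run the job from the directory it was submitted from

module load apps/myapp         # make the application available (illustrative module name)
myapp input.dat > results.txt  # the computational work itself
```

Such a script would be submitted with a command along the lines of qsub myjob.sh, after which the job waits in the queue until a compute node is free.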
Who uses it?
Traditionally, the main users of compute clusters come from the engineering and physical sciences. However, researchers from the medical/health and life sciences are also heavy users of the CSF (and DPSF), and this is a rapidly growing area of use. We are also helping researchers from the Faculty of Humanities, including Alliance Manchester Business School, discover the benefits of access to high-end computational resource.
You can view a sample of our current users to see the range of disciplines carrying out their research on the CSF.
Potential users from any department on campus are welcome to contact us, via the above email address, to discuss whether the CSF may be of benefit to their area of research and what their requirements might be.
What can I buy?
We have some limitations on what can be added as we need to ensure that the system does not become too heterogeneous and in turn difficult to use and manage. However, we are able to offer a range of options including:
- Intel CPU-based nodes with standard or high memory
- GPU-based nodes
How much does it cost?
The cost depends upon:
- The final specification of your compute contribution, which will be based upon a detailed discussion of your requirements (as a guide, large amounts of memory increase the cost significantly)
- The number of compute servers you add
- Application needs — only a limited number of software packages are licensed and paid for by the University. Depending on your needs you may therefore need to pay to purchase commercial software. Open-source software does not involve any extra cost
We can provide you with an up-to-date price which will include all hosting, setup, sysadmin, home filesystem storage and VAT. Please email its-ri-team@manchester.ac.uk for more information.
Are there any extra costs or hidden charges?
The price provided to you will include hardware, installation, configuration, standard support and VAT.
In-depth support, for example complex programming and debugging assistance, or help optimising codes and applications, is readily available, though a cost may be associated with lengthy support requirements.
We can also provide technical input for grant applications, should that be required.
What are the limits on the life of hardware?
The University currently pays the power and cooling costs of the cluster and requires that hardware be decommissioned when it is deemed too inefficient compared to new hardware. Each contribution is therefore run for 5 years.
It is envisaged that research groups will continue to add compute once they have joined, or refresh their contribution at the 5-year point. If a group has no contribution beyond the 5-year switch-off, their access will continue for a further two years at a much-reduced priority/share, after which access is terminated.
What filestore is available?
Each contribution includes 250GB of highly resilient home space for the users from the research group. In addition, Research Data Storage shares can be mounted on the facility (provided they are NFS, not CIFS).
All users are entitled to use the large scratch filesystem, but no specific allocation of space is given to a group. Scratch is intended for temporary files and running jobs; it is not backed up, and files over three months old may be deleted without notice.
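As an illustration of working within such a policy, a user can list files in their scratch area that have not been modified for roughly three months using a standard Linux command (the path below is a placeholder; the real location of your scratch area is given in the user documentation):

```bash
# List files under your scratch area not modified in the last 90 days.
# "~/scratch" is a placeholder path - check the user documentation for
# the actual location of your scratch directory.
find ~/scratch -type f -mtime +90 -ls
```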
View further information about filestore within the CSF User Documentation.
How do I make a contribution?
Simply get in touch with the Research Infrastructure team via its-ri-team@manchester.ac.uk to discuss your requirements. Once funds are committed we can usually grant some access prior to the installation of the contribution.
Research IT usually organise three or four procurements a year, each of which uses funds from multiple research groups. The procurement is made by IT Services and we re-charge groups’ accounts via account codes (which must be provided in advance).
Can I try before I buy?
It is usually possible to offer evaluation access. Please get in touch and if appropriate we’ll discuss this as part of looking at your requirements.
What happens to my contribution? CSF “Shares”
Compute nodes are not directly associated with a research group – there is no concept of your nodes. Each contribution is added to the cluster and pooled with all other contributions.
Each contributing group is then allocated a share in the batch system, proportional to their contribution, which ensures that they get access to the amount of resource they contributed, averaged over a period of time (a month).
Groups do not get immediate access to their contributed share at any given time – jobs submitted to the system can only start when the resources needed to run them become free (when other jobs finish). Immediate access would require compute nodes to be kept idle until you wish to run jobs, which is obviously a very bad idea – we want the system to be fully utilised. Jobs are starting and finishing all the time on the CSF, so you rarely wait long for a job to start. Given that we have a 7-day runtime limit (which is actually quite long for compute clusters), a wait of up to 24 hours is considered acceptable, although wait times are typically much shorter.
The batch system also assigns a priority to each job. If a member of your group has not run much work this month, they automatically get a higher priority so that some of their jobs start sooner than jobs of users who have run a lot of work in the month.
The batch system will automatically manage the throughput of your jobs – the larger your share the more throughput your group will have but you must submit enough jobs to the system to take advantage of this. Groups that have a lot of compute to run typically contribute more funds to achieve more throughput, increasing the available hardware accordingly. The pooled model also permits groups with large amounts of compute to mop up spare (unused) CPU cycles from groups that have not submitted much work – we don’t want compute nodes sat idle so if a group has a high workload they will get more resource than they contributed if that resource would otherwise be idle.
In the event of problems with access to resource, or if you need a short period of exclusive access to meet an urgent deadline, we can discuss options with you. Please give as much notice as possible of any special requests.
What are the benefits of this over having my own system?
If you were to buy your own cluster you would need:
- Associated infrastructure such as racks, fileservers and login nodes (a contribution to the CSF therefore gets you more compute for your money)
- Someone to look after it, which could use up valuable research time, or who may leave without telling you how it is configured
- Experience or knowledge of looking after a cluster
- Somewhere to house it
- To (potentially) pay your own power and cooling costs
If a compute node in the CSF fails:
- You are unlikely to see any impact to your daily work as IT Services will fix it in the background
- The scale of the cluster means that the natural slack available on other compute nodes absorbs the extra work
By pooling your resource in the CSF you can potentially use a little more resource than you contributed – not every group uses it flat out all of the time; some work in bursts of activity.
I have some hardware already; can I add it to the CSF?
This is highly unlikely. We have planned the system on certain hardware, power and cooling parameters to ensure that it is manageable, expandable and sustainable.
Are there any restrictions on what I can run?
The CSF is a general-purpose computational cluster. It is capable of running jobs which require anything from one core to a few hundred cores, low- and high-memory jobs, your own code, and open-source or proprietary software. If you need large amounts of memory or lots of cores then you will be expected to have contributed enough to meet those needs.
We have some limits in place to control the amount of resource that can be used by any one individual at a time, and also to ensure that a mixture of jobs can run at the same time. These limits have not been found to have a detrimental impact on throughput or a group’s share of the system.
Software must be purchased and licensed in line with University procurement procedures. Software licenses need to be hosted on the official license farm provided by IT Services.
Who runs the service?
Research IT within IT Services. The CSF has been designed and configured based upon extensive consultation with the user community. Our approach is very customer focused so please do ask questions, give feedback, make suggestions or discuss any special requirements you have. Our aim is to help you wherever possible.
Changes to the system are notified to users as appropriate and major changes in policy are discussed with users in advance.
I’ve never used a system like this before, where do I start?
Feel free to browse the following:
- Getting Started on the CSF
- Available training courses
- Our support overview information
We are also happy to visit research groups and provide an overview on how to use the service and answer questions. To request a visit please contact the Research Infrastructure team.
- Email the Research Infrastructure team: its-ri-team@manchester.ac.uk
To help users who are more familiar with a graphical user interface (GUI) than a command line, we are developing:
- A web based portal which will give access to some aspects of the cluster; for example, filestore and basic batch job submission
- A virtual desktop service
Does it run Microsoft Windows?
No. Compute clusters are traditionally Unix/Linux based. It is, however, possible to connect to the system from an MS Windows desktop or laptop.
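Connection is normally made over SSH. As a rough sketch, from a modern Windows machine this can be done with the built-in OpenSSH client in PowerShell, or with a client such as PuTTY (the username and hostname below are placeholders; the actual login node address is given in the user documentation):

```bash
# Placeholder username and hostname - replace with your University username
# and the login node address from the CSF user documentation.
ssh username@<login-node-address>
```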
Many applications can run on MS Windows or Linux, including Matlab, Abaqus, and R. We will help you figure out whether the software you need can be run on Linux.
We are considering offering some MS Windows-based batch computational resource via the Condor pool.
I have a visitor to my research group, can they use the CSF?
Yes, please see the User Accounts page for details of how to register all your users.