Contributor FAQs

Questions answered

Below is a list of common questions regarding the Computational Shared Facility (CSF) which are typically asked by those who wish to join the service. Please get in touch if your query is not answered here, or if you would like to arrange a meeting to discuss your requirements.

Email the Research Infrastructure team: its-ri-team@manchester.ac.uk

  1. What is the CSF?
  2. What is the funding model?
  3. What is the Free At Point of Use access?
  4. Who manages it?
  5. Who uses it?
  6. What can I buy?
  7. How much does it cost?
  8. Are there any extra costs or hidden charges?
  9. What are the limits on the life of hardware?
  10. What filestore is available?
  11. How do I make a contribution?
  12. Can I try before I buy?
  13. What happens to my contribution? CSF “Shares” and GPU limits.
  14. What happens to my shares when my contribution is 5 years old?
  15. What are the benefits of this over having my own system?
  16. I have some hardware already, can I add it to the CSF?
  17. Are there any restrictions on what I can run? What if I only contributed CPU or only GPU nodes?
  18. Who runs the service?
  19. I’ve never used a system like this before, where do I start?
  20. Does it run Microsoft Windows?
  21. I have a visitor to my research group, can they use the CSF?
  22. I have some funds, but they need to be spent very soon. Can I contribute them to the CSF?
  23. I have provided an account code, but it has not been charged yet. Is there an issue?

What is the CSF?

The Computational Shared Facility is a collection of compute servers joined together to provide a large resource which allows researchers to run computationally intensive calculations, simulations, and problem analyses. Such systems are commonly known as High Performance or High Throughput Computing (HPC/HTC) clusters.

The primary role of the system is to run batch (i.e., non-interactive) jobs in the traditional ‘HPC’ way. A limited number of jobs may also be run interactively, for instance by using an application’s graphical interface, but we usually recommend a different platform, the iCSF (aka Incline), for such work.
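
To give a flavour of what a batch job looks like, the sketch below shows a minimal job script. It uses SLURM-style directives purely as an illustration – the CSF’s actual batch system, the options it requires, and the available module names are described in the CSF User Documentation.

    #!/bin/bash --login
    #SBATCH --job-name=my_simulation   # a label for the job in the queue
    #SBATCH --ntasks=1                 # a single-core (serial) job
    #SBATCH --time=02:00:00            # requested wall-clock time

    # Load the application environment (module name is illustrative)
    module load apps/myapp

    # Run the calculation non-interactively; all output is written to files
    myapp --input data.in --output results.out

The script is handed to the batch system (for example with a command such as sbatch jobscript.sh in SLURM), and the scheduler starts it when suitable resources become free.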

What is the Funding Model?

The cluster is built on a shared model which is expandable using research funding: the University has provided the initial investment in the infrastructure required to build and house the system. “Contributors” (which can be research groups, schools/departments, faculties or individual PIs) supply funds which buy the compute-node servers (most HPC platforms are a combination of infrastructure and “compute nodes”). This means contributor funds are used purely to increase the compute capacity and capability of the system. Periodic refreshes of infrastructure (storage, networking, management h/w and s/w, login nodes, data-centre) are funded by the University.

All compute nodes are pooled into the HPC platform – there is no concept of “your nodes”. The batch system will run your jobs on any available, appropriate hardware. See What happens to my contribution? CSF “Shares” and GPU limits for more info.

What is the Free At Point of Use access?

The University has also invested in some compute hardware to enable very limited free-at-point-of-use access for non-contributing groups (job size and throughput are very limited). By contributing, a research group can run much larger jobs and achieve greater job throughput. Only members of groups that have contributed may use a group’s share of the system.

The ‘free at the point of use’ access is ideal for trying out the system and running small amounts of work. However, the limits on job size and throughput mean that contributing research groups have much greater capacity in the system.

Who manages it?

The Research Infrastructure Team within IT Services looks after all aspects of the cluster including the operating system (Linux), hardware, batch system, user management, filestore, much of the application software, and first and second line support. IT Services staff costs are currently funded by the University, not as part of any contribution.

Who uses it?

Traditionally the main users of compute clusters are from within the engineering and physical sciences. However, researchers from the medical/health and life sciences are also heavy users of the CSF, and this is a fast-developing and growing domain of use. We are also helping researchers from the Faculty of Humanities, including the Alliance Manchester Business School, to discover the benefits of access to high-end computational resources.

Potential users from any department on campus are welcome to contact us, using the above email address, to discuss whether the CSF may be of benefit to their area of research and what their requirements might be.

What can I buy?

The Research Infrastructure team determine the default compute node specification, to ensure that the system does not become too heterogeneous and in turn difficult to use and manage. Contributions will then be used to increase the number of these nodes in the system.

If you have particular requirements for large-memory or GPU nodes, we also have specifications for these. Such nodes require a larger contribution to cover their higher cost.

We are currently able to offer guidance and pricing (quotes) on the following compute-node contributions:

  • Intel CPU-based nodes with standard or high memory
  • GPU-based nodes (Nvidia A100s)

We keep abreast of developments in CPU and GPU technology to determine whether and when it is practical to consider other architectures.

How much does it cost?

The cost depends upon:

  • Current supplier hardware prices.
  • The final specification of your compute contribution, which will be based upon a detailed discussion of your requirements. As a guide, large amounts of memory increase the cost significantly and GPUs are expensive.
  • The number of compute servers you add.
  • Application needs – most of the software on the CSF is open-source and does not incur any cost to you. Only a limited number of commercial software packages are centrally licensed and paid for by the University. The Research Infrastructure team will not purchase application licenses as part of the CSF procurement (unless they are needed as part of the infrastructure – currently none are). However, you may need to factor the cost of commercial application software licenses for your users into your costs.

We can provide you with an up-to-date price which will include all hosting, setup, sysadmin, home filesystem storage and VAT.

This is a one-off cost – we run your hardware in the CSF for 5 years, so you get 5 years of “contributor” access. See below for what happens after 5 years.

Please email its-ri-team@manchester.ac.uk to discuss CSF contributions.

Are there any extra costs or hidden charges?

The price provided to you will include hardware, installation, configuration, standard support, a commitment to run the hardware for 5 years in the system, and VAT.

In-depth support, for example complex programming and debugging assistance, or help optimising codes and applications, is readily available, though a cost may be associated with lengthy support requirements.

We can also provide technical input for grant applications, should that be required.

What are the limits on the life of hardware?

The University is currently paying the power and cooling costs of the cluster and has made it a requirement that hardware be decommissioned when it is deemed to be too inefficient compared to new hardware. Each contribution is therefore run for 5 years.

Research groups are encouraged to contribute funds to add compute nodes once they have joined, or to refresh their contribution no later than the 5-year point. If a group has made no further contribution beyond the 5-year switch-off, their access will continue for a further two years at a much-reduced priority/share, before being reduced to ‘free at the point of use’ access with the strict limits that entails.

What filestore is available?

Each contribution includes 500GB of highly resilient home space for the research group’s users. In addition, Research Data Storage shares can be mounted on the facility (provided they are NFS, not CIFS).

All users are entitled to use the large scratch filesystem, but no specific allocation of space is given to a group. Scratch is intended for temporary files and running jobs. Scratch is not backed up. Files unused for three months or more may be deleted without notice.
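
As a rough illustration of the intended workflow (the scratch path shown is a placeholder – the CSF User Documentation gives the actual location), a user might stage data into scratch for a run and copy only the results worth keeping back to home:

    # Stage input data from (backed-up) home space into scratch
    mkdir -p ~/scratch/my_project
    cp ~/my_project/input.dat ~/scratch/my_project/

    # Run the job from scratch so large temporary files stay off home space
    cd ~/scratch/my_project

    # Afterwards, copy results you want to keep back to home and tidy up;
    # remember scratch is not backed up and unused files may be deleted
    cp results.out ~/my_project/
    rm -f large_temp_file.tmp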

View further information about filestore within the CSF User Documentation.

How do I make a contribution? What cost should I add to my grant?

Simply get in touch with the Research Infrastructure team via its-ri-team@manchester.ac.uk to discuss your requirements. We will then help you estimate what resources you will require and cost it up for you.

Please note that there is usually a minimum buy-in of 2 compute nodes for new groups (typically those who require more than ‘free at the point of use’ throughput).

Once funds are committed we can usually grant some access prior to the installation of the contribution.

Research IT usually organise two or three procurements a year, each of which uses funds from multiple research groups. The procurement is made by IT Services and we re-charge groups’ accounts via account codes (which must be provided in advance).

Can I try before I buy?

Yes, via the ‘free at the point of use’ option. If you need to evaluate on more resource than that provides, we are happy to discuss options.

What happens to my contribution? CSF “Shares” and GPU limits

CPU and GPU compute nodes are not directly associated with a research group – there is no concept of your nodes. Each contribution is added to the cluster and pooled with all other contributions.

For CPU additions each contributing group is then allocated a share in the batch system, proportional to their financial contribution, which ensures that they get access to the amount of resource (throughput) they contributed, averaged over a period of time (a month).

GPU contributions operate differently to CPUs. Rather than a share in the system, we place limits on your group in line with the number of GPUs contributed; we will discuss these with you when helping to cost up your GPU requirements.

All work is run via the batch system – you don’t have “your own” compute nodes sat idle – so you might not see your jobs start immediately (although wait times are not usually very long). Jobs submitted to the system can only start when the resources needed to run those jobs become free (when other jobs finish). Immediate access would require compute nodes to be kept idle until you wish to run jobs (which is obviously a very bad idea – we want the system to be fully utilised). Given that we have a 7-day runtime limit (which is actually quite long for compute clusters), a wait of up to 24 hours is considered acceptable, although wait times are typically much shorter.

The batch system also assigns a priority to each job. If a member of your group has not run much work this month they automatically get a higher priority so that some of their jobs start sooner than jobs of users who have run a lot of work in the month.

The batch system will automatically manage the throughput of your jobs – the larger your share the more throughput your group will have, but you must submit enough jobs to the system to take advantage of this. Groups that need to run a lot of compute typically contribute more funds to achieve greater throughput, increasing the available hardware accordingly.
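
For example, a group with many independent runs could submit them all at once and let the scheduler feed them through their share. The command below uses SLURM-style job-array syntax purely as an illustration; the exact form for the CSF is given in the CSF User Documentation.

    # Submit 100 independent runs of the same script as a job array.
    # The scheduler starts as many tasks as the group's share allows and
    # queues the rest, so throughput is used without manual intervention.
    sbatch --array=1-100 run_case.sh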

The pooled model also permits groups with large amounts of compute to mop up spare (unused) CPU cycles from groups that have not submitted much work – we don’t want compute nodes sat idle, so a group with a high workload will get more resource than they contributed if that resource would otherwise be idle.

What happens to my shares when my contribution is 5 years old?

Once the shares reach 5 years of age they will be halved for 2 years. After those two years they will be removed and, if you have not been topping up your contribution, your users will be moved to the ‘free at the point of use’ access. For example, if you added 32 shares in 2017 they will be halved to 16 in 2022, and then in 2024 they will be removed. We aim to make this adjustment of shares in Sept/Oct each year.

GPU limits will be reviewed in a similar manner.

What are the benefits of this over having my own system?

If you were to buy your own cluster you would need:

  • Associated infrastructure such as racks, fileservers and login nodes. A contribution to the CSF therefore gets you more compute for your money (the infrastructure already exists in the CSF and is funded by the University).
  • Someone to look after it, which could use up valuable research time, or who may leave and forget to tell you how it is configured. PhD students should NOT be spending their time on this!
  • Experience or knowledge of looking after a cluster (particularly now that security is so important – is your research data safe?)
  • Somewhere to house it.
  • To (potentially) pay your own power and cooling costs.

If a compute node in the CSF fails:

  • You are unlikely to see any impact to your daily work as IT Services will fix it in the background
  • The scale of the cluster means that the natural slack available on other compute nodes absorbs the extra work

By pooling your resource in the CSF you can potentially use a little more resource than you contributed – not every group uses it flat out all of the time; some work in spates of activity. The batch system reallocates this spare capacity to run whatever work is waiting to be run.

I have some hardware already; can I add it to the CSF?

This is highly unlikely. We have planned the system on certain hardware, power and cooling parameters to ensure that it is manageable, expandable and sustainable.

Are there any restrictions on what I can run? What if I only contributed CPU or only GPU nodes?

The CSF is a general-purpose computational cluster. It is capable of running jobs which require anything from 1 core to a few hundred cores, low- and high-memory jobs, and your own code as well as open-source or proprietary software. As far as possible we pool the nodes so that they can be used by everyone.

We have some limits in place to control the amount of resource that can be used by any individual at any one time, and also to ensure that a mixture of jobs is able to run at the same time. These limits have not been found to have a detrimental impact on throughput or a group’s share of the system.

If you need large amounts of memory or lots of cores then you will be expected to have contributed enough to meet those needs.

Some of the very high memory nodes and the GPU nodes are designed for certain workloads. We have fewer of these types of node than standard CPU nodes, largely because they are expensive. Therefore we restrict access to these nodes to people who have work that can benefit most from them, and priority is given to those who have purchased that type of node.

If you add CPU only then your resources will be allocated as CPU. You can however use the GPUs under the ‘free at the point of use’ allocation. And vice versa, a GPU-only contribution results in ‘free at the point of use’ CPU access. You cannot buy lots of CPU resources and get GPU resources in return (other than the limited free-at-point-of-use option available to everyone). You must purchase the hardware that is most appropriate to your needs. If you have purchased a mixture then you will be allocated resources accordingly.

Software must be either open source or purchased and licensed in line with University procurement procedures. Software licenses need to be hosted on the official license farm provided by IT Services.

Who runs the service?

Research IT within IT Services. The CSF has been designed and configured based upon extensive consultation with the user community. Our approach is very customer focused so please do ask questions, give feedback, make suggestions or discuss any special requirements you have. Our aim is to help you wherever possible.

Changes to the system are notified to users as appropriate and major changes in policy are discussed with users in advance.

I’ve never used a system like this before, where do I start?

Feel free to browse the CSF User Documentation.

We are also happy to visit research groups and provide an overview on how to use the service and answer questions. To request a visit please contact the Research Infrastructure team.

To help users who are more familiar with a graphical user interface (GUI) than a command line, we are developing:

  1. A web-based portal which will give access to some aspects of the cluster; for example, filestore and basic batch job submission
  2. A virtual desktop service

Does it run Microsoft Windows?

No. Compute clusters are traditionally Unix/Linux based. It is, though, possible to connect to the system from an MS Windows desktop/laptop.
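
For example, recent versions of Windows 10/11 include an OpenSSH client in PowerShell, so connecting is typically a single command (the login address below is a placeholder – use the address given when your CSF account is created):

    # Connect to a CSF login node from a Windows PowerShell prompt
    # (replace the username and hostname with your own details)
    ssh your-username@csf-login-node.example.ac.uk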

Many applications can run on MS Windows or Linux, including Matlab, Abaqus, and R. We will help you figure out whether the software you need can be run on Linux.

I have a visitor to my research group, can they use the CSF?

Yes, please see the User Accounts page for details of how to register all your users.

I have some funds, but they need to be spent very soon. Can I contribute them to the CSF?

It really depends what you mean by ‘soon’. We are not able to charge your account code and pay for the hardware until it is on site. Lead times on hardware supply vary so we cannot always predict how long it will take for the hardware to arrive.

We try to do two procurements a year – one in the late autumn/early winter and one in late winter/early spring. It is strongly advised that you join one of these procurements if you have funds, as we are normally not able to make small one-off purchases and the costs are much higher if we do.

I have provided an account code, but it has not been charged yet. Is there an issue?

Usually this is because we are not able to charge your account code and pay for the hardware until it is physically on-site (delivered to the University data-centre). Lead times on hardware supply vary so we cannot always predict how long it will take for the hardware to arrive.
