Research Infrastructure

Our Services – The CIR Ecosystem

University of Manchester academics, postdocs and postgrads have access to a campus-wide ecosystem for computationally-intensive research (CIR). The ecosystem comprises the following integrated services, which together provide computation, storage and virtual machines. Some access is free at the point of use; larger capacity can be funded by research groups.

Please contact us via its-ri-team@manchester.ac.uk if you have any questions, would like advice on which aspects of the ecosystem are best suited to your work, or wish to request access to specific systems.

Batch Computation – The CSF
Batch-based computational resources – the Computational Shared Facility (CSF) is the University's flagship HPC cluster, with over 14,000 CPU cores available. It is used for a wide variety of work: parallel computation using multiple (2 to hundreds of) CPU cores; high-throughput work (running many copies of a job to process many datasets); work requiring large amounts of memory (RAM) or access to high-capacity (disk) storage with fast I/O; and access to Nvidia Volta (V100) GPUs. A dedicated HPC Pool is available for those wishing to run larger multi-node parallel jobs (up to 1024 cores per job).
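As a rough illustration of what batch use looks like when scripted, the sketch below writes a jobscript and hands it to the scheduler. The scheduler directives, parallel environment name and share paths are assumptions rather than details taken from this page; consult the CSF documentation for the real ones.

    """Illustrative sketch only: write and submit a multi-core batch job.

    Assumptions: an SGE-style scheduler providing the qsub command, a
    hypothetical parallel environment smp.pe, and a hypothetical RDS share
    mounted at /mnt/my-group-share.
    """
    import subprocess
    from pathlib import Path

    # The jobscript the scheduler will run (directive names are assumed).
    jobscript = """#!/bin/bash
    #$ -cwd           # run from the submission directory
    #$ -pe smp.pe 8   # request 8 CPU cores (hypothetical PE name)
    ./process_dataset /mnt/my-group-share/raw/sample01 /mnt/my-group-share/results/sample01
    """

    Path("sample01_job.sh").write_text(jobscript)
    subprocess.run(["qsub", "sample01_job.sh"], check=True)  # hand the job to the batch system
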
Interactive Computation – The iCSF
A computational resource designed for GUI-based interactive work – the iCSF, aka Incline, is aimed at research groups that currently purchase powerful workstations for private use. It does not use a batch queuing system — hence the name: interactive-CSF. Incline is expected to be used closely with the Research Virtual Desktop Service (see below).
Large-scale Research Data Storage
Large-scale, resilient (backed-up) storage for research data – the Research Data Storage (RDS) service (aka Isilon). This is a multi-petabyte storage system, providing storage which is allocated to, and shared amongst, research groups. Some storage is provided free at the point of use, with the option to purchase additional capacity for groups which need it.

The RDS service provides storage “shares” (areas of storage) for researchers, which may be accessed from desktop machines across campus or from the CSF, iCSF, zCSF and Condor.

High Throughput Computing – Condor
The Research IT Condor service is a computational platform which uses “spare” CPU cycles from open-access PC clusters located on campus. It is suitable for high-throughput computing (HTC), i.e., running large numbers of small, short jobs to process hundreds to thousands of datasets. We can also burst Condor jobs into the AWS cloud.
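HTCondor work is normally described by a submit file that queues one job per input file. The following is a minimal sketch assuming a standard condor_submit workflow; the executable, directory layout and file pattern are placeholders rather than details of the campus service.

    """Illustrative sketch only: queue one HTCondor job per dataset file.

    The executable, directory layout and file pattern are hypothetical;
    check the Research IT Condor documentation for the supported workflow.
    """
    import subprocess
    from pathlib import Path

    # A submit description queuing one job for every matching data file.
    submit_description = """\
    executable = process.sh
    arguments  = $(infile)
    output     = logs/$(Process).out
    error      = logs/$(Process).err
    log        = condor.log
    queue infile matching files data/*.csv
    """

    Path("logs").mkdir(exist_ok=True)
    Path("process.sub").write_text(submit_description)
    subprocess.run(["condor_submit", "process.sub"], check=True)
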
Emerging Technology Cluster – The zCSF (aka Zrek)
Zrek brings together hosts with novel and emerging technologies. The cluster currently comprises our older GPUs (10 Nvidia K20 GPUs and two K40 GPUs – see the CSF above for details of the much newer Nvidia Volta V100 GPUs). It also hosts an ARM64 compute node. Previously it has hosted Intel Xeon Phis and FPGAs.
Research Virtual Machine Service (RVMS)
Centrally-hosted virtual machines (VMs) are provided for University Research staff and PG students through the Research Virtual Machine Service (RVMS). This service is complementary to commercial cloud services, such as Amazon AWS, Azure and Google Cloud Platform: the RVMS is suitable for VMs which are expected to generate large amounts of network traffic, may be moderately CPU-intensive or require tight integration with the CIR platforms (CSF, iCSF, etc.) or the Research Data Storage (RDS) service.

Administrative control of VMs provided may be handed over to research groups. Alternatively, administrative support may be provided by Research IT where no group administrators are available.

It is hoped that this service will help to reduce the number of insecure “under-desk servers” to be found scattered around campus!

Research Virtual Desktop Service
The Research Virtual Desktop Service (RVDS) provides a virtual Linux desktop from which you can access the CSF and iCSF; the same desktop session can be reconnected to from anywhere in the world.
Consultancy
We are also able to provide advice and guidance on bespoke infrastructure requirements for CIR.

Ecosystem Integration

The components of the ecosystem are integrated to make using the systems with your data as easy as possible. Many research groups have access to multiple systems in the ecosystem, for example so that both batch and interactive computation can be used to process datasets, with all systems accessing the same storage.

Components of the UoM Campus CIR Ecosystem

Common Storage
Each user has the same home-directory (sees the same files) on all computational facilities — RDS-based shared areas (e.g., /mnt/01-home01 on the CSF) are also available everywhere within the ecosystem.
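In practice this means a path written by a job on one system can be read, unchanged, on another. A minimal sketch, using a hypothetical share path (the real mount points are given in the RDS documentation):

    """Illustrative sketch only: one shared path, visible on every system.

    /mnt/my-group-share is a hypothetical RDS-backed mount point; because the
    CSF, iCSF, zCSF and Condor all mount the same storage, the same code works
    unchanged on each of them.
    """
    from pathlib import Path

    results = Path("/mnt/my-group-share/results")
    results.mkdir(parents=True, exist_ok=True)

    # A batch job on the CSF might write its output here ...
    (results / "run01.csv").write_text("sample,value\nA,1\n")

    # ... and an interactive session on the iCSF reads the very same file,
    # with no copying or transfer step in between.
    print((results / "run01.csv").read_text())
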
Fast, Dedicated Networking
RDS storage, the CSF, Redqueen and Incline are all linked by a dedicated, fast network. This Research Data Network (RDN) has been extended to the Michael Smith Building to join with FLS computational infrastructure. A major project to upgrade the campus network is also well underway, with five buildings to be upgraded from 1 Gb/s to 10 Gb/s uplinks in 2014.
Common Off-Campus Access — Virtual Desktop
For security reasons, all ecosystem computational facilities and storage are directly accessible only from on campus. Methods for indirect off-campus access are therefore provided: an SSH gateway to the computational facilities; an SFTP/SSHFS gateway to the RDS shares mounted on these systems; and a virtual desktop service, by means of which the same desktop session can be used from the office, home and elsewhere.
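For example, an off-campus session might hop through the SSH gateway to reach the CSF, and pull a results file back via the SFTP gateway. The hostnames and remote path below are placeholders, not the real gateway addresses:

    """Illustrative sketch only: off-campus access via the SSH/SFTP gateways.

    The hostnames and remote path are placeholders; substitute the real
    gateway and cluster login addresses from the Research IT documentation.
    """
    import subprocess

    GATEWAY = "username@ssh-gateway.example.ac.uk"    # placeholder SSH gateway
    CSF_LOGIN = "username@csf-login.example.ac.uk"    # placeholder CSF login node

    # Open a shell on the CSF from off campus by jumping through the gateway
    # (OpenSSH's -J / ProxyJump option).
    subprocess.run(["ssh", "-J", GATEWAY, CSF_LOGIN])

    # Copy a results file from an RDS share to the local machine via the
    # SFTP gateway (placeholder remote path).
    subprocess.run(["sftp", f"{GATEWAY}:my-group-share/results/run01.csv", "."], check=True)
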
Integrated Web Services and Other Virtual Machines
Increasingly, computational services are used through a Web-based front end; research output is also made available by the same means. By using the RVMS (see above), researchers are able to set up IT Services-hosted Web sites for both of these purposes.
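As a purely illustrative sketch of that pattern, a VM on the RVMS could publish results directly from an RDS share; the share path and port below are hypothetical, and a production site would sit behind a properly configured web server.

    """Illustrative sketch only: publish results from an RDS share via a VM.

    The share path and port are hypothetical; this just shows the idea of the
    web front end and the compute jobs sharing one copy of the data.
    """
    from functools import partial
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    # Serve the (hypothetical) results directory that the batch jobs wrote to.
    handler = partial(SimpleHTTPRequestHandler, directory="/mnt/my-group-share/results")
    HTTPServer(("0.0.0.0", 8080), handler).serve_forever()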

Ecosystem Workflow — Data, Compute and Visualisation from the Office, Home and Abroad!

By using the ecosystem computational facilities, storage and virtual desktop, it is easy to prepare, submit and monitor jobs, and post-process results, without transferring any data, all from home, office and abroad (e.g. while at a conference).

Example Workflow

CIR Example Workflow in Life Sciences

  1. Rusty’s data exits his group’s DNA sequencer straight onto storage provided by the RDS, over fast, dedicated networking infrastructure (the RDN). The data is now visible on the CSF and iCSF.
  2. Accessing the data on the CSF, by using the Research Virtual Desktop Service (RVDS), he then defines a series of computational jobs to process the data and submits them to the batch system.
  3. Later, from home, Rusty re-connects to his RVDS session to monitor his jobs to ensure all is well — or make any necessary tweaks. Over the next few days, from a conference in Barcelona, Rusty checks progress again using the RVDS, from his laptop, and also the SSH Gateway, from his phone; he clears some jobs which have failed and submits additional, corrected work.
  4. Back at the office, batch jobs finished, and using the same RVDS session, Rusty starts GUI-based, interactive post-processing on Incline (aka the iCSF) — no need to move data as all the same RDS-based filesystems are available on all ecosystem compute resources.
  5. Finally, the results are ready and made available to the public via a Web server running on the Research Virtual Machine Service (RVMS) — accessing the same RDS share.

Another similar scenario

How the ecosystem components could be used in a typical CIR workflow.

Centralisation of Infrastructure

Before 2010, many small “beowulf” HPC clusters existed on campus. While some were well run by academics and postgrads, others were not; all took time to administer, time better spent on research; and most had many “spare” CPU cycles. Since then, most campus beowulfs have been decommissioned and contributions made to the CSF instead — research infrastructure has been centralised. Academics now have access to a professionally-run campus service, with all the benefits that brings.

Following the success of the service provided by the CSF, academics are now encouraged to make use of, and contribute to, other centralised research infrastructure run by IT Services:

The iCSF (aka Incline) — For interactive and GUI-based computation
Many research groups do their main computational work on the CSF but currently purchase powerful workstations for development work, interactive use of GUI-based applications, and visualisation of data. Most such requirements can now be satisfied by using Incline (the iCSF) in conjunction with the RVDS. The iCSF runs on the same financial model as the CSF: an initial seed investment has been made by the University, and subsequent computational resources are contributed by research groups.

If you are interested in finding out more about the iCSF, please contact the IT Services RI team: its-ri-team@manchester.ac.uk.

The Research Data Storage service — No more USB drives required!
Cheap-and-cheerful NAS boxes and USB drives are commonly used around campus to store important files and data. These systems take time to administer and back up, are less than resilient, and are subject to theft. The RDS can satisfy the storage needs of most research groups: some storage is available free at the point of use; those requiring greater capacity can pay for additional storage.

To find out more about the RDS, visit the RDS site, or contact the IT Services RI team: its-ri-team@manchester.ac.uk.

Research Virtual Machine Service — Web servers and more
Under-desk Web (and other) servers are frequently to be found in offices around campus. The RVMS offers resilient, professionally-managed, central infrastructure which can be used to replace these. Crucially, academics can be given administrator/root privileges if required.
