The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
History, Necessity and Sustainability of the CSF
History
The motivation for building a shared HPC platform at, and for, the University of Manchester, funded largely by financial contributions from research groups, originated in gatherings organised by Manchester Informatics during 2009.
This idea built on the success of a shared HPC cluster, Redqueen, which bootstrapped itself into being during 2008 through an initial procurement of just £14k from (the now defunct) RCS and the goodwill and faith of the School of MACE. (This led to subsequent contributions from the Schools of Economics, Atmospheric Physics, EEE, MBS and Chemistry.)
Mi Whitepaper
After several gatherings organised by Manchester Informatics, a whitepaper was written. It proposed an initial seed investment from the University, after which the Computational Shared Facility, as it became known, would be self-sustaining.
Initial Procurement (December 2010)
The initial procurement comprised a seed contribution of £90k from the University and academic contributions from Professor Chris Taylor, Professor Mike Sutcliffe and Ian Hillier, together also totalling £90k.
Subsequent Procurements
The CSF has continued to grow through a series of contributions. To date (May 2015):
- contributions to the CSF total just over two million pounds, with further procurements to be made in the next few months;
- these contributions come from 30 different research “groups”, of which three are effectively school-wide contributions.
Current Contributors Summary
A complete list of contributions to the CSF is given on the CSF Dashboard.
Sustainability
The Computational Shared Facility operates — by design — on a completely sustainable model:
- all computational hardware (compute nodes, fast interconnect, GPGPUs, etc) is funded solely by academic contributors;
- such contributed hardware is purchased with a three-year warranty and therefore has a guaranteed minimum life of three years;
- efforts are made to keep all contributed hardware in production for a further two years (total of five years), where practical — it is expected that most hardware will have a five-year life;
- contributed hardware is decommissioned after five years;
- the University funds required infrastructure (e.g., login nodes, fileservers, network switches).
Necessity
The CSF has become deeply integrated into the work of many computational researchers at the University of Manchester.
A Replacement for a Profusion of Beowulfs
The CSF exists as a replacement for the profusion of small Linux-based “Beowulf” clusters that were growing all over campus until December 2010. In that, the CSF is a great success, far more so than anyone could reasonably have supposed. That success is built on doing exactly what it says on the tin: being a replacement for research group-specific HPC clusters (and not trying to be a local version of a national service). The benefits of the CSF over the previous “model” are numerous; most importantly:
- Merging what would otherwise be separate, smaller computational resources means that CPU cycles are not wasted: when a research group has a lot of computational work to perform, its usage is not limited to the capacity of the hardware it contributed; rather, it can use others’ spare cycles to increase its throughput of jobs.
- The CSF offers a professionally managed service to University computational researchers not previously available (at this scale).
A return to the anarchic days in which each research group operated independently is unthinkable.
A Local Service for Local Users
For the majority of users, a truly local service offers great advantages over the use of national services:
- A great deal of emphasis is placed on face-to-face support for inexperienced users. This is hugely important for building the confidence of individuals and groups new to a shared cluster environment (that is, the use of a remote computational resource via a batch system), who would otherwise not attempt to use a central resource, or would quickly revert to their far less productive local office environment. (National services, by their very nature, are geared towards more experienced users.)
- The CSF can offer a far more flexible service: while attempting to offer a homogeneous environment for users, it can accept relatively small, custom hardware contributions from research groups who have novel requirements. We are also looking at ways for researchers to use the CSF for interactive development work, so that the funds currently spent on the many high-end workstations found across campus can be spent more efficiently and effectively as a contribution to the CSF.
- The CSF can be used for local teaching and, for example, EPSRC-funded summer schools for postgraduate students, through interactive, GUI-based use of the facility, over the fast campus network (e.g., for OpenGL-based 3D work).
Licensing
Many commercial applications which are heavily used at UoM cannot be installed on a regional service because of legal/licensing issues.
Big Data
The CSF is central to the nascent (UoM campus) Research Data Network.
Many CSF contributors/users frequently and repeatedly need to upload large quantities of experimental data to, or download similar quantities of computationally-generated data from, the facility. If the standard campus network cannot accommodate such large uploads/downloads, the Research Data Network can. (The number of contributors/users with such requirements is growing as FLS and MHS come on board.)