


Research Computing Infrastructure

Simon Hood

Project Manager & Technical Lead, Computational Shared Facility
Infra Ops, IT Services



Research Computing

What do I mean by research computing?

  • HPC! (parallel)
    • Also: HTC, HPV, high-mem, huge IO...



Background: Uni Research Computing Strategy

Some context...


Recent Past Uni. HPC

Meagre HPC facilities:
  • At "merger" (2004): Cosmos, Eric and Bezier (totalling ~80 cores).
  • CSAR machines — national, not local, service.
  • Horace (2006 – 2010) — leased by RCS (~200 cores).
Research Computing Strategy
  • ???



Summer 2008, Development One

Ad hoc contribution machine:

  • Started by RCS with 14k left in a budget (spend it or lose it)
    • Three servers, a switch and a rack
  • No official backing
  • Two and a half years:
    • hand-to-mouth existence
    • 300k(?)
    • ~800 cores (MACE, Economics, SEAS, EEE, MBS, Chemistry)


Manchester Informatics

Summer 2008, Development Two

  • Home Page:
  • A computational research community
  • Chris Taylor (Associate Vice President for Research, UoM)
    • Mike Sutcliffe (HoD CEAS)
    • Carmel Dickenson (Programme Manager)

  • Occasional whitepaper-related meetings since June 2008

Whitepaper published in late 2009...


Mi W/paper 1/2: Comp. Res. Themes

Mi: A Computational Research Umbrella

  • Mi provides research computing strategy and governance
  • Themes:
    • Nuclear power, earth systems, aerospace
    • Finance and economics, health and lifescience


Mi W/paper 2/2: Comp Shared-Fac

Funding Model — IT Services-Run Comp. Shared-Fac.
  • One-off capital secured from centre: 90k
    • Cluster infra. (head nodes, storage, network h/w. . .)
  • All compute nodes must be paid for by research groups
    • With "tax" to contribute to future infrastructure
  • No contribution, no use!
From Many to One
  • Academics encouraged to buy in to central facility
    • strongly discouraged from buying own (small) clusters

Campus (Research Computing) Cloud Project

This is Phase One of the Campus Cloud project...


The Present

The Present — Summary

What do we have?

  • Adoption of the Redqueen model.
  • 90k sounds small...
  • ...but, for the first time(?), both
    • University political backing from the top
    • and University central IT support (esp. for dedicated network).
  • Much more than a replacement for Horace.
  • I'm optimistic!


New System 1/2

  • Reynolds House
  • 68TB parallel, high-perf scratch (Lustre)
  • 240 cores at 4GB/core; 48 cores at 8GB/core
  • 96 cores awaiting installation
Software and Users:
  • Apps installed
  • Testing by users underway
  • Awaiting registration system...

...continued...


New System 2/2

Much more on its way...
  • 512 cores on order:
    • 125k from Colin Bailey/Peter Stansby for Modelling and Simulation Centre
  • More coming:
    • MHS: 48 cores; FLS: 48 cores; Maths: 96(?) cores
    • Dell(!): 96 cores with two Nvidia M2070 cards
  • All customers want Linux;
  • possibly virtualised support for legacy OSes.


Tight Integration

Phase 1.5: Tightly-Integrate Clusters on Campus

Dedicated 10Gb Networks
  • Merge private cluster networks
  • Share filesystems: easy to implement (with the required h/w), a huge productivity gain for users
  • One "collective" instance of workload manager, Grid Engine ("SGE")
What? Redqueen and New System (and RQ2)
  • First, the new system and RQ2 (both in Reynolds House)
  • Second, Redqueen (Kilburn)
    • dedicated 10Gb link between Reynolds and Kilburn
  • Total ~2000 cores
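With one "collective" Grid Engine instance, users get a single submission interface across the merged clusters. As a rough sketch, a batch job might look like the following; the parallel-environment name and limits here are hypothetical placeholders, not the real CSF configuration:

```shell
#!/bin/bash
# Illustrative Grid Engine (SGE) job script -- the PE name and
# limits are placeholders, not the actual CSF settings.
#$ -cwd                 # run from the submission directory
#$ -N my_mpi_job        # job name
#$ -pe orte.pe 48       # hypothetical MPI parallel environment, 48 slots
#$ -l s_rt=04:00:00     # soft runtime limit of four hours

# SGE sets $NSLOTS to the number of slots actually granted
mpirun -np $NSLOTS ./my_simulation input.dat
```

Submitted with `qsub job.sh` and monitored with `qstat`; the scheduler, not the user, decides which of the merged clusters' nodes the job lands on.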


Phase Two: Cloud

The Cloud...


"Centralised, shared facility fits well with cloud computing model."

  • Plan to add features and access to the CSF via "gateways"...
  • CSF login nodes are ...


Web Portal

Database searches...

  • Some types of computational work are easily submitted via a Web interface
    • Bioinformatics community — string matching
  • IO virtual host (Ange/Owen/MikeBT)
  • Submits to CSF batch system in background
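One way the background submission step could work: the portal backend writes a per-request job script and hands it to the batch system. A sketch under assumed paths and the legacy `blastall` tool; all names here are illustrative, not the portal's actual internals:

```shell
#!/bin/bash
# Hypothetical portal backend helper: turn one web search request
# into a CSF batch job. Paths and tool names are illustrative.
set -eu
QUERY_FILE="$1"                          # sequence uploaded via the web form
JOB_DIR=$(mktemp -d /scratch/portal/job.XXXXXX)
cp "$QUERY_FILE" "$JOB_DIR/query.fa"

# Write a small SGE job script for this one request
cat > "$JOB_DIR/search.sh" <<'EOF'
#!/bin/bash
#$ -cwd
#$ -N portal_search
# string-matching search against a shared database (illustrative)
blastall -p blastn -d /shared/db/nt -i query.fa -o hits.txt
EOF

cd "$JOB_DIR"
qsub search.sh    # the portal then polls qstat to detect completion
```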


External Access

Basic, off-campus access...

  • Uni VPN not always the answer:
    • Non-UoM collaborators
    • Users who don't want their home machine to have only a UoM IP
  • SSH gateway!
    • X509?
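From a collaborator's machine, the SSH-gateway route might look like this; host names are invented examples, and on recent OpenSSH a `ProxyJump` line replaces the `ProxyCommand` shown:

```shell
# Sketch of off-campus access via an SSH gateway; host names are
# illustrative placeholders, not real UoM addresses.
#
# In ~/.ssh/config on the user's own machine:
#
#   Host csf
#       HostName csf-login.example.ac.uk
#       User external_user
#       # hop through the gateway (or "ProxyJump" on OpenSSH >= 7.3)
#       ProxyCommand ssh external_user@gateway.example.ac.uk nc %h %p
#
# One command then reaches the CSF without a VPN or a UoM IP at home:
ssh csf
```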


Virtual Desktop Service

Start interactive work at work, finish at home?

  • Supports the (increasing) interactive (GUI-based) use of HPC clusters
  • Bonus: eliminates one reason to leave office machines on at night/weekends...
  • VNC, RDP, FreeNX?
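One common recipe for the start-at-work, finish-at-home pattern is VNC over an SSH tunnel: the session persists on the server, so the desktop survives disconnects. A sketch with illustrative host names:

```shell
# Sketch: persistent remote desktop via VNC + SSH tunnel.
# Host names are placeholders, not real service addresses.

# 1. On the login node, start a VNC server (display :1 = TCP port 5901):
ssh user@csf-login.example.ac.uk vncserver :1 -geometry 1600x900

# 2. From home, forward the VNC port through SSH (VNC itself is unencrypted):
ssh -L 5901:localhost:5901 user@csf-login.example.ac.uk

# 3. Point a VNC viewer at the local end of the tunnel:
vncviewer localhost:5901

# The session keeps running between connections; "vncserver -kill :1" ends it.
```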


Condor Integration

Integrate with the other big compute resource on campus...

  • EPS Condor pool — expanding greatly as all(?) EPS public clusters now dual-boot Linux
  • Backfill the CSF with Condor...
    • Dedicated Condor gateway node
  • Web portal can submit to both dedicated hardware and Condor pools.
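Backfill via a dedicated gateway node could be as simple as submitting from there with Condor's own tools. A sketch, with the executable, input files, and job count all illustrative:

```shell
# Illustrative Condor parameter sweep, run on the hypothetical
# Condor gateway node; "analyse" and the inputs are placeholders.
cat > sweep.sub <<'EOF'
universe     = vanilla
executable   = analyse
arguments    = input.$(Process)
requirements = (OpSys == "LINUX")     # the dual-boot pool machines
output       = out.$(Process)
error        = err.$(Process)
log          = sweep.log
queue 100                             # 100 independent jobs
EOF

condor_submit sweep.sub   # scatter the jobs across idle pool machines
condor_q                  # monitor progress
```

The portal's role is then just routing: deciding whether a given request suits the dedicated SGE hardware or the Condor pool of idle desktops.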