
The Computational Shared Facility: Status and Strategy

Simon Hood

Project Manager & Technical Lead
Infra Ops, IT Services


Background and Context



Uni RC Strategy: Mi Whitepaper

  • Occasional whitepaper-related meetings since June 2008

Funding Model: an IT Services-Run Computational Shared Facility
  • One-off capital secured from centre: 90k
    • Cluster infra. (head nodes, storage, network h/w. . .)
  • All compute nodes must be paid for by research groups
    • With "tax" to contribute to future infrastructure
  • No contribution, (almost) no use!
From Many to One
  • Academics encouraged to buy into the central facility
    • strongly discouraged from buying their own (small) clusters

Whitepaper published in latter part of 2009. . .



Current Status of the CSF


The Present

What do we have?
  • Adoption of the Redqueen model.
  • 90k sounds small. . .
  • . . .but, for the first time(?), both:
    • University political backing from the top
    • and University central IT support (esp. for dedicated network).

  • Much more than a replacement for Horace.
  • I'm very optimistic!


New System (Hardware)

Apps installed, testing finished, user accounts created:
  • 68TB parallel, high-perf scratch (Lustre)
  • 240 cores at 4GB/core
  • 48 cores at 8GB/core
  • 96 cores awaiting installation
Much more on its way. . .
  • 512 cores on order!
  • 96 cores to be ordered (Monday?)
  • More expected in Spring. . .

In Reynolds House. . .


New System (Apps and Users)

  • Apps installed
    • Gaussian 09, Amber. . .
    • Matlab, R. . .
    • CHARMM, NAMD, Polyrate. . .
  • Testing by users well underway
  • Awaiting registration system. . .


Pic 1


Pic 2


Pic 3


Who and What

Contributors thus far:

  Chris Taylor                  20k    Imag., Gen. and Prot.
  Mike Sutcliffe                55k    CEAS
  Ian Hillier                   15k    Chemistry
  Richard Bryce                 15k    Pharmacy
  Peter Stansby/Colin Bailey   125k    MACE/EPS
  School contribution           32k    CEAS

Upcoming (expected):

  15k    Translational Med.
  15k    FLS (Bioinf)



Pooled purchasing clearly having the desired effect!

Dell provided loss-lead blade chassis (M1000) and two blades (M610x):

  • 12 cores, 48GB RAM,
  • plus one Nvidia 2070, each
. . .a very low price. . .


  • MBS likely to add to this very soon
  • Anyone else? Coordinate with us to get the maximum discount
    • Email us!


Next Steps




Tightly Integrate Clusters on Campus

Link private cluster networks (dedicated 10Gb links):
  • Share filesystems — easy to implement (with the required h/w);
    • much better for users!
  • Ultimately, one "collective" instance of workload manager, Grid Engine (aka SGE)
  • No requirement for "grid" middleware
What? New System, RQ2 and Redqueen
  • Dedicated 10Gb link between Reynolds and Kilburn
  • Total ~2000 cores

. . .details/timescales over. . .
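
From the user's side, a collective Grid Engine instance means one submission interface for all the linked clusters. A minimal SGE job script might look like the sketch below (the queue/PE and resource names are illustrative, not the CSF's actual configuration):

```shell
#!/bin/bash
# Minimal SGE job script -- the PE name and memory resource
# below are hypothetical examples, not real CSF settings.
#$ -cwd                  # run from the submission directory
#$ -N csf_test           # job name shown in qstat
#$ -pe smp.pe 4          # hypothetical parallel env: 4 cores, one node
#$ -l mem=4G             # hypothetical per-core memory request

./my_program             # the user's executable
```

Submitted with `qsub job.sh`; the collective scheduler would then place it on whichever linked cluster has free cores, with no "grid" middleware involved.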



The first step in assimilation. . .


  • HEP, MACE, RGF, Chemistry. . .
  • Same machine room as CSF (Reynolds House)
  • 300 cores?
  • Filesystems shared in four to six weeks
    • collective SGE later



The second step will take longer. . .


  • ~800 cores; 16 Nvidia 2050s
  • MACE, Economics, SEAES (Atmos), EEE, Chemistry, MBS
  • Different machine room from CSF (Kilburn)
  • Awaiting dedicated 10Gb link between Reynolds and Kilburn
  • Filesystem upgrade
    • Has ~25TB storage — upgrade some to Lustre
  • Summer?


Man1, Man2 and The RGF

Can we use the Revolving Green Fund?

We hope:

  • Replace Man1/Weyl (5.5 yrs old) and Man2/Noether (5 yrs old)
  • with 96 new cores (two Dell C6100s)
  • Meeting with RGF people next week. . .
    • if success: money available July/August


Phase Two: Cloud

The Cloud. . .


"Centralised, shared facility fits well with cloud computing model."

  • Plan to add features and access to CSF via "gateways". . .
  • CSF login nodes are. . .


Web Portal

Database searches. . .

  • Some types of computational work are easily submitted via a Web interface
    • Bioinformatics community — string matching
  • IO virtual host (Ange/Owen/MikeBT)
  • Submits to CSF batch system in background
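
One way the portal back-end could hand work to the batch system is sketched below — a wrapper that stages the uploaded query, writes a job script, and calls `qsub`. Paths, the BLAST search, and file names are all illustrative assumptions, not the portal's actual design:

```shell
#!/bin/bash
# Hypothetical portal back-end sketch: stage an uploaded query
# file and submit it to the CSF batch system in the background.
QUERY="$1"                        # e.g. a FASTA file from the web form
JOBDIR=$(mktemp -d /scratch/portal.XXXXXX)   # placeholder scratch path
cp "$QUERY" "$JOBDIR/query.fa"

cat > "$JOBDIR/job.sh" <<'EOF'
#!/bin/bash
#$ -cwd
# Illustrative string-matching job for the bioinformatics case:
blastall -p blastn -d nt -i query.fa -o result.txt
EOF

cd "$JOBDIR" && qsub job.sh       # portal then polls qstat and
                                  # serves result.txt when done
</imports>
```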


External Access

Basic, off-campus access. . .

  • Uni VPN not always the answer:
    • Non-UoM collaborators
    • If users don't want their home machine to have only a UoM IP
  • SSH gateway!
    • X509?
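
Client-side, an SSH gateway is transparent with a couple of `~/.ssh/config` stanzas — a sketch using OpenSSH's `ProxyCommand` (all hostnames and the username are placeholders, not real CSF machines):

```shell
# ~/.ssh/config -- hop to the CSF via the SSH gateway.
# Hostnames and username below are placeholders.
Host csf-gateway
    HostName gateway.example.manchester.ac.uk   # placeholder
    User mabcxyz1                               # placeholder

Host csf
    HostName csf-login.internal                 # placeholder
    User mabcxyz1
    ProxyCommand ssh -W %h:%p csf-gateway       # tunnel via the gateway
```

After that, `ssh csf` works directly for UoM users and non-UoM collaborators alike, with no VPN needed.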


Virtual Desktop Service

Start interactive work at work, finish at home?

  • Supports the (increasing) interactive (GUI-based) use of HPC clusters
  • Bonus: eliminates one reason to leave office machines on at night/weekends. . .
  • VNC, RDP, FreeNX?


Condor Integration

Integrate with the other big computation resource on campus. . .

  • EPS Condor pool — expanding greatly as all(?) EPS public clusters now dual-boot Linux
  • Backfill the CSF with Condor. . .
    • Dedicated Condor gateway node
  • Web portal can submit to both dedicated hardware and Condor pools.
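
For the Condor side, the dedicated gateway node would generate submit description files along these lines — a sketch only, with illustrative file and executable names:

```shell
# sweep.sub -- hypothetical Condor submit description the gateway
# might generate; run with: condor_submit sweep.sub
universe     = vanilla
executable   = my_program            # illustrative name
arguments    = $(Process)            # one argument set per instance
output       = run.$(Process).out
error        = run.$(Process).err
log          = sweep.log
requirements = (OpSys == "LINUX")    # target the dual-boot Linux pool
queue 100                            # back-fill with 100 short runs
```

Short, independent runs like this are what back-fill the pool well: each queued instance lands on an idle EPS desktop as it becomes available.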