
Introductions

The Computational Shared Facility: 2011 June Update


Simon Hood

simon.hood@manchester.ac.uk

CSF Project Manager & Technical Lead, Infrastructure and Operations, IT Services



Status

Current Status of the CSF



System Hardware

What hardware do we have?

Storage
  • 68TB high-performance parallel scratch (Lustre)
  • 18TB home space (fully backed up)
Compute
  • 512 cores at 2GB/core, with Infiniband (fast interconnect)
  • 288 cores at 4GB/core
  • 48 cores at 8GB/core
  • 240 cores awaiting installation (rack space only became available last week)
GPGPUs
  • Nvidia GPGPUs (more on these later)



System Software

What software do we have?

Applications installed so far:
  • Gaussian 09, Amber, Pcazip
  • Matlab, Octave, R
  • CHARMM, DL-Poly, Gromacs, Maestro, Molden, NAMD, OpenEye, Polyrate, PyMOL, VMD
  • Code Saturne, Fluent
  • Grace
  • Freesurfer

Do let us know if you need something else.
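
Software here is presumably made available through environment modules (an assumption, suggested by module-style names such as mpi/gcc/openmpi/1.4.3 later in these slides); a minimal sketch, with an illustrative module name:

  # List everything installed and available to load:
  module avail

  # Load an application before use (module name below is hypothetical):
  module load apps/gaussian/g09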



Contributions 1/2

Contributions thus far:

  Chris Taylor                      20k   Imag., Gen. and Prot.
  Mike Sutcliffe                    55k   CEAS
  Ian Hillier                       15k   Chemistry
  Richard Bryce                     15k   Pharmacy
  School contribution              125k   MACE
  Paola Carbone + school contrib.   45k   CEAS
  Simon Lovell and Simon Whelan     15k   Bioinform. (FLS)
  Jane Worthington                  15k   Translational Med.

Amounts contributed and source, to date.

. . .Total: 305k



Contributions 2/2

Upcoming Contributions

  Neil Burton          Chemistry
  Nick Higham          Mathematics
  Richard Bryce        Pharmacy
  Ben Rodgers          MACE
  Paul Grassia         CEAS
  Stephen Welbourne    Psychology
  (Ser-Huang Poon)     (MBS)

Confirmed and (expected) contributions, coming soon.

. . .Total: 120k



Jobs and Accounting

Jobs and Accounting



ARCo

ARCo: The SGE Database



Share Usage

CSF Very Unevenly Used by Contributors



CSF Policies

CSF Policies



External Access

Can my collaborators from outside Manchester use the CSF?

Collaborators:



Queue Length

Queues on the CSF Have a Wall-Clock Limit of One Week, Except the Short Queues
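
Purely as an illustration (h_rt is the standard Grid Engine runtime resource; the CSF's own queue names and defaults may differ), a job script can state the wall-clock time it needs:

  # Request, say, 24 hours of wall-clock time for this job:
  #$ -l h_rt=24:00:00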



Job Scheduling

Computational Resources Allocated via Fair Share on the CSF

Example

simonh has jumped to the front of the queue (columns: job ID, priority, job name, user, state, submission time, slots):

  4550  0.1050  priority-t  simonh    qw  05/28/11 13:16  666
  4514  0.0144  amb11.scri  mophjaa3  qw  05/27/11 10:29  64        
  4528  0.0066  oam.close_  mjkssjp3  qw  05/27/11 14:01  12        
  4529  0.0066  oam.close_  mjkssjp3  qw  05/27/11 14:11  12        
  4530  0.0066  M-10-10     mjkijsn2  qw  05/27/11 15:07  12        
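
For reference, the listing above follows the usual Grid Engine qstat layout; the fair-share priorities behind it can be inspected with the standard commands (exact output format varies by version):

  # Jobs for all users, with the scheduler-assigned priority in the
  # "prior" column (the second column above):
  qstat -u '*'

  # Extended view, including the fair-share ticket counts behind
  # those priorities:
  qstat -ext -u '*'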



Users' Share within Projects

Is one user hogging a project's share?

Default SGE Project
  • Users submitting to a project contribute only to the accumulated resource consumption of that project.
  • Entitlements of project users are not managed.
Second Level of Fairshare
  • All users submitting to a project get equal long-term entitlements. . .
    • . . .or unequal entitlements, if so specified.

Let us know how you want your users managed.
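
As a rough illustration only (the project name below is hypothetical), the relevant Grid Engine configuration can be inspected as follows:

  # Show the share tree, i.e., any second level of fairshare:
  qconf -sstree

  # Show the definition of a particular project (name is illustrative):
  qconf -sprj chemistry-contrib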



Cost of a Contribution

How much is a contribution? What do I have to pay for?

Cost of a standard contribution (C6100, i.e., four nodes):
  • Hardware, installation and support: approx. £12.5k + VAT
  • Internal networks: £385
  • Internal filesystems:
    • £500 for 250GB of home space (backed up)
      • plus a share of scratch
    • £200 per additional 100GB of home space
  • Rack and PDU: approx. £400 (in future?)



Centre's Contribution

What is The Centre paying for?

Costs, per C6100:



Soft Landing

Am I kicked off the CSF after five years?

Answers:



Minimum Contribution

What if I have only 8k to spend?



What Constitutes a Share?

Scheduling based on shares — but what is a share?

Options

Questions

Share based on financial contribution?



Current Developments

Current Developments



RedQueen2

RQ2 is Being Integrated into the CSF



GPGPUs 1/3

Nvidia GPGPUs in the CSF

These are to be installed over the next few weeks (resources allowing):

MACE Contribution
  • Four S2050s (16 × 2050 cards in total)
    • . . .moved from Redqueen. . .
Dell Seed Hardware
  • M1000e blade chassis
  • Two M610x blades, each with one Nvidia M2070



GPGPUs 2/3

Dell Seed Hardware


Dell blade system: M1000e enclosure and M610x blade.



GPGPUs 3/3

Infiniband
  • The host CPUs of all these GPGPUs are connected via Infiniband
    • Two levels of parallelism: across nodes and within each GPU
    • Working code: Daniel Valdez Balderas
Further contributions? Six slots available.
  • MBS
  • RGF?
  • Who else?



How to Use the System

How to Use the System



Running Jobs on the Login Node

I want my jobs to run now!
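
Presumably the point here is that compute work belongs in the batch system rather than on the shared login node; a minimal sketch using the standard commands:

  # Submit the work to the batch system instead of running it on the login node:
  qsub my-job.sh

  # ...then check on it from the login node without doing the work there:
  qstat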



Home vs. Scratch

Scratch is faster than home!
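
A rough sketch of what this means in practice; the scratch location shown is illustrative only (see the CSF documentation for the real path):

  # Do I/O-heavy work in the fast Lustre scratch area rather than in the
  # (slower, backed-up) home space; the path below is hypothetical:
  cd /scratch/$USER/my-run
  qsub my-job.sh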



SGE PEs 1/2

I don't care where it runs. . .

Example:

  qsub my-mpi-job.sh
where my-mpi-job.sh contains:
  #!/bin/bash

  #$ -S /bin/bash
  #$ -cwd
  #$ -pe orte* 64

  # ...set environment here, e.g., mpi/gcc/openmpi/1.4.3...

  # $NSLOTS is set by SGE to the number of slots granted (64 here)
  mpirun -np $NSLOTS ./my-mpi-prog



SGE PEs 2/2

I want to use the Infiniband-connected nodes!

Example:

  qsub my-mpi-job.sh
where my-mpi-job.sh contains:
  #!/bin/bash

  #$ -S /bin/bash
  #$ -cwd
  #$ -pe orte-32.pe 64

  # ...set environment here, e.g., mpi/gcc/openmpi/1.4.3-ib
  #                                mpi/intel-11.1/openmpi/1.4.3-ib...

  # $NSLOTS is set by SGE to the number of slots granted (64 here)
  mpirun -np $NSLOTS ./my-mpi-prog
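
To see which parallel environments actually exist, and hence which names the -pe line may use, the standard Grid Engine queries can be used:

  # List all configured parallel environments:
  qconf -spl

  # Show the definition of the Infiniband PE used above:
  qconf -sp orte-32.pe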



Next Steps

Next Steps



Integration of System(s) in Kilburn

Sharing of CSF Filesystems; Integration of Compute

Link the private cluster networks via dedicated 10Gb links:
  • Share filesystems: easy to implement (with the required hardware);
    • much better for users!
  • Ultimately, one "collective" instance of the workload manager, Grid Engine (aka SGE)
    • No requirement for "grid" middleware
Redqueen
  • Redqueen: the prototype contribution cluster
Templar?
  • The FLS contribution cluster: share filesystems?



Man1, Man2 and The RGF

Use the Revolving Green Fund to Replace Old Systems

Hardware to be replaced. . .
  • Replace Man1/Weyl and Man2/Noether
. . .with new hardware for the CSF:
  • Two C6100s plus two M610x with S2070?
Timescales
  • If the bid is successful, money will be available in July/August



Finally

Finally. . .



Talk to Us!

Talk to us!

If we have got something wrong. . .

. . . or indeed something right!

its-research@manchester.ac.uk

Send us constructive suggestions. . .

