Does your Software or Code Scale?

The HPC Pool is capable of running multi-node (usually MPI or mixed-mode MPI+OpenMP) parallel jobs of 128 to 1024 cores. Jobs must use a multiple of 32 cores – i.e., fully populated compute nodes.
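The node-size constraint above can be expressed as a quick check. This is an illustrative sketch only – the function name is our own, not part of any CSF tooling:

```python
# Sketch: check whether a requested core count is valid for the HPC Pool
# (128 to 1024 cores, in multiples of 32, i.e. whole 32-core nodes).
def valid_hpc_pool_size(cores: int) -> bool:
    return 128 <= cores <= 1024 and cores % 32 == 0

print(valid_hpc_pool_size(256))   # 256 cores = 8 full nodes -> True
print(valid_hpc_pool_size(200))   # not a multiple of 32 -> False
```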

Please note: Many well-known open source applications (e.g., chemistry apps) are already known to scale well. Also, if you have run your software on other HPC systems (for example ARCHER, N8HPC, JADE) then you don’t need to run timing tests on the CSF.

Strong Scaling and Weak Scaling

We will require some evidence that your application/code scales. By this, we mean:

  • Your application/code has been run in parallel using a small number of cores, perhaps on the CSF.
  • Time how long the job takes to run with an increasing number of cores, using the same inputs each time (e.g., the same dataset). This will show whether it runs quicker as cores are added.
  • If the job is still running quicker as you reach the maximum job size on the CSF (currently 120 cores) then it is suitable for testing on the HPC Pool.
  • Ideally, doubling the number of cores halves the runtime, and so on. Most software does not exhibit this ideal scaling, but the nearer an application comes to it the better.
  • Please supply timings of your code with an increasing number of cores, indicating which system you have run the tests on.

The above shows that the application exhibits strong scaling – it goes quicker as the number of cores increases for a fixed problem size.
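A strong-scaling test is usually summarised as speedup (how many times faster than the smallest run) and parallel efficiency (speedup divided by the increase in cores). The sketch below uses made-up illustrative timings, not measurements from any real application:

```python
# Sketch: speedup and parallel efficiency from strong-scaling runs
# (fixed problem size, increasing cores). Timings are illustrative only.
timings = {
    1: 3600.0,   # cores: runtime in seconds
    2: 1850.0,
    4: 960.0,
    8: 510.0,
    16: 290.0,
}

base_cores = min(timings)
base_time = timings[base_cores]

for cores in sorted(timings):
    speedup = base_time / timings[cores]          # ideal: equal to cores
    efficiency = speedup * base_cores / cores     # ideal: 100%
    print(f"{cores:>3} cores: speedup {speedup:5.2f}, efficiency {efficiency:6.1%}")
```

A table like the one this prints (cores, runtime, speedup, efficiency), together with the name of the system the tests ran on, is exactly the kind of evidence requested above.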

  • Alternatively, if you wish to run larger simulations (e.g., to process larger datasets or with parameters that would make the job exceed the 7-day runtime limit on the CSF’s compute nodes) then you can do some scaling tests on the HPC Pool.
  • In this case the job may not run quicker. Instead, for O(N) algorithms (such as chemistry apps which only consider short-range forces) we would expect the runtime to remain approximately constant as the amount of data is increased in proportion to the number of cores (i.e., the data per core remains approximately constant). Few applications achieve this ideal scaling, but the closer your application comes to it the better.

The above shows that the application exhibits weak scaling – increasing the dataset/simulation size and increasing the cores keeps the runtime approximately constant.
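A weak-scaling test is summarised the same way, except that the ideal is constant runtime, so efficiency is simply the smallest run's time divided by each larger run's time. Again, the timings below are illustrative values only:

```python
# Sketch: weak-scaling efficiency -- the problem size grows with the core
# count, so the ideal is constant runtime. Timings are illustrative only.
timings = {
    32:  1000.0,   # cores: runtime in seconds (data per core held constant)
    64:  1030.0,
    128: 1080.0,
    256: 1170.0,
}

base_time = timings[min(timings)]

for cores in sorted(timings):
    efficiency = base_time / timings[cores]   # 1.0 (100%) is ideal
    print(f"{cores:>4} cores: weak-scaling efficiency {efficiency:6.1%}")
```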

Either of the above types of scaling would make your application/code suitable for testing/running in the HPC Pool. If an application shows no sign of running any faster with an increasing number of cores, or cannot process the larger problem sizes, then running it on a large number of cores would simply waste HPC Pool resources.

Last modified on March 31, 2022 at 10:48 am by George Leaver