A user’s first experiences with the iCSF — first results
Dr Egor Zindy, Senior Experimental Officer: Bioimaging
My main job, as a Senior Experimental Officer for Bioimaging, is to help people analyse large sets of images or movies acquired on our microscopes. Typically, there are multiple conditions and multiple repeats to explore, and quite often these are processed independently. Look at how each image is processed and you will often find a loop that scans through all the pixels, processing one at a time, independently of all the others.
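To make that structure concrete, here is a minimal Python sketch of the pattern. It is not the code used for any real analysis: the `stretch_pixel` operation, the tiny list-of-lists images and the function names are all made up for illustration. The only point is that each pixel, and each image, can be handled without reference to any other, which is what makes this kind of work easy to spread over many cores.

```python
from concurrent.futures import ThreadPoolExecutor

def stretch_pixel(value, lo, hi):
    """Linearly rescale one pixel from [lo, hi] to [0, 255] (illustrative operation)."""
    if hi == lo:
        return 0
    return round(255 * (value - lo) / (hi - lo))

def process_image(image):
    """Process one image; each pixel is rescaled independently of its neighbours."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    return [[stretch_pixel(v, lo, hi) for v in row] for row in image]

def process_all(images, workers=4):
    """Process independent images concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_image, images))

# Two tiny made-up "images", for illustration only.
images = [
    [[10, 20], [30, 40]],
    [[0, 4], [6, 10]],
]
results = process_all(images)
```

In CPython, threads will not actually speed up pure-Python arithmetic like this. The sketch shows the shape of the problem, and a real speed-up would need a process pool or compiled code.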
With my next task looming (tracking maggots in hundreds of movies, each consisting of thousands of frames), I am evaluating the possibility of using the university computer clusters rather than my own tired workstation. Fast forward to this week, I have *just* been given access to the iCSF cluster (aka INCLINE — the INteractive Computational LINux Environment), which is designed specifically for interactive computationally-intensive work.
The iCSF has three types of nodes. The biggest, incline2000, has 40 CPU cores and 2 TB of RAM. I used incline2000 for my tests. I designed a quick test to get a feel for “how fast” these machines are. I used a contrast enhancement method called weighted region ranking (WRR), which I have optimised to run on multiple cores using OpenMP.
The first result of this experiment was that running the algorithm on a single core of incline2000 was 25% slower than running it on a single core of my desktop machine. [[Ed: incline2000 contains Intel Westmere CPUs. These are slower than newer IvyBridge and Haswell CPUs.]] I think this is a very important result, as people may be disappointed if the software they try to run on these “big iron” machines is not in some way tailored to take advantage of a large number of cores. And this is where incline2000 really shines: using all the cores (40 on incline2000 versus 12 on my workstation), my WRR implementation ran about 4 times faster on the iCSF than on my workstation.
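As a rough sanity check on these numbers, the short calculation below estimates the speed-up one would expect if both machines scaled perfectly with core count. It rests on one assumption that the text does not spell out: that "25% slower" means the single-core run took 1.25 times as long. The function name is illustrative.

```python
def ideal_speedup(cores_a, cores_b, relative_core_speed):
    """Speed-up of machine A over machine B if both scaled perfectly with core count."""
    return cores_a * relative_core_speed / cores_b

# Assumption: "25% slower" means the single-core run took 1.25x as long,
# so one incline2000 core does 1 / 1.25 = 0.8 of the work of a desktop core.
relative_core_speed = 1 / 1.25

estimate = ideal_speedup(cores_a=40, cores_b=12, relative_core_speed=relative_core_speed)
print(f"ideal speed-up with perfect scaling: {estimate:.2f}x")
```

Under that assumption the estimate comes out at roughly 2.7 times. The measured figure of about 4 times is higher, so the two machines evidently did not scale equally well with core count. One possible reading is that the workstation run scaled less well across its 12 cores, but these numbers are too rough to say for certain.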
With these very preliminary results, this is where I’m at. I still need to optimise my code for the iCSF, and better understand how to balance the number of cores used in my process, versus running multiple instances of the code on separate images, versus profiling and optimising parts of my code to fully take advantage of these amazing machines. In any case, a very encouraging result!
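The first of those trade-offs, cores per process versus number of simultaneous instances, can be laid out with a small Python sketch. It assumes each instance is given a fixed thread count through the standard OpenMP environment variable `OMP_NUM_THREADS`. The helper names and the image file names are hypothetical, and nothing here measures which split is actually fastest.

```python
def split_cores(total_cores, threads_per_instance):
    """Return how many instances fit if each one uses a fixed number of threads."""
    if threads_per_instance < 1 or threads_per_instance > total_cores:
        raise ValueError("threads_per_instance must be between 1 and total_cores")
    return total_cores // threads_per_instance

def assign_images(image_names, n_instances):
    """Deal images out round-robin so each instance gets a similar share."""
    batches = [[] for _ in range(n_instances)]
    for i, name in enumerate(image_names):
        batches[i % n_instances].append(name)
    return batches

TOTAL_CORES = 40  # incline2000, as described above

# A few candidate splits between threads per instance and concurrent instances.
for threads in (1, 4, 10, 40):
    instances = split_cores(TOTAL_CORES, threads)
    print(f"OMP_NUM_THREADS={threads:2d} -> {instances:2d} instance(s) at once")

# Hypothetical file names, for illustration only.
image_names = [f"image_{i:03d}.tif" for i in range(10)]
batches = assign_images(image_names, split_cores(TOTAL_CORES, 10))
```

Timing a handful of such splits on a representative batch of images would be one way to find the balance empirically.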
[[Ed.: The iCSF is ideal for development work of this kind. But for running lots of “production” jobs to process hundreds or thousands of images, the CSF is better. Users can run hundreds of jobs at the same time there.]]