HDF5

Overview

HDF5 (Hierarchical Data Format version 5) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of large, complex datasets.

Only the HDF5 standard (not the much older HDF4 standard) is available on CSF3.

The following versions are available on CSF3 (see the modulefiles below for more information): 1.10.5 (MPI I/O enabled), 1.10.4, 1.10.4 (MPI I/O enabled), 1.10.1, 1.8.21, 1.8.16. All versions are serial (non-MPI) unless otherwise stated.

Restrictions on use

There are no restrictions on access to the HDF5 libraries on CSF3. The software is released under a BSD-style open-source license and all use must adhere to it. Please consult the HDF5 license for more information.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load one of the following modulefiles to set up your environment:

# Includes MPI I/O support and has the C++ interface compiled
module load libs/intel-18.0/hdf5/1.10.5_mpi   # Intel 18.0.3 compiler, OpenMPI 4.0.1

# Includes MPI I/O support but does not have the C++ interface compiled (a limitation of HDF5)
module load libs/intel-17.0/hdf5/1.10.5_mpi   # Intel 17.0.7 compiler, OpenMPI 3.1.3

# Includes MPI I/O support but does not have the C++ interface compiled (a limitation of HDF5)
module load libs/gcc/hdf5/1.10.4_mpi          # Uses system default GCC 4.8.5 compiler, OpenMPI 3.1.3

# No MPI I/O support but does have the C++ interface compiled.
module load apps/hdf5_serial/1.10.1           # Uses system default GCC 4.8.5 compiler

module load libs/gcc/hdf5/1.10.4              # Uses system default GCC 4.8.5 compiler
module load libs/gcc/hdf5/1.8.21              # Uses system default GCC 4.8.5 compiler
module load libs/gcc/hdf5/1.8.16              # Uses system default GCC 4.8.5 compiler
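
Once a modulefile is loaded you can confirm what it has set, for example (the exact path printed depends on the version loaded; the HDF5DIR variable is described in the compilation section below):

module load libs/gcc/hdf5/1.10.4
echo ${HDF5DIR}           # installation root used in the compile commands below
ls ${HDF5DIR}/include     # header files, including hdf5.h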

Compiling an HDF5-capable application

You will mostly use the HDF5 installation on CSF3 when compiling your own software: it allows you to add HDF5 functionality to your own applications so that they can read and write HDF5 files. There are also some tools you can run to process existing HDF5 data files (see below).

The modulefiles set an environment variable named HDF5DIR, which can then be used in your compilation process (e.g., in a Makefile or directly on the command line) to access the header and library files:

  • To inform the compiler of the header file directory use:
    gcc -I${HDF5DIR}/include ....
  • To inform the compiler of the library files use:
    gcc ... -L${HDF5DIR}/lib -lhdf5
  • In a Makefile ensure you use ${HDF5DIR} rather than $HDF5DIR.

An example compilation command could be:

gcc -I${HDF5DIR}/include ex_hdf5.c -o ex_hdf5 -L${HDF5DIR}/lib -lhdf5

# Use mpicc if compiling MPI code that uses the HDF5 library
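
The parallel builds also ship the compiler wrapper scripts h5pcc (C) and h5pfc (Fortran), which appear in the tools list below; these call the underlying MPI compiler with the HDF5 include and library flags already added. A minimal sketch, reusing the example file above and assuming a parallel (MPI) modulefile is loaded:

h5pcc ex_hdf5.c -o ex_hdf5    # equivalent to an mpicc command with the -I/-L/-lhdf5 flags added for you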

Running an HDF5-capable application

You must load the HDF5 modulefile before running your HDF5-capable application (unless you have statically linked your code against the HDF5 libraries).
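
If you want to check that a dynamically linked binary can locate the library, ldd is a standard way to do so (myhdf5app is the illustrative binary name used in the jobscripts below; this assumes the modulefile adds the HDF5 library directory to your LD_LIBRARY_PATH):

module load libs/gcc/hdf5/1.10.4
ldd ./myhdf5app | grep hdf5    # libhdf5 should resolve to a path under the module's lib directory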

Please do not run HDF5-capable applications on the login node. Jobs should be submitted to the compute nodes via the batch system.

Serial batch job submission

Create a batch submission script which loads the modulefile itself, for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Load the serial (non-MPI) modulefile
module load libs/gcc/hdf5/1.10.4

# Run the application I compiled earlier
./myhdf5app arg1 ...

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.

Parallel batch job submission

Create a batch submission script which loads the modulefile itself, for example:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8     # An example single-node multi-core job using 8 cores

# Load the parallel (MPI) modulefile
module load libs/gcc/hdf5/1.10.4_mpi

# Run the application I compiled earlier, using mpirun to launch it in parallel.
# $NSLOTS will be set to the number of cores requested above.
mpirun -n $NSLOTS ./myparhdf5app arg1 ...

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.

HDF5 Tools

The HDF5 ‘bin’ directory is added to your PATH so that you can access the h5 tools. Use ls ${HDF5BIN} to see all of the binary tools that can be used on HDF5 files:

# HDF5 tools - must be used in batch jobs, as above, to process large HDF5 data files!

gif2h5   h5debug           h5import  h5pcc          h5redeploy  h5unjam
h52gif   h5diff            h5jam     h5perf         h5repack    h5watch
h5clear  h5dump            h5ls      h5perf_serial  h5repart    ph5diff
h5copy   h5format_convert  h5mkgrp   h5pfc          h5stat
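
For example, to take a quick look at a file's structure (myfile.h5 is a hypothetical filename; as the note above says, process large files in batch jobs):

h5ls myfile.h5         # list the objects in the root group of the file
h5dump -H myfile.h5    # print the header (metadata) information only, no data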

Further info

Example projects are available in ${HDF5DIR}/share/hdf5_examples/
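
For instance, you could copy them to your own area and build one of the C examples (a sketch assuming a serial gcc modulefile is loaded; the layout of the examples directory may vary between versions):

module load libs/gcc/hdf5/1.10.4
cp -r ${HDF5DIR}/share/hdf5_examples ~/hdf5_examples
cd ~/hdf5_examples/c
gcc -I${HDF5DIR}/include h5_write.c -o h5_write -L${HDF5DIR}/lib -lhdf5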

See the HDF5 website for full documentation.

Updates

None.
