HDF5
Overview
HDF5 (Hierarchical Data Format) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of large, complex datasets.
Only the HDF5 standard (not the much older HDF4 standard) is available on CSF3.
The following versions are available on CSF3 (see the modulefiles below for more information): 1.10.5 (MPI I/O enabled), 1.10.4, 1.10.4 (MPI I/O enabled), 1.10.1, 1.8.21, 1.8.16. All versions are serial (non-MPI) unless otherwise stated.
Restrictions on use
There are no restrictions on access to the HDF5 libraries on CSF. The software is released under a BSD-style Open Source license and all use must adhere to that license. Please consult the HDF5 license for more information.
Set up procedure
We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.
Load one of the following modulefiles to set up your environment:
# Includes MPI I/O support and has the C++ interface compiled
module load libs/intel-18.0/hdf5/1.10.5_mpi   # Intel 18.0.3 compiler, OpenMPI 4.0.1

# Includes MPI I/O support but does not have the C++ interface compiled (a limitation of HDF5)
module load libs/intel-17.0/hdf5/1.10.5_mpi   # Intel 17.0.7 compiler, OpenMPI 3.1.3

# Includes MPI I/O support but does not have the C++ interface compiled (a limitation of HDF5)
module load libs/gcc/hdf5/1.10.4_mpi          # Uses system default GCC 4.8.5 compiler, OpenMPI 3.1.3

# No MPI I/O support but does have the C++ interface compiled
module load apps/hdf5_serial/1.10.1           # Uses system default GCC 4.8.5 compiler
module load libs/gcc/hdf5/1.10.4              # Uses system default GCC 4.8.5 compiler
module load libs/gcc/hdf5/1.8.21              # Uses system default GCC 4.8.5 compiler
module load libs/gcc/hdf5/1.8.16              # Uses system default GCC 4.8.5 compiler
Compiling an HDF5-capable application
You will mostly use the HDF5 installation on CSF3 when compiling your own software. This allows you to add HDF5 functionality to your own apps (they’ll read and write HDF5 files). There are some tools you can also run to process existing HDF5 data files (see below).
The modulefiles will set an environment variable named ${HDF5DIR}, which can then be used in your compilation process (e.g., in a Makefile or directly on the command line) to access the header and library files:
- To inform the compiler of the header file directory use:
gcc -I${HDF5DIR}/include ....
- To inform the compiler of the library files use:
gcc ... -L${HDF5DIR}/lib -lhdf5
- In a Makefile ensure you use ${HDF5DIR} rather than $HDF5DIR.
An example compilation command could be
gcc -I${HDF5DIR}/include ex_hdf5.c -o ex_hdf5 -L${HDF5DIR}/lib -lhdf5

# Use mpicc if compiling MPI code that uses the HDF5 library
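For reference, here is a minimal sketch of the kind of program being compiled above. It creates an HDF5 file and writes a small 2D integer dataset using the HDF5 C API; the filename example.h5 and dataset name /mydata are illustrative, not part of the CSF installation:

/* ex_hdf5.c - minimal sketch: create a file and write a small dataset */
#include "hdf5.h"

int main(void)
{
    hid_t   file, space, dset;
    hsize_t dims[2] = {4, 6};
    int     data[4][6];

    /* Fill the array with recognisable values */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 6; j++)
            data[i][j] = i * 6 + j;

    /* Create the file (truncating any existing file of the same name) */
    file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Describe the 4x6 dataspace and create an integer dataset */
    space = H5Screate_simple(2, dims, NULL);
    dset  = H5Dcreate2(file, "/mydata", H5T_NATIVE_INT, space,
                       H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Write the whole array in one call */
    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    /* Release resources in reverse order of creation */
    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}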
Running an HDF5-capable application
You must load the HDF5 modulefile before running your HDF5-capable application (unless you have statically linked your code against the HDF5 libraries).
Please do not run HDF5-capable applications on the login node. Jobs should be submitted to the compute nodes via batch.
Serial batch job submission
Create a batch submission script (which will load the modulefile in the jobscript), for example:
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Load the serial (non-MPI) modulefile
module load libs/gcc/hdf5/1.10.4

# Run my application I compiled earlier
./myhdf5app arg1 ...
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
Parallel batch job submission
Create a batch submission script (which will load the modulefile in the jobscript), for example:
#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
#$ -pe smp.pe 8     # An example single-node multi-core job using 8 cores

# Load the parallel (MPI) modulefile
module load libs/gcc/hdf5/1.10.4_mpi

# Run my application I compiled earlier using mpirun to run it in parallel.
# $NSLOTS will be set to the number of cores requested above.
mpirun -n $NSLOTS ./myparhdf5app arg1 ...
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
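If you are writing your own MPI code against one of the *_mpi builds above, HDF5 is switched to MPI I/O through a file access property list. The following is a minimal sketch only (the filename parallel.h5 is illustrative); a real application would also create datasets and have each rank write its own hyperslab:

/* par_hdf5.c - minimal sketch: collective file creation with MPI I/O.
   Compile with mpicc after loading one of the *_mpi modulefiles. */
#include <mpi.h>
#include "hdf5.h"

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* File access property list telling HDF5 to do its I/O through MPI */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* All ranks create/open the same file collectively */
    hid_t file = H5Fcreate("parallel.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* ... each rank would create datasets and write its own hyperslab here ... */

    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}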
HDF5 Tools
The HDF5 ‘bin’ directory is added to your PATH so that you can access the h5 tools. Use ls ${HDF5BIN} to see all binary tools which can be used on HDF5 files:
# HDF5 tools - must be used in batch jobs, as above, to process large HDF5 data files!
gif2h5    h5debug            h5import   h5pcc           h5redeploy   h5unjam
h52gif    h5diff             h5jam      h5perf          h5repack     h5watch
h5clear   h5dump             h5ls       h5perf_serial   h5repart     ph5diff
h5copy    h5format_convert   h5mkgrp    h5pfc           h5stat
Further info
Example projects are available in $HDF5DIR/share/hdf5_examples/
See the HDF5 website for full documentation.
Updates
None.