Known Issues and Workarounds – May 2024 upgrade

As part of the May 2024 CSF4 upgrade, the operating system was upgraded from CentOS 7.9 to Red Hat Enterprise Linux 9.3. As a result, some software may need to be re-installed, or additional modulefiles may need to be loaded in your jobscript.

We list below the known problems and workarounds discovered so far.

Your own testing of your jobs will help us. If you discover any problems in your own jobs, please do let us know!

How to report an issue

If you would like to report a problem with a piece of software following the May 2024 upgrade, please do ONE of the following (no need to do both).

  1. Fill out the following request form and include the following details:
    • Request Type = Request access to /install software on HPC/HTC
    • System in Use = CSF4
    • Software (name & version) = name of the software, including the version
    • Additional Information = outline the error, being sure to include any error messages, the location of your jobscripts, and any relevant directories.

    OR

  2. Send an email to its-ri-team@manchester.ac.uk, including the following:
    • Subject – CSF4 – Software May2024 upgrade – nameOfSoftware + version.
    • Email body – outline the error, being sure to include any error messages, the location of your jobscripts, and any relevant directories.

Known Issues and Workarounds

sbatch: error: Batch job submission failed: Node count specification invalid

If you submit a multinode job using:

#!/bin/bash --login
#SBATCH -p multinode
#SBATCH -n 80            # 80 cores in total = 2 x 40-core compute nodes
module load ......
mpirun -n $SLURM_NTASKS someapp.exe

######## THIS JOB WILL BE REJECTED #########

When you submit the job you will see an error:

sbatch myjobscript
sbatch: error: Batch job submission failed: Node count specification invalid

The solution is to ALSO specify the number of nodes:

#!/bin/bash --login
#SBATCH -p multinode
#SBATCH -N 2             # 2 nodes 
#SBATCH -n 80            # 80 cores in total = 2 x 40-core compute nodes
module load ......
mpirun -n $SLURM_NTASKS someapp.exe

######## THIS JOB WILL BE ACCEPTED #########

We are looking into this behaviour change in SLURM.
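The required node count can be derived from the total core count by ceiling division. A minimal sketch, assuming the 40-core compute nodes described in the jobscript above (the variable names are illustrative, not part of SLURM):

```shell
# Sketch: derive the sbatch -N value from a total core count, assuming
# 40-core multinode compute nodes (as in the jobscript above).
TOTAL_CORES=80
CORES_PER_NODE=40
# Ceiling division: round up so a partly used node is still counted.
NODES=$(( (TOTAL_CORES + CORES_PER_NODE - 1) / CORES_PER_NODE ))
echo "Use: #SBATCH -N $NODES and #SBATCH -n $TOTAL_CORES"
```

For 80 cores this prints a node count of 2, matching the corrected jobscript above.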

version `XZ_5.2' not found

flatpak: /opt/software/RI/apps/XZ/5.2.5-GCCcore-10.3.0/lib/liblzma.so.5: version `XZ_5.2' not found (required by /lib64/libarchive.so.13)
flatpak: /opt/software/RI/apps/XZ/5.2.5-GCCcore-10.3.0/lib/liblzma.so.5: version `XZ_5.2' not found (required by /lib64/librpmio.so.9)

Known applications affected: GROMACS

Solution: Please load the following module file in any job scripts that report the above error.

module load xz/5.2.5-gcccore-10.3.0

Job email notifications not working

When adding lines such as the following to your jobscript, you won’t actually receive any email:

#SBATCH --mail-type=ALL
#SBATCH --mail-user=firstname.lastname@manchester.ac.uk

There is no workaround – we will address this issue next week.

Gaussview gview.exe: libGLU.so.1: cannot open shared object file

gview.exe: error while loading shared libraries: libGLU.so.1: cannot open shared object file: No such file or directory

Update 14/06/2024: The libglu libraries are now installed on the login and compute nodes, so there is no longer any need to load a modulefile.

Previous workaround: load the following module to prevent this error:

module load libglu/9.0.1-gcccore-10.3.0
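More generally, `ldd` reports any shared libraries a binary cannot resolve, which is a quick way to check for this class of error before running an application. A sketch, using /bin/ls as a stand-in target (on CSF4 you would point it at gview.exe instead):

```shell
# Sketch: count unresolved shared-library dependencies of a binary.
# /bin/ls is only a stand-in; substitute the real application binary.
BINARY=/bin/ls
MISSING=$(ldd "$BINARY" | grep -c 'not found' || true)
if [ "$MISSING" -eq 0 ]; then
    echo "all shared libraries resolved"
else
    echo "$MISSING unresolved shared libraries - check your module loads"
fi
```

If any libraries are reported as "not found", loading the appropriate modulefile (which adds the library's directory to the search path) usually resolves them.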

-bash: nano: command not found

Solution: The nano editor has now been installed on the login nodes.

Licensing errors

Known applications affected: MATLAB, StarCCM; other applications that contact on-campus license servers are also likely to be affected.
Example errors:

MATLAB
License checkout failed.
License Manager Error -15
Unable to connect to the license server. 
Check that the network license manager has been started, and that the client machine can communicate
with the license server.

Currently under investigation.
11/06/24 – the root cause has been identified and a solution is being worked on.
16/06/24 – this issue has now been resolved; applications that use the campus license servers will now run.
Solution: Please submit your jobs as usual – no changes are required to your jobscripts.

Error Running Paraview Macro

HYDU_create_process (utils/launch/launch.c:73): execvp error on file srun (No such file or directory)

This issue has now been resolved.
Solution: Please submit your jobs as usual – no changes are required to your jobscripts.

libnsl.so.1: cannot open shared object file

error while loading shared libraries: libnsl.so.1: cannot open shared object file: No such file or directory

Known applications affected: StarCCM

Solution: The missing dependency has now been installed.

libssl.so.10: cannot open shared object file

flatpak: error while loading shared libraries: libssl.so.10: cannot open shared object file: No such file or directory

Solution: Please add the following to your jobscript, after loading your usual modulefiles:

module load openssl/1.0.2k

LAMMPS libssl.so.10, libcrypto.so.10, libz.so.1: not found / cannot open shared object file

Solution: Please add the following modules to your jobscript, after loading your LAMMPS module, in the order given:

module load openssl/1.0.2k
module load zlib/1.2.11-gcccore-9.3.0

The CSF4 LAMMPS wiki page has been updated with this information.
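Putting the above together, a LAMMPS jobscript would be shaped like the sketch below. The partition, core counts, LAMMPS modulefile name, binary name and input file are illustrative placeholders, not confirmed CSF4 values; only the two extra module lines and their ordering come from this page:

```shell
#!/bin/bash --login
#SBATCH -p multinode      # illustrative partition
#SBATCH -N 2              # illustrative node/core counts
#SBATCH -n 80
module load lammps        # hypothetical name - load your usual LAMMPS modulefile
# Extra modules from this page, in the given order, after LAMMPS:
module load openssl/1.0.2k
module load zlib/1.2.11-gcccore-9.3.0
mpirun -n $SLURM_NTASKS lmp -in in.lammps   # 'lmp' binary name is an assumption
```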

Last modified on June 20, 2024 at 12:36 pm by Abhijit Ghosh