Known Issues and Workarounds – May 2024 upgrade
As part of the May 2024 CSF4 upgrade, the operating system was upgraded from CentOS 7.9 to Red Hat Enterprise Linux 9.3. As a result, some software may need to be re-installed, or additional modulefiles may need to be loaded in your jobscripts.
Below we list the known problems and workarounds discovered so far.
Your own testing of your jobs will help us: if you discover any such problems in your own jobs, please do let us know!
How to report an issue
If you would like to report a problem with a piece of software following the May 2024 upgrade, please do ONE of the following (no need to do both).
- Fill out the request form and include the following:
- Request Type = Request access to /install software on HPC/HTC
- System in Use = CSF4
- Software (name & version) = name of the software, including the version
- Additional Information = Please outline the error, being sure to include any error messages, the location of your jobscripts, and any relevant locations/directories.
OR
- Send an email to its-ri-team@manchester.ac.uk, including the following:
- Subject: CSF4 – Software May 2024 upgrade – nameOfSoftware + version
- Email body: Please outline the error, being sure to include any error messages, the location of your jobscripts, and any relevant locations/directories.
Known Issues and Workarounds
sbatch: error: Batch job submission failed: Node count specification invalid
If you submit a multinode job using:
#!/bin/bash --login
#SBATCH -p multinode
#SBATCH -n 80        # 80 cores in total = 2 x 40-core compute nodes
module load ......
mpirun -n $SLURM_NTASKS someapp.exe

######## THIS JOB WILL BE REJECTED #########
When you submit the job you will see an error:
sbatch myjobscript
sbatch: error: Batch job submission failed: Node count specification invalid
The solution is to ALSO specify the number of nodes:
#!/bin/bash --login
#SBATCH -p multinode
#SBATCH -N 2         # 2 nodes
#SBATCH -n 80        # 80 cores in total = 2 x 40-core compute nodes
module load ......
mpirun -n $SLURM_NTASKS someapp.exe

######## THIS JOB WILL BE ACCEPTED #########
We are looking into this change of behaviour in SLURM.
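Until this is resolved, you can compute the required `-N` value yourself: it is the ceiling of the total core count divided by the cores per node. A minimal sketch, assuming the 40-core compute nodes used in the example above:

```shell
#!/bin/bash
# Compute the '#SBATCH -N' value for a multinode job.
# CORES_PER_NODE=40 matches the 40-core nodes in the example above.
CORES_PER_NODE=40
TOTAL_CORES=80                 # as in '#SBATCH -n 80'
# Ceiling division: the number of nodes needed for TOTAL_CORES cores
NODES=$(( (TOTAL_CORES + CORES_PER_NODE - 1) / CORES_PER_NODE ))
echo "$NODES"                  # prints 2 for an 80-core job
```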
version `XZ_5.2' not found
flatpak: /opt/software/RI/apps/XZ/5.2.5-GCCcore-10.3.0/lib/liblzma.so.5: version `XZ_5.2' not found (required by /lib64/libarchive.so.13)
flatpak: /opt/software/RI/apps/XZ/5.2.5-GCCcore-10.3.0/lib/liblzma.so.5: version `XZ_5.2' not found (required by /lib64/librpmio.so.9)
Known applications affected: GROMACS
Solution: Please load the following modulefile in any jobscripts that report the above error:
module load xz/5.2.5-gcccore-10.3.0
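In a jobscript, the workaround module can simply be added after your usual modulefiles. A sketch, using this page's `......` placeholder for your usual GROMACS (or other affected) modulefile; the partition name and run command are only examples:

```shell
#!/bin/bash --login
#SBATCH -p multicore                        # example partition; use your usual one
#SBATCH -n 4
module load ......                          # your usual GROMACS modulefile
module load xz/5.2.5-gcccore-10.3.0         # workaround for the XZ_5.2 error
mpirun -n $SLURM_NTASKS gmx_mpi mdrun ...   # placeholder GROMACS command
```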
Job email notifications not working
When adding lines such as the following to your jobscript, you won’t actually receive any email:
#SBATCH --mail-type=ALL
#SBATCH --mail-user=firstname.lastname@manchester.ac.uk
There is no workaround – we will address this issue next week.
gview.exe: error while loading shared libraries: libGLU.so.1: cannot open shared object file: No such file or directory
Solution: Please load the following module to prevent this error:
module load libglu/9.0.1-gcccore-10.3.0
14/06/2024: The libglu libraries are now installed on the login and compute nodes, so there is no longer any need to load this modulefile.
-bash: nano: command not found
Solution: The nano editor has now been installed on the login nodes.
Licensing errors
Known applications affected: MATLAB, StarCCM; other applications that contact on-campus license servers are also likely to be affected.
Example of errors:
MATLAB
License checkout failed.
License Manager Error -15
Unable to connect to the license server.
Check that the network license manager has been started, and that the client machine can communicate with the license server.
Currently under investigation
11/06/24 – root cause has been identified and a solution is being worked on.
16/06/24 – this issue has now been resolved – applications that use the campus license servers will now run.
Solution: Please just submit your jobs as usual – no changes are required to your jobscripts.
Error Running Paraview Macro
HYDU_create_process (utils/launch/launch.c:73): execvp error on file srun (No such file or directory)
Currently under investigation.
Solution: Please just submit your jobs as usual – no changes are required to your jobscripts.
error while loading shared libraries: libnsl.so.1: cannot open shared object file: No such file or directory
Known applications affected: StarCCM
Solution: The missing dependency has now been installed.
flatpak: error while loading shared libraries: libssl.so.10: cannot open shared object file: No such file or directory
Solution: Please add the following to your jobscript, after loading your usual modulefiles:
module load openssl/1.0.2k
Solution: Please add the following additional modules in your jobscript, after loading your LAMMPS module, in the given order:
module load openssl/1.0.2k
module load zlib/1.2.11-gcccore-9.3.0
The CSF4 LAMMPS wiki page has been updated with this information.
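In context, the load order in a LAMMPS jobscript would look like the following sketch (the LAMMPS modulefile and run command are placeholders, using this page's `......` convention; the partition is only an example):

```shell
#!/bin/bash --login
#SBATCH -p multicore                    # example partition; use your usual one
#SBATCH -n 8
module load ......                      # your usual LAMMPS modulefile first
module load openssl/1.0.2k              # then openssl
module load zlib/1.2.11-gcccore-9.3.0   # then zlib, in the given order
mpirun -n $SLURM_NTASKS lmp ...         # placeholder LAMMPS command
```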