The CSF2 has been replaced by the CSF3 - please use that system! This documentation may be out of date. Please read the CSF3 documentation instead.
pigz and unpigz
Overview
Pigz is a parallel version of gzip, a tool to compress/uncompress files into gzip or zip archives. It uses multiple cores on a compute node to speed up file compression.
Note that the archive (.gz) files written by pigz and by the ordinary gzip installed on many Linux systems are compatible. You do not have to use pigz and unpigz at both ends of the process. For example, you can compress a file quickly on the CSF using pigz (see below), then transfer the file to your local desktop machine and uncompress it using the ordinary gunzip command. Conversely, if you download a data file from the web that has been compressed with gzip, you can uncompress it on the CSF using unpigz.
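As a concrete sketch of that workflow (mydatafile.dat is a hypothetical filename):

# On the CSF (inside a batch job) - compress using pigz with 8 cores
pigz -p 8 mydatafile.dat        # produces mydatafile.dat.gz

# On your local desktop machine - decompress with the ordinary gzip tools
gunzip mydatafile.dat.gz        # recovers mydatafile.dat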
Version 2.3.3 is installed on the CSF.
Restrictions on use
Under no circumstances should pigz be run on the login node. If found running it will be killed without warning. It must be submitted as a batch job.
There are no restrictions on accessing pigz on the CSF.
Set up procedure
To access the software you must first load the modulefile:
module load tools/gcc/pigz/2.3.3
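To check the modulefile has loaded correctly you can ask the shell where the pigz executable now lives (this does not run pigz itself), for example:

which pigz        # should print a path under the pigz 2.3.3 installation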
Running the application
Please do not run pigz on the login node. Jobs should be submitted to the compute nodes via the batch system.
Parallel batch job submission – file compression
It is recommended you run pigz on files in your scratch area. This is a faster filesystem than your home area:
cd ~/scratch
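If the data currently lives in your home area you might first copy it over, for example (mydatafile.dat is a hypothetical filename):

cp ~/mydatafile.dat ~/scratch/
cd ~/scratch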
Make sure you have the modulefile loaded then create a batch submission script, for example:
#!/bin/bash
#$ -S /bin/bash
#$ -cwd             # Job will run from the current directory
#$ -V               # Job will inherit current environment settings
#$ -pe smp.pe 8     # Number of cores to use for file compression/decompression

### Some example compression uses are given below ###
## Note that $NSLOTS is automatically set to the number of cores requested above

## Compress a file named mydatafile.dat - it will be renamed mydatafile.dat.gz once compressed
pigz -p $NSLOTS mydatafile.dat

## OR Compress everything found in a directory named 'my_data' to a compressed tar file named my_data.tar.gz
tar cf - my_data | pigz -p $NSLOTS > my_data.tar.gz
#      |
#      Note that a '-' here means the output is sent through the
#      pipe (the | symbol) to the pigz command, not to an intermediate
#      tar file.
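By default pigz deletes the original file once the compressed .gz file has been written. If you want different behaviour, pigz accepts the usual gzip-style options; for example (a sketch using standard pigz flags):

pigz -k -p $NSLOTS mydatafile.dat    # -k keeps the original file alongside mydatafile.dat.gz
pigz -9 -p $NSLOTS mydatafile.dat    # -9 gives the best (but slowest) compression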
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
Parallel batch job – decompression
The pigz manual states:
- Decompression can’t be parallelized. As a result, pigz uses a single thread (the main thread) for decompression, but will create three other threads for reading, writing, and check calculation, which can speed up decompression under some circumstances. Parallel decompression can be turned off by specifying one process (-dp 1 or -tp 1).
Hence when using unpigz you should request 4 cores, unless you turn off parallel decompression.
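If you would rather use a single core, you can disable the extra threads as described in the manual quote above, for example (a sketch; the serial-job header syntax is assumed here):

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -V
# No -pe line - assumed serial (1-core) job syntax

pigz -dp 1 mydatafile.dat.gz     # -d decompresses; -p 1 turns off the extra threads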
Make sure you have the modulefile loaded then create a batch submission script, for example:
#!/bin/bash
#$ -S /bin/bash
#$ -cwd             # Job will run from the current directory
#$ -V               # Job will inherit current environment settings
#$ -pe smp.pe 4     # 4 is the maximum number of cores that decompression can use

### Some example decompression uses are given below ###
### These will use the maximum of 4 cores ###

## Uncompress a file named mydatafile.dat.gz - it will be renamed mydatafile.dat once decompressed
unpigz mydatafile.dat.gz

## OR Uncompress a previously compressed tar file named my_data.tar.gz into the current directory
unpigz -c my_data.tar.gz | tar xf -
#      |                        |
#      Send output through      Note that a '-' here means the tar command will read
#      the pipe to the tar      the input sent through the pipe by the pigz command,
#      command.                 not from a tar file on disk.
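The same pipe technique can be used to inspect a compressed tar file without unpacking it; for example:

unpigz -c my_data.tar.gz | tar tf -     # 't' lists the archive contents instead of extracting them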
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
Further info
- On the CSF login node run:
man pigz
to get a list of options.
- Pigz website.
Updates
None.