Anaconda Python

June 2023: The proxy is no longer available.
It will not be possible to download packages from external sites (e.g., when creating a conda env) on CSF4. Please consider using your CSF3 account for this task. See the CSF3 qrsh notes for more information.

Overview

Anaconda is a completely free, enterprise-ready Python distribution for large-scale data processing, predictive analytics and scientific computing. It includes more than 100 of the most popular Python packages for science, maths, engineering and data analysis.

Versions available are listed below in ‘Set up Procedure’.

Restrictions on use

There are no restrictions on accessing Anaconda Python on the CSF. All users should read the End User License Agreement before using the software.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit the settings.

Load one of the following modulefiles (most recent listed at the top, oldest at the bottom):

# Python 3.x
module load anaconda3/2022.10          # Python 3.9.13
module load anaconda3/2020.07          # Python 3.8.3
module load anaconda3/5.3.0            # Python 3.7.0
module load anaconda3/5.2.0            # Python 3.6.5

# Python 2.x
module load anaconda2/5.3.0            # Python 2.7.15

Additional Centrally Installed Packages

Please note that on CSF4 we will not be installing additional packages into Anaconda Python. Instead, all users are encouraged to use conda environments to install packages in self-contained environments within their home directory (see below for details). This is because Anaconda Python uses its own package manager, which may conflict with the package manager we use on CSF4 for application installs.

We may be able to install additional python packages in the ordinary (non-Anaconda) python installations. For example, the following modulefiles provide non-Anaconda python:

# Some of the non-anaconda python installations
module load python/3.8.2-gcccore-9.3.0
module load python/3.7.4-gcccore-9.3.0
module load python/2.7.18-gcccore-9.3.0

A full list of python related modulefiles is available by running:

module avail python

Running the application

Please do not run python on the login node. Jobs should be submitted to the compute nodes via batch. If you wish to do interactive development with python, please start an interactive session first.

Serial batch job submission

Before submitting the job we will write the following simple python script for use as an example:

# fib-example.py
parents, babies = (1, 1)
while babies < 100:
    print('This generation has %d babies' % babies)
    parents, babies = (babies, parents + babies)

Now create a batch submission script that we will submit to the batch system:

#!/bin/bash --login
#SBATCH -p serial         # (--partition=serial) Option - serial is the default

# Load the version you require
module load anaconda3/5.3.0

# Execute our simple python script we created above:
python fib-example.py

Submit the jobscript using:

sbatch scriptname

where scriptname is the name of your jobscript.

Parallel batch job submission

A simple parallel numpy example script is:

# eig-example.py
import numpy

def test_eigenvalue():
  i=500
  data=numpy.random.rand(i,i)
  result=numpy.linalg.eig(data)
  return result

print(test_eigenvalue())

and the corresponding jobscript is:

#!/bin/bash --login
#SBATCH -p multicore    # (--partition=multicore)
#SBATCH -n 8            # (--ntasks=8) Number of cores (2-40)

# Load the version you require
module load anaconda3/5.3.0

# Inform numpy how many cores to use. $SLURM_NTASKS is automatically set to the number given above.
export OMP_NUM_THREADS=$SLURM_NTASKS

python eig-example.py
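
To confirm that the core count from the jobscript actually reaches python, a script can read the same environment variables before numpy starts work. The following is a minimal sketch rather than part of the site's examples: the file name and the cores_requested helper are hypothetical, and it assumes the jobscript exported OMP_NUM_THREADS as shown above.

```python
# check-cores.py (hypothetical file name, for illustration only)
import os

def cores_requested(env=None):
    """Return the core count exported by the jobscript, or 1 if none is set."""
    env = os.environ if env is None else env
    # OMP_NUM_THREADS is exported in the jobscript; SLURM_NTASKS is set by Slurm.
    value = env.get("OMP_NUM_THREADS") or env.get("SLURM_NTASKS") or "1"
    # OMP_NUM_THREADS may be a comma-separated list; use the first entry.
    return int(value.split(",")[0])

print("Cores requested for this run: %d" % cores_requested())
```

If this prints 1 inside a multicore job, the export line is probably missing from the jobscript.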

Adding packages

In addition to your preferred anaconda version’s modulefile, you will need to load the following modulefile before proceeding, to allow access to the outside world:

module load proxy

Anaconda packages

To see what is already installed in the central conda installation:

conda list

To see if a particular package is already installed:

conda list package

where package is replaced with the name of the package you want.
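
The same check can be made from inside python, which is useful at the start of a script that depends on an optional package. This is a minimal sketch using only the standard library; the is_installed helper is a hypothetical name, and the two module names are placeholders. Note that the import name can differ from the conda package name (for example, the pillow package is imported as PIL).

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be imported by the current python."""
    return importlib.util.find_spec(module_name) is not None

# Placeholder module names; replace with the ones your script needs.
for name in ("numpy", "json"):
    status = "installed" if is_installed(name) else "not found"
    print("%s: %s" % (name, status))
```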

The next section provides details on conda environments which allow you to install your own packages to extend the capabilities of the central install.

Conda environments

A conda environment provides a localised installation of packages managed by Anaconda python. Multiple environments can be set up for different groups of packages (e.g., your machine learning packages and your chemistry packages). The packages are installed in a directory in your home area. The following commands will install and activate / deactivate a particular environment.

Run these commands on the login node or in an interactive session to set up a new conda environment:

module load anaconda3/5.3.0
module load proxy

Now create the conda environment. You will then be able to add conda (python) packages to it. Packages will be stored in your home dir in ~/.conda/envs/. There are a couple of ways you can do this (we use the name test_env in the examples below). When running the commands below, ignore any message about updating conda – you won’t be able to update the centrally installed version:

There are two alternatives. If you want to fix exactly which version of python you are working with, use option 2:

  1. Create an empty environment. You can add packages to it later (use -n on 2019.07 version):
    conda create -n test_env
    
  2. Alternatively, create a new environment that contains various standard packages such as pip, setuptools etc (it will tell you what is being installed). You can specify a particular version of python (to match the centrally installed version) or conda will download the latest available and install that in your environment:
    # If you want to ensure you use the same version of python as in the central install:
    python --version
        #
        # Make a note of: 3.7.0 (or whatever your version is)
    
    # Create the conda environment (remove the =3.7.0 to use the latest version of python)
    conda create -n test_env python=3.7.0
    [y]
    

In both of the above examples, you can add the name of a python package to be installed while creating your environment. For example:

conda create -n test_env packagename
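
If you prefer not to copy the version number by hand, python can report it in the form that conda expects. This is a small sketch, assuming it is run with the centrally installed python after loading the modulefile; the conda_python_spec helper is a hypothetical name.

```python
import platform

def conda_python_spec():
    """Build the 'python=X.Y.Z' argument matching the running interpreter."""
    return "python=" + platform.python_version()

# Print the full command so it can be copied to the command line.
print("conda create -n test_env " + conda_python_spec())
```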

The following commands can be used in a jobscript or interactive session every time you want to use your conda environment:

# Activate virtual env - these commands can be run inside a jobscript or an interactive session

[username@login02 [CSF4] ~]$ source activate test_env
                                #
                                # Can also use 'conda activate test_env' but only on the
                                # login node. In jobscripts you should use 'source'.

(test_env) [username@login02 [CSF4] ~]$
   #
   # Notice that your prompt changes to indicate the
   # name of the active conda environment.

(test_env) [username@login02 [CSF4] ~]$ python myscript.py
                                                    #
                                                    # import xyz
                                                    # ...
                                                    # exit()

# If you wanted to install other packages in the current active env:
(test_env) [username@login02 [CSF4] ~]$ conda install packagename

# If you wanted to remove a package from the current active env:
(test_env) [username@login02 [CSF4] ~]$ conda remove packagename

# When you are finished, switch off the conda environment.
# Your python code will only have access to the packages provided by the central install.
(test_env) [username@login02 [CSF4] ~]$  source deactivate

[username@login02 [CSF4] ~]$ 
   #
   # Your prompt is now how it normally is, indicating
   # that no conda environment is active.
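
In a batch job there is no prompt to look at, so it can help to have the script itself report which environment it is running in. The sketch below relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; the active_env_name helper is a hypothetical name used for illustration.

```python
import os
import sys

def active_env_name(env=None):
    """Return the name of the active conda environment, or '(none)'."""
    env = os.environ if env is None else env
    return env.get("CONDA_DEFAULT_ENV", "(none)")

print("Active conda env: %s" % active_env_name())
# sys.prefix is the directory of the python installation in use,
# e.g. a directory under ~/.conda/envs/ when one of your envs is active.
print("Python prefix:    %s" % sys.prefix)
```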

Some other actions you may wish to perform with your environments:

# Get a list of all your conda environments (active env shown with a *)
conda env list
   # conda environments:
   #
   gputest                  <HOME>/.conda/envs/gputest
   my_bioinf_env            <HOME>/.conda/envs/my_bioinf_env
   base                  *  /opt/software/RI/apps/Anaconda3/5.2.0
     #
     # Note that the 'base' env is the central install area and is shown as the
     # default environment (with a *) when you have not activated any of your
     # own environments. You will NOT be able to install packages into 'base'
     # because of the file permissions. You must create and activate one of your
     # own environments before installing any extra python packages.


# Get a list of the packages installed in the currently active environment
conda list

# Install other packages in a named virtual env (for example we install a package named pillow).
# If you don't specify the name it will install in the currently active environment.
conda install -n test_env pillow

# To remove the test_env virtual env files (deletes files from your ~/.conda/envs/ directory)
conda remove -n test_env --all

It is possible to activate other channels inside a conda environment so that you can install packages from different channels. For example, to use BioConda in a conda env:

# This example uses the BioConda channel and installs a package named emirge from that channel.
# Note that emirge requires python 2.7 so we use the older Anaconda v2 installation:
module load anaconda2/5.3.0         # Provides Python 2.7.15
module load proxy

# Create a new conda env
conda create -n my_bioinf_env python=2.7.15
   #
   # (ignore warning about new version of conda)
   # 
   # 
   # Proceed ([y]/n)? y

# Activate the environment
source activate my_bioinf_env             # Can also use 'conda activate my_bioinf_env'

# Add the BioConda channels to the env - these are the places where packages
# are downloaded from. The BioConda website says the order of these is important.
conda config --env --add channels defaults
conda config --env --add channels bioconda
conda config --env --add channels conda-forge

# Install the emirge package (use your own packages here)
conda install emirge

# We use emirge at the command-line but you may need to run python
# and import a library, for example.
emirge.py --help

# Deactivate the env when done
source deactivate

pip installation inside a conda env

The above section showed how to create a conda env and then use the conda install command to install packages into that environment.

You may also want to perform pip installations into your conda env. This is possible, but you need to perform a couple of extra steps:

# First move a config file you may have out of the way. If this is present it
# will force the pip install to occur outside of the conda env.
mv ~/.pydistutils.cfg ~/.pydistutils.cfg.ignore

# Now load whichever version of python you need. For example:
module load anaconda3/5.3.0

# Load the proxy
module load proxy

# Check which version of python we have. We'll force the conda env to use that version.
python --version
    # Make a note of: 3.7.0

# Create the conda env containing some basic python packages, including 'pip'.
# It is important to use the 'pip' package installed inside your conda env so
# that it knows to install pypi packages inside the env.
conda create -n myenv python=3.7.0
[y]

# Activate the env
conda activate myenv

# Now install your pip package(s)
pip install --log pip.log packagename
  #
  # Note: If pip reports errors about the proxy, you are using a very new
  # version of pip. Try using the other proxy modulefile that adds
  # 'http://' to the proxy settings:
  # module swap proxy proxy2

# You should now be able to use your pypi package:
python
import packagename
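
To check that pip really installed the package inside the conda env rather than elsewhere, compare the package's file location with the environment prefix. This is a minimal sketch: is_under_prefix is a hypothetical helper, and the standard-library json module stands in for the package you installed.

```python
import json  # stand-in; replace with the package you installed
import os
import sys

def is_under_prefix(path, prefix=sys.prefix):
    """Return True if path lies inside the given environment prefix."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix

print("Imported from: %s" % json.__file__)
print("Inside the active environment: %s" % is_under_prefix(json.__file__))
```

A result of False for a pip-installed package suggests the install went to another location, such as your ~/.local directory.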

pip/pypi installation (outside of a conda env)

An alternative to using conda environments is to install a package into your home area using pip. This makes the package available every time you run python, but makes it difficult to separate packages and dependencies for different tasks. The conda environments method described above is recommended.

You will need to load the following modulefile before proceeding to allow access to the outside world:

module load proxy

To install a package into your home directory storage area:

pip install --user package

where package is replaced with the name of the package you want. This will install the package to a hidden directory called .local in your home directory. It should be picked up automatically by python, which you can test as follows:

python
import package
help (package)
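
If the import fails, it may help to see where python looks for --user installs. The standard-library site module reports this; the sketch below simply prints the locations and makes no assumption beyond a standard python installation.

```python
import site

# Base directory for --user installs (normally ~/.local on Linux)
print("User base:          %s" % site.getuserbase())
# Directory where 'pip install --user' places importable packages
print("User site-packages: %s" % site.getusersitepackages())
# Whether python will add the user site-packages directory to its search path
print("User site enabled:  %s" % site.ENABLE_USER_SITE)
```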

Hints and Tips

Got a handy tip? Please send it in to its-ri-team@manchester.ac.uk

Plotting graphs with Pyplot

If you want to plot a graph to a PNG file, say, in batch, try the following (see this stackoverflow question and answer):

# Create a file named graph.py:

import matplotlib as mpl
# Agg backend will render without X server on a compute node in batch
mpl.use('Agg')
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(range(10))
# Save the graph to a .png file (you cannot use plt.show() in a batch job)
fig.savefig('temp.png')

Then, to quickly test from the CSF login node, load the modulefile and submit a batch job (without writing a jobscript):

module load anaconda3/5.2.0
sbatch -p serial --wrap="python ./graph.py"

When the job completes, view the image file on the login node using the Linux eog tool:

eog temp.png

Further info

Updates

None.

Last modified on June 30, 2023 at 11:58 am by George Leaver