Anaconda Python

June 2023: The proxy is no longer available.
To download packages from external sites (e.g., when creating a conda env), please do so from a batch job or use an interactive session on a backend node by running qrsh -l short. You DO NOT then need to load the proxy modulefiles. Please see the qrsh notes for more information on interactive use.

Overview

Anaconda is a free, enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing. It includes 100+ of the most popular Python packages for science, math, engineering, and data analysis.

Versions available are listed below in ‘Set up Procedure’.

Restrictions on use

There are no restrictions on accessing Anaconda Python on the CSF. All users should read the End User License Agreement before using the software.

Set up procedure

We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.

Load one of the following modulefiles (most recent listed at the top, oldest at the bottom):

module load apps/binapps/anaconda3/2023.09  # Python 3.11.5
module load apps/binapps/anaconda3/2023.03  # Python 3.10.10
module load apps/binapps/anaconda3/2022.10  # Python 3.9.13
module load apps/binapps/anaconda3/2021.11  # Python 3.9.7
module load apps/python/miniconda3/4.10.3   # Python 3.9.5
module load apps/binapps/anaconda3/2020.07  # Python 3.8.3
module load apps/binapps/anaconda3/2019.07  # Python 3.7.3
module load apps/binapps/anaconda3/2019.03  # Python 3.7.3
module load apps/anaconda3/5.2.0            # Python 3.6.5
module load apps/anaconda/2.5.0             # Python 2.7.15

Additional Centrally Installed Packages

The following packages have been centrally installed so are available by default.

If a package listed under a previous version is not listed under a newer version, it is because the package would not install, was not compatible with, or was not available for the later version of Anaconda python at the time of the central install. If you require a package, please try installing it yourself in a conda environment.

apps/binapps/anaconda3/2022.10 packages

None.

apps/binapps/anaconda3/2021.11 packages

  • PyMC3
  • r-irkernel
  • r-essentials
  • textblob
  • docopt
  • biopython

apps/binapps/anaconda3/2020.07 packages

  • PyMC3
  • r-irkernel
  • r-essentials
  • textblob
  • docopt
  • biopython
  • matam

apps/binapps/anaconda3/2019.03 packages

  • PyMC3
  • r-irkernel
  • r-essentials
  • rpy2
  • textblob
  • docopt
  • biopython

apps/anaconda3/5.2.0 packages

  • PyMC3
  • pyPcazip (version 2.0.8)
  • r-irkernel
  • r-essentials
  • rpy2
  • textblob

apps/anaconda/2.5.0 packages

  • pyPcazip (version 1.5.1)

Running the application

Please do not run python on the login node. Jobs should be submitted to the compute nodes via batch. If you wish to do interactive development with python, please start an interactive session first.

Serial batch job submission

Before submitting the job we will write the following simple python script for use as an example:

# fib-example.py
parents, babies = (1, 1)
while babies < 100:
    print('This generation has %d babies' % babies)
    parents, babies = (babies, parents + babies)

Now create a batch submission script that we will submit to the batch system:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Load the version you require
module load apps/binapps/anaconda3/2022.10

# Execute our simple python script we created above:
python fib-example.py

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.

Parallel batch job submission

A simple parallel numpy example script is:

# eig-example.py
import numpy

def test_eigenvalue():
  i=500
  data=numpy.random.rand(i,i)
  result=numpy.linalg.eig(data)
  return result

print(test_eigenvalue())

and the corresponding jobscript is:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript
#$ -pe smp.pe 8     # Number of cores (2-32)

# Load the version you require
module load apps/binapps/anaconda3/2022.10

# Inform numpy how many cores to use. $NSLOTS is automatically set to the number given above.
export OMP_NUM_THREADS=$NSLOTS

python eig-example.py
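
To confirm from inside the python script how many cores the job was actually given, the value can be read back from the environment. The following is a minimal sketch, assuming the jobscript above has exported OMP_NUM_THREADS from $NSLOTS; the helper name threads_from_env is hypothetical and is not part of numpy or the batch system:

```python
import os


def threads_from_env(env=None, default=1):
    """Return the thread count set via OMP_NUM_THREADS (falling back to NSLOTS)."""
    env = os.environ if env is None else env
    for name in ("OMP_NUM_THREADS", "NSLOTS"):
        value = env.get(name, "")
        # Only accept plain positive integers such as "8"
        if value.isdigit() and int(value) > 0:
            return int(value)
    # Neither variable is set (e.g. running outside a batch job)
    return default


if __name__ == "__main__":
    print("Running with %d thread(s)" % threads_from_env())
```

You could call threads_from_env() at the top of eig-example.py and print the result, so that the job output records the core count used.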

Using a Conda Env in a batch job

If you have created your own conda environment in which to install packages (see below for details on how to do this), you must activate it in your jobscript so that python can find your packages. Create a jobscript similar to the following. This is a serial (one core) jobscript, but you could use a parallel jobscript if your software supports the use of multiple cores:

#!/bin/bash --login
#$ -cwd             # Job will run from the current directory
                    # NO -V line - we load modulefiles in the jobscript

# Load the version you require
module load apps/binapps/anaconda3/2022.10

# Activate the environment. Note: You must use the 'source' keyword, not 'conda'.
source activate my_env

# Python now has access to any packages installed in your conda env
python my_app.py

# You can deactivate the environment at the end of the batch job, although this
# step is optional if you are not running any more commands in your jobscript.
source deactivate

Submit the jobscript using:

qsub scriptname

where scriptname is the name of your jobscript.
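
To check from inside your python script that the job really is running in the activated environment, the interpreter can report where it is running from. The following is a minimal sketch, assuming that activating a conda env sets the CONDA_DEFAULT_ENV environment variable (treat this as an assumption for the conda version you are using); describe_python is a hypothetical helper name:

```python
import os
import sys


def describe_python(env=None):
    """Collect the interpreter location and the name of the active conda env."""
    env = os.environ if env is None else env
    return {
        "executable": sys.executable,  # full path of the python being run
        "prefix": sys.prefix,          # installation or env directory in use
        "conda_env": env.get("CONDA_DEFAULT_ENV", "(none active)"),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print("%-10s %s" % (key, value))
```

If the prefix printed in your job output points at the central install rather than at your env directory, the environment was probably not activated in the jobscript.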

Adding packages

Anaconda packages

To see what is already installed (in the central anaconda installation):

conda list

To see if a package is available to install:

conda search package

where package is replaced with the name of the package you want. If the package you require is listed as available for install, read on for how you can install packages in your own home directory storage using conda environments or pip.

Conda environments

A conda environment provides a localised installation of packages managed by Anaconda python. Multiple environments can be set up for different groups of packages or projects (e.g., a machine learning project and a different chemistry project). Use a new conda environment for each project so that you don’t break previous projects when you install other packages. The packages are installed in a directory in your home area.

The following commands will install and activate / deactivate a particular environment.

Before using conda environments, load the required anaconda modulefile. Because the proxy is no longer available, run these commands in an interactive session on a compute node (not on the login node) to set up a new conda environment:

# We must now use a compute node to allow downloads from the outside world
qrsh -l short
 #
 # Wait to be logged in to a compute node, then:

module load apps/binapps/anaconda3/2022.10      # Load the version you require

# The proxy modulefiles (tools/env/proxy2, or tools/env/proxy for older versions
# of anaconda3) are no longer needed now that the proxy is not available.

# You should stay on the compute node while you create and add packages to
# your conda env. Once you are happy with it, you can 'exit' the compute node
# (run 'exit') to return to the login node. Jobscripts can use your new
# conda environment (see below.) If you need to install any more packages in
# the env, start with the 'qrsh' command as above.

Now create the conda environment. You will then be able to add conda (python) packages to it.

For reference, packages will be stored in your home dir in a directory specific to each conda environment you create: ~/.conda/envs/envname. But you shouldn’t need to use this path directly – the conda commands will ensure packages are installed and used from this area.

There are a couple of ways you can create a conda environment which we’ll now go through (we use the name test_env in the examples below).

  1. Method 1: create a new environment that contains various standard packages such as pip, setuptools etc (it will tell you what is being installed). These help you install other packages (e.g., using pip, which you might see in the installation instructions of a github project). You can specify a particular version of python (to match the centrally installed version); otherwise conda will download the latest available version and install that in your environment:
    # If you want to ensure you use the same version of python as in the central install:
    python --version
        #
        # Make a note of: 3.9.13 (or whatever your version is)
    
    # Create the conda environment (remove the ==3.9.13 to use the latest version of python)
    conda create -n test_env python==3.9.13
    [y]
    

    You can ignore any messages about new versions of conda and upgrading them.

  2. Method 2: alternatively, if you just need an empty environment, you can create one and add packages to it later:
    conda create -n test_env
    

    You can ignore any messages about new versions of conda and upgrading them.

In both of the above examples, you can add the name of a python package to be installed while creating your environment. For example:

conda create -n test_env packagename

The following commands can be used in a jobscript or interactive session every time you want to use your conda environment.

Note that we DO NOT use conda activate ... Instead we use source activate ... See below for info on conda activate vs source activate.

# Activate virtual env - these commands can be run inside a jobscript or an interactive session
source activate test_env        # We do not recommend 'conda activate ...'

# If you are on the login node you'll see your prompt change to indicate you
# are working in an activated conda environment:
(test_env) [username@login2 [csf3] ~]$
   #
   # The name of your active conda env is displayed at the prompt

python myscript.py
         #
         # import xyz
         # ...
         # exit()

# You can test whether a package is available by doing:
python -c "import packagename"
  #
  # If you see no output then packagename is already installed.
  # If you see an error, you need to install packagename in your env.

# If you wanted to install packages in the current active env:
conda install packagename

# If you wanted to remove a package from the current active env:
conda remove packagename

# When you are finished, switch off the conda environment.
# Your python code will only have access to the packages provided by the central install.
source deactivate

# The login node prompt will now return to normal (no env name displayed)
[username@login2 [csf3] ~]$

You’ve now successfully created a conda environment and installed some packages into it.
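
The python -c "import packagename" test shown above can be extended to check several packages in one go. The following is a minimal sketch using the standard library's importlib.util.find_spec; missing_packages is a hypothetical helper name and the package names passed to it are placeholders for your own:

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that this python cannot import."""
    # find_spec returns None when a top-level package is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Replace these placeholder names with the packages your code needs
    missing = missing_packages(["json", "numpy"])
    if missing:
        print("Not installed in this env: " + ", ".join(missing))
    else:
        print("All packages found")
```

Run it with your conda env activated; any names reported as missing can then be installed with conda install as described above.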

We highly recommend keeping a list of the conda install ... commands run when creating an environment and populating it with various packages. While it is possible to export the contents of a conda environment to a YAML file, it is sometimes easier to simply re-run all of the conda install ... commands should you ever need to recreate a conda env elsewhere.
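
Alongside your list of conda install commands, it can be useful to save a snapshot of the package versions that python actually sees in the env. The following is a minimal sketch using importlib.metadata from the standard library (Python 3.8 or newer, so it will not work with the older Python 2.7, 3.6 and 3.7 installs); installed_versions is a hypothetical helper name. It only reports packages that provide python metadata, so it complements conda list rather than replacing it:

```python
from importlib import metadata


def installed_versions():
    """Map each installed distribution name to its version string."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            versions[name] = dist.version
    return versions


if __name__ == "__main__":
    found = installed_versions()
    for name in sorted(found, key=str.lower):
        print("%s==%s" % (name, found[name]))
```

Redirecting the output to a text file kept with your project gives a dated record of what was installed when a job was run.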

Some other actions you may wish to perform with your environments:

List your conda environments

# Get a list of all your conda environments (active env shown with a *)
conda env list
   # conda environments:
   #
   gputest                  <HOME>/.conda/envs/gputest
   my_bioinf_env            <HOME>/.conda/envs/my_bioinf_env
   base                  *  /opt/apps/apps/binapps/anaconda3/2022.10

List the packages installed in the active conda env

# Get a list of the packages installed in the currently active environment
conda list

Install packages in another named conda env

# Install other packages in a named virtual env (for example we install a package named pillow).
# If you don't specify the name it will install in the currently active env.
conda install -n test_env pillow

Remove (delete) a conda env

# To remove the test_env virtual env files (deletes files from your ~/.conda/envs/ directory)
conda remove -n test_env --all

Activate other Channels in a conda env

It is possible to activate other channels inside a conda environment so that you can install packages from different channels. For example, to use BioConda in a conda env:

# This example uses the BioConda channel and installs a package named emirge from that channel.
# Note that emirge requires python 2.7 so we use the older Anaconda v2 installation:
qrsh -l short
module load apps/anaconda/2.5.0         # Provides Python 2.7.15
# module load tools/env/proxy           # No longer needed now that the proxy is not available

# Create a new conda env
conda create -n my_bioinf_env python==2.7.15
   #
   # (ignore warning about new version of conda)
   # 
   # 
   # Proceed ([y]/n)? y

# Activate the environment
source activate my_bioinf_env             # We do not recommend 'conda activate ...'

# Add the BioConda channels to the env - these are the places where packages
# are downloaded from. The BioConda website says the order of these is important.
conda config --env --add channels defaults
conda config --env --add channels bioconda
conda config --env --add channels conda-forge

# Install the emirge package (use your own packages here)
conda install emirge

# We use emirge at the command-line but you may need to run python
# and import a library, for example.
emirge.py --help

# Deactivate the env when done
source deactivate

# Go back to the login node
exit

pip installation inside a conda env

The above section showed how to create a conda env and then use the conda install command to install packages in to that environment.

You may also want to perform pip installations in to your conda env. This is possible but you need to perform a couple of extra steps:

# See earlier for why we use an interactive session now that the proxy
# is no longer available. From the CSF login node:
qrsh -l short
  #
  # Wait until you are logged in to a compute node, then:

# First move a config file you may have out of the way. If this is present it
# will force the pip install to occur outside of the conda env.
# For new versions of pip you can instead add the --isolated flag to the pip
# command to ask it to ignore this file.
mv ~/.pydistutils.cfg ~/.pydistutils.cfg.ignore

# Now load whichever version of python you need. For example:
module load apps/binapps/anaconda3/2022.10

# The proxy modulefile is no longer needed now that the proxy is not available:
# module load tools/env/proxy2

# Check which version of python we have. We'll force the conda env to use that version.
python --version
    # Make a note of: 3.9.13

# Create the conda env containing some basic python packages, including 'pip'.
# It is important to use the 'pip' package installed inside your conda env so
# that it knows to install pypi packages inside the env.
conda create -n myenv python==3.9.13
[y]

# YOU MUST DO THIS BEFORE YOU CAN INSTALL IN TO THE ENV: Activate the env
source activate myenv

# Now install your pip package(s) - note NO --user flag here so that the
# packages are installed inside the conda env's directory.
pip install --isolated --log pip.log packagename
 #             #
 #             # New versions of pip accept an --isolated flag to make them
 #             # ignore any ~/.pydistutils.cfg file you may have.
 #             # You must either remove the ~/.pydistutils.cfg file or use
 #             # the --isolated flag on the pip command otherwise the packages
 #             # will NOT be installed in your conda env!
 #
 # Note: If pip reports errors about the proxy, then you are
 # using an older version of pip. Try using the other proxy modulefile that doesn't add the
 # 'http://' to the proxy settings:
 # module swap tools/env/proxy2 tools/env/proxy
 # Now repeat your 'pip install' command.

# You should now be able to use your pypi package:
python
import packagename

pip/pypi installation (outside of a conda env)

An alternative to using conda environments is to install a package in to your home area using pip. This will make the package available every time you run python, but makes it difficult to separate packages and dependencies for different tasks. The above conda environments method is recommended.

The proxy is no longer available, so the proxy modulefiles will not give you access to the outside world. Instead, start an interactive session on a compute node before proceeding:

qrsh -l short
   #
   # Wait to be logged in to a compute node, then load the anaconda
   # modulefile you require

To install a package in to your home directory storage area:

# We are NOT using a conda env so here you should NOT be working in an active env
pip install --user package

where package is replaced with the name of the package you want. This will install the package to a hidden directory called .local in your home directory.
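
To see exactly where pip install --user places packages for the version of python you have loaded, you can ask python itself. The following is a minimal sketch using the standard library's site module; user_site_info is a hypothetical helper name:

```python
import site
import sys


def user_site_info():
    """Report where 'pip install --user' packages live for this python."""
    user_site = site.getusersitepackages()
    return {
        "user_base": site.getuserbase(),       # usually ~/.local on Linux
        "user_site": user_site,                # version-specific site-packages dir
        "on_sys_path": user_site in sys.path,  # False if the dir is absent or disabled
    }


if __name__ == "__main__":
    for key, value in user_site_info().items():
        print("%-12s %s" % (key, value))
```

Because the user site directory is specific to each python version, a package installed with one anaconda modulefile loaded may not be visible after loading a modulefile that provides a different python version.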

You may need to do as follows with more recent anaconda versions:

[username@login2 [csf3] ~]$ pip3 install --user --proxy=http://proxy.man.ac.uk:3128 package

The package should be picked up automatically by python whenever that modulefile is loaded in the future. You can test this as follows:

python
import package
help(package)

Hints and Tips

Got a handy tip? Please send it in to its-ri-team@manchester.ac.uk

conda activate vs source activate

After creating a new conda env (see above) and trying to activate it with:

conda create -y -n my_new_env
conda activate my_new_env

you may see a message:

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run
    $ conda init 

Currently supported shells are:
  - bash
...
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

At this point, we recommend that you DO NOT run conda init bash. This will edit your ~/.bashrc script, which is run every time you log in to the CSF. It will add commands to automatically activate the base conda env upon login for the version of Anaconda python you are currently using. This can interfere with other applications, other versions of Anaconda python, or other conda environments.

You can tell whether you did at some point run conda init bash by looking at your login prompt:

(base) [username@login1 [csf3] ~]$
   #
   # If you see "(base)" in your prompt after logging in to the CSF then
   # you must have run 'conda init' at some point. This fixes the version of
   # anaconda python in use, which will make using newer versions provided
   # by our modulefiles more difficult.
   # See below for how to edit your ~/.bashrc file to remove this.

Instead, run:

# Use this INSTEAD of 'conda init ...'
source activate my_new_env

The source command will activate your environment without editing your ~/.bashrc file. You should also use source activate my_new_env in your jobscripts.

Undo changes made to your ~/.bashrc file

If you did run conda init bash then you can remove the following lines from your ~/.bashrc file (which is just a text file):

gedit ~/.bashrc
  #
  # Remove everything between the lines shown below, including the two lines!
  # This occurs at the bottom of your ~/.bashrc file.

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!

... remove all of the script code and these surrounding lines ...

# <<< conda initialize <<<
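
If you would rather not edit the file by hand, the same removal can be scripted. The following is a minimal sketch, assuming the block is delimited by exactly the two marker lines shown above; strip_conda_init is a hypothetical helper name. The script only prints the cleaned text and does not modify your ~/.bashrc, so you can inspect the result before replacing the real file:

```python
import sys
from pathlib import Path

START = "# >>> conda initialize >>>"
END = "# <<< conda initialize <<<"


def strip_conda_init(text):
    """Return text with the 'conda init' block (markers included) removed."""
    lines = text.splitlines(keepends=True)
    stripped = [line.strip() for line in lines]
    try:
        first = stripped.index(START)
        last = stripped.index(END, first)
    except ValueError:
        # Markers missing or out of order: leave the text unchanged
        return text
    return "".join(lines[:first] + lines[last + 1:])


if __name__ == "__main__":
    bashrc = Path.home() / ".bashrc"
    if bashrc.is_file():
        # Print the cleaned text; the real file is left untouched
        sys.stdout.write(strip_conda_init(bashrc.read_text()))
```

Redirect the output to a new file, compare it with your existing ~/.bashrc, and keep a backup copy of the original before swapping the cleaned version in.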

Plotting graphs with Pyplot

If you want to plot a graph to a PNG file, say, in batch, try the following (see this stackoverflow question and answer):

# Create a file named graph.py:

import matplotlib as mpl
# Agg backend will render without X server on a compute node in batch
mpl.use('Agg')
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(range(10))
# Save the graph to a .png file (you cannot use plt.show() in a batch job)
fig.savefig('temp.png')

Then, to quickly test from the CSF login node, load the modulefile and submit a batch job (without writing a jobscript):

module load apps/anaconda3/5.2.0
qsub -b y -cwd -V -j y -l short python ./graph.py

When the job completes, view the image file on the login node using the Linux eog tool:

eog temp.png

Further info

Updates

None.

Last modified on March 21, 2024 at 2:11 pm by George Leaver