Anaconda Python
To download packages from external sites (e.g., when creating a conda env), please do so from a batch job or use an interactive session on a backend node by running:
qrsh -l short
You DO NOT then need to load the proxy modulefiles. Please see the Adding packages section below for more information.
Overview
Anaconda is a free, enterprise-ready Python distribution for large-scale data processing, predictive analytics and scientific computing. It contains over 100 of the most popular Python packages for science, maths, engineering and data analysis.
The available versions are listed below under 'Set up procedure'.
Restrictions on use
There are no restrictions on accessing Anaconda Python on the CSF. All users should read the End User License Agreement before using the software.
Set up procedure
We now recommend loading modulefiles within your jobscript so that you have a full record of how the job was run. See the example jobscript below for how to do this. Alternatively, you may load modulefiles on the login node and let the job inherit these settings.
Load one of the following modulefiles (most recent listed at the top, oldest at the bottom):
module load apps/binapps/anaconda3/2024.10    # Python 3.12.7
module load apps/binapps/anaconda3/2023.09    # Python 3.11.5
module load apps/binapps/anaconda3/2023.03    # Python 3.10.10
module load apps/binapps/anaconda3/2022.10    # Python 3.9.13
module load apps/binapps/anaconda3/2021.11    # Python 3.9.7
module load apps/python/miniconda3/4.10.3     # Python 3.9.5
module load apps/binapps/anaconda3/2020.07    # Python 3.8.3
module load apps/binapps/anaconda3/2019.07    # Python 3.7.3
module load apps/binapps/anaconda3/2019.03    # Python 3.7.3
module load apps/anaconda3/5.2.0              # Python 3.6.5
module load apps/anaconda/2.5.0               # Python 2.7.15
Additional Centrally Installed Packages
The following packages have been centrally installed so are available by default.
If a package listed under an older version is missing from a newer version, it is because, at the time of the central install, it would not install, was not compatible with, or was not available for that version of Anaconda python. If you require such a package, please try installing it yourself in a conda environment.
To check whether a package is available, try the conda list command. For example:
module load apps/binapps/anaconda3/2023.09
conda list pandas
# packages in environment at /opt/apps/apps/binapps/anaconda3/2023.09:
#
# Name                    Version                   Build  Channel
pandas                    2.0.3           py311ha02d727_0
#
# pandas is included in Anaconda python, so you can use the following in your code:
#   import pandas
The /opt/apps/apps/binapps/anaconda3/2023.09 environment is the central install of Anaconda. If you’ve activated your own conda environment (see below), conda list will instead show what’s available in your env.
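If you are already working in Python (in an interactive session or a job), a similar check can be done from Python itself. The sketch below uses the standard library's importlib.metadata, which needs Python 3.8 or later, so it will not work with the older modulefiles listed above. The helper name installed_version is hypothetical, and it looks up distribution names as pip sees them, which usually (but not always) match the conda package name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent.

    dist_name is the distribution name as pip knows it (e.g. "pandas"),
    which is usually, but not always, the same as the conda package name.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ["pandas", "numpy"]:
        print(name, installed_version(name) or "not installed")
```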
apps/binapps/anaconda3/2022.10 packages
None.
apps/binapps/anaconda3/2021.11 packages
- PyMC3
- r-irkernel
- r-essentials
- textblob
- docopt
- biopython
apps/binapps/anaconda3/2020.07 packages
- PyMC3
- r-irkernel
- r-essentials
- textblob
- docopt
- biopython
- matam
apps/binapps/anaconda3/2019.03 packages
- PyMC3
- r-irkernel
- r-essentials
- rpy2
- textblob
- docopt
- biopython
apps/anaconda3/5.2.0 packages
- PyMC3
- pyPcazip (version 2.0.8)
- r-irkernel
- r-essentials
- rpy2
- textblob
apps/anaconda/2.5.0 packages
- pyPcazip (version 1.5.1)
Running the application
Please do not run python on the login node. Jobs should be submitted to the compute nodes via batch. If you wish to do interactive development with python, please start an interactive session first.
Serial batch job submission
Before submitting the job we will write the following simple python script for use as an example:
# fib-example.py
parents, babies = (1, 1)
while babies < 100:
    # print() is a function in Python 3 (all anaconda3 modulefiles)
    print('This generation has %d babies' % babies)
    parents, babies = (babies, parents + babies)
Now create a batch submission script that we will submit to the batch system:
#!/bin/bash --login
#$ -cwd               # Job will run from the current directory
# NO -V line - we load modulefiles in the jobscript

# Load the version you require
module load apps/binapps/anaconda3/2022.10

# Execute our simple python script we created above:
python fib-example.py
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
Parallel batch job submission
A simple parallel numpy example script is:
# eig-example.py
import numpy

def test_eigenvalue():
    i = 500
    data = numpy.random.rand(i, i)
    result = numpy.linalg.eig(data)
    return result

print(test_eigenvalue())
and the corresponding jobscript is:
#!/bin/bash --login
#$ -cwd               # Job will run from the current directory
# NO -V line - we load modulefiles in the jobscript
#$ -pe smp.pe 8       # Number of cores (2-32)

# Load the version you require
module load apps/binapps/anaconda3/2022.10

# Inform numpy how many cores to use. $NSLOTS is automatically set to the number given above.
export OMP_NUM_THREADS=$NSLOTS
python eig-example.py
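The jobscript exports OMP_NUM_THREADS before python starts. If you also run the same script outside the batch system, or want the script itself to record how many threads it used, a sketch like the following could set the value from Python. It assumes, as the jobscript comment says, that $NSLOTS holds the number of cores requested. It must run before numpy is imported, because the underlying maths library typically reads the variable only when it is first loaded. threads_from_slots is a hypothetical helper name.

```python
import os


def threads_from_slots(env=None, default=1):
    """Return the core count from $NSLOTS, falling back to `default`."""
    env = os.environ if env is None else env
    try:
        n = int(env.get("NSLOTS", ""))
    except ValueError:
        return default  # NSLOTS unset or not a number (e.g. outside a batch job)
    return n if n > 0 else default


# Set the thread count BEFORE numpy (or anything that loads a BLAS library)
# is imported; changing it afterwards usually has no effect.
os.environ.setdefault("OMP_NUM_THREADS", str(threads_from_slots()))
print("Using", os.environ["OMP_NUM_THREADS"], "thread(s)")

# import numpy   # only import numpy after this point
```

Because the sketch uses setdefault, a value already exported by the jobscript takes precedence over the one computed here.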
Using a Conda Env in a batch job
If you have created your own conda environment (usually so that you can install your own packages in it; see below for details on how to do this), you must activate the conda environment in your jobscript so that Python can find your packages. Create a jobscript similar to the following (this is a serial, single-core jobscript, but you could use a parallel jobscript if your software can use multiple cores):
#!/bin/bash --login
#$ -cwd               # Job will run from the current directory
# NO -V line - we load modulefiles in the jobscript

# Load the version you require
module load apps/binapps/anaconda3/2022.10

# Activate the environment. Note: You must use the 'source' keyword, not 'conda'.
source activate my_env

# Python now has access to any packages installed in your conda env
python my_app.py

# You can deactivate the environment at the end of the batch job, although this
# step is optional if you are not running any more commands in your jobscript.
source deactivate
Submit the jobscript using:
qsub scriptname
where scriptname is the name of your jobscript.
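To make the job output record which Python actually ran (handy when a package seems to be missing), you could print a few details at the start of my_app.py. This sketch assumes that activating the env sets the CONDA_DEFAULT_ENV and/or CONDA_PREFIX environment variables, as recent conda versions do; active_conda_env is a hypothetical, best-effort helper.

```python
import os
import sys


def active_conda_env(env=None):
    """Best-effort name of the active conda env, or None if none is active."""
    env = os.environ if env is None else env
    name = env.get("CONDA_DEFAULT_ENV")
    if name:
        return name
    prefix = env.get("CONDA_PREFIX")
    if prefix:
        # e.g. ~/.conda/envs/my_env -> "my_env"
        return os.path.basename(os.path.normpath(prefix))
    return None


if __name__ == "__main__":
    print("Active conda env:", active_conda_env())
    # In an activated env this should point inside ~/.conda/envs/<envname>
    print("Python executable:", sys.executable)
```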
Adding packages
June 2023: The proxy is no longer available. To download packages from external sites (e.g., when creating a conda env), please do so from a batch job or use an interactive session on a backend node by running qrsh -l short. You DO NOT then need to load the proxy modulefiles. This is covered below.
The proxy modulefiles that were previously used are listed here for reference only:
module load tools/env/proxy2
# Older versions of anaconda / pip may complain about the proxy. In that case, use:
module load tools/env/proxy                       # If proxy2 is NOT loaded
# OR
module swap tools/env/proxy2 tools/env/proxy      # If you loaded proxy2 and want to swap to 'proxy'
Anaconda packages
To see what is already installed (in the central anaconda installation):
conda list
To see if a package is available:
conda list package
where package is replaced with the name of the package you want. If the package you require is not listed (i.e., it is not already installed), read on for how you can install packages in your own home directory storage using conda environments or pip.
Conda environments
A conda environment provides a localised installation of packages managed by Anaconda python. Multiple environments can be set up for different groups of packages or projects (e.g., a machine learning project and a different chemistry project). Use a new conda environment for each project so that you don’t break previous projects when you install other packages. The packages are installed in a directory in your home area.
The following commands will install and activate / deactivate a particular environment.
Before using conda environments, load the required anaconda modulefile. Run these commands in an interactive session on a compute node (downloads are not possible from the login node) to set up a new conda environment:
# We must now use a compute node to allow downloads from the outside world
qrsh -l short
#
# Wait to be logged in to a compute node, then:
module load apps/binapps/anaconda3/2022.10     # Load the version you require
# module load tools/env/proxy2   # No longer needed: the proxy was withdrawn in June 2023
#                                # (older anaconda3 versions used 'tools/env/proxy')

# You should stay on the compute node while you create and add packages to
# your conda env. Once you are happy with it, you can 'exit' the compute node
# (run 'exit') to return to the login node. Jobscripts can use your new
# conda environment (see below). If you need to install any more packages in
# the env, start with the 'qrsh' command as above.
Create a new conda environment
Now create the conda environment. You will then be able to add conda (python) packages to it.
For reference, packages will be stored in your home dir in a directory specific to each conda environment you create: ~/.conda/envs/envname. But you shouldn’t need to use this path directly; the conda commands will ensure packages are installed and used from this area.
There are a couple of ways you can create a conda environment, which we’ll now go through (we use the name test_env in the examples below).
- Method 1: create a new environment that contains various standard packages such as pip, setuptools and python (it will tell you what is being installed). These help you install other packages (e.g., using the pip command, which you might see in the installation instructions of a GitHub project). You can specify a particular version of python (to match the centrally installed version), or conda will download the latest available version and install that in your environment:
# If you want to ensure you use the same version of python as in the central install:
python --version
#
# Make a note of: 3.9.13 (or whatever your version is)

# Create the conda environment (remove the ==3.9.13 to use the latest version of python)
conda create -n test_env python==3.9.13
# Answer 'y' when asked: Proceed ([y]/n)?
You can ignore any messages about new versions of conda and upgrading them.
- Method 2: Alternatively, if you just need an empty environment you can create an empty environment then add packages to it later:
conda create -n test_env
You can ignore any messages about new versions of conda and upgrading them.
In both of the above examples, you can add the name of a python package to be installed while creating your environment. For example:
conda create -n test_env packagename
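For Method 1, rather than reading the version from python --version and typing it in by hand, you could have the centrally installed python print the matching command. This is only a convenience sketch: conda_create_command is a hypothetical helper, and test_env is the example name used above. Run it with the central anaconda modulefile loaded and no conda env active, so that it reports the central install's version.

```python
import sys


def conda_create_command(env_name, version_info=None):
    """Build a 'conda create' command pinned to a Python version.

    By default the version is that of the Python running this script,
    i.e. the centrally installed one if no conda env is active.
    """
    major, minor, micro = tuple(version_info or sys.version_info)[:3]
    return "conda create -n {} python=={}.{}.{}".format(env_name, major, minor, micro)


if __name__ == "__main__":
    print(conda_create_command("test_env"))
```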
Activate the new conda environment
You must activate the conda environment after creating it, so that further package installs go inside your new environment.
The following commands should be used in a jobscript or interactive session every time you want to use your conda environment.
Note that we DO NOT USE conda activate envname. Instead WE USE source activate envname. See below for information on conda activate vs source activate.
# Activate the env - these commands can be run inside a jobscript or an interactive session
source activate test_env      # We do not recommend 'conda activate ...'

# If you are on the login node you'll see your prompt change to indicate you
# are working in an activated conda environment:
(test_env) [username@login2 [csf3] ~]$
#
# The name of your active conda env is displayed at the prompt
Adding and removing a package to/from your conda environment
Using the previously activated environment (you must activate the env!):
# (Optional) You can test whether a package is available by running:
python -c "import packagename"
#
# If you see no output then packagename is already installed.
# If you see an error, you need to install packagename in your env.

# To install packages in the currently active env:
conda install packagename

# You can now run your python script using the packages installed in your environment:
python myscript.py

# To remove a package from the currently active env:
conda remove packagename

# When you are finished, switch off the conda environment.
# Your python code will then only have access to the packages provided by the central install.
source deactivate

# The login node prompt will now return to normal (no env name displayed)
[username@login2 [csf3] ~]$
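If you need several packages, checking them one at a time with python -c gets tedious. The sketch below (missing_packages is a hypothetical helper) asks the current Python which of a list of top-level modules it cannot find, without importing them. Run it after activating your env. Bear in mind that the import name can differ from the conda package name; for example, the pillow package is imported as PIL.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that the current Python cannot find."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Replace these with the top-level module names your script imports.
    wanted = ["numpy", "pandas", "PIL"]
    missing = missing_packages(wanted)
    if missing:
        print("Install these in your env first:", ", ".join(missing))
    else:
        print("All packages found")
```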
You’ve now successfully created a conda environment and installed some packages into it.
Keep a note of the conda install ... commands you run when creating an environment and populating it with packages. While it is possible to export the contents of a conda environment to a YAML file, it is sometimes easier to simply re-run all of the conda install ... commands should you ever need to recreate a conda env elsewhere.
Some other actions you may wish to perform with your environments:
List your conda environments
# Get a list of all your conda environments (active env shown with a *)
conda env list
# conda environments:
#
gputest                  <HOME>/.conda/envs/gputest
my_bioinf_env            <HOME>/.conda/envs/my_bioinf_env
base                  *  /opt/apps/apps/binapps/anaconda3/2022.10
List the packages installed in the active conda env
# Get a list of the packages installed in the currently active environment
conda list
Install packages in another named conda env
# Install other packages in a named env (for example we install a package named pillow).
# If you don't specify the name it will install in the currently active env.
conda install -n test_env pillow
Remove (delete) a conda env
# To remove the test_env env files (deletes files from your ~/.conda/envs/ directory)
conda remove -n test_env --all
Activate other Channels in a conda env
It is possible to activate other channels inside a conda environment so that you can install packages from different channels. For example, to use BioConda in a conda env:
# This example uses the BioConda channel and installs a package named emirge from that channel.
# Note that emirge requires python 2.7 so we use the older Anaconda v2 installation:
qrsh -l short
module load apps/anaconda/2.5.0        # Provides Python 2.7.15
# module load tools/env/proxy          # No longer needed: the proxy was withdrawn in June 2023

# Create a new conda env
conda create -n my_bioinf_env python==2.7.15
#
# (ignore warning about new version of conda)
#
# Proceed ([y]/n)? y

# Activate the environment (we recommend 'source activate' rather than 'conda activate')
source activate my_bioinf_env

# Add the BioConda channels to the env - these are the places where packages
# are downloaded from. The BioConda website says the order of these is important.
conda config --env --add channels defaults
conda config --env --add channels bioconda
conda config --env --add channels conda-forge

# Install the emirge package (use your own packages here)
conda install emirge

# We use emirge at the command-line but you may need to run python
# and import a library, for example.
emirge.py --help

# Deactivate the env when done
source deactivate

# Go back to the login node
exit
pip installation inside a conda env
The above section showed how to create a conda env and then use the conda install command to install packages into that environment.
You may also want to perform pip installations into your conda env. This is possible but you need to perform a couple of extra steps:
# See earlier for why we use an interactive session now that the proxy
# is no longer available. From the CSF login node:
qrsh -l short
#
# Wait until you are logged in to a compute node, then:

# First move a config file you may have out of the way. If this is present it
# will force the pip install to occur outside of the conda env.
# For new versions of pip you can instead add the --isolated flag to the pip
# command to ask it to ignore this file.
mv ~/.pydistutils.cfg ~/.pydistutils.cfg.ignore

# Now load whichever version of python you need. For example:
module load apps/binapps/anaconda3/2022.10
# The proxy is no longer needed, so you can ignore this line:
# module load tools/env/proxy2

# Check which version of python we have. We'll force the conda env to use that version.
python --version
# Make a note of: 3.9.13

# Create the conda env containing some basic python packages, including 'pip'.
# It is important to use the 'pip' package installed inside your conda env so
# that it knows to install pypi packages inside the env.
conda create -n myenv python==3.9.13
# Answer 'y' when asked: Proceed ([y]/n)?

# YOU MUST DO THIS BEFORE YOU CAN INSTALL INTO THE ENV: Activate the env
source activate myenv

# Now install your pip package(s) - note NO --user flag here so that the
# packages are installed inside the conda env's directory.
pip install --isolated --log pip.log packagename
#
# New versions of pip accept an --isolated flag to make them ignore any
# ~/.pydistutils.cfg file you may have. You must either remove the
# ~/.pydistutils.cfg file or use the --isolated flag on the pip command,
# otherwise the packages will NOT be installed in your conda env!
#
# Note: If pip reports errors about the proxy, then you are using an older
# version of pip. Try using the other proxy modulefile that doesn't add the
# 'http://' to the proxy settings:
#   module swap tools/env/proxy2 tools/env/proxy
# Now repeat your 'pip install' command.

# You should now be able to use your pypi package. Start python, then at its prompt:
python
import packagename
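To confirm that pip really installed a package inside your conda env (and not in ~/.local because of a stray ~/.pydistutils.cfg), you can compare where the package was found with sys.prefix, which for an activated conda env is the env directory. This is a sketch; install_location and is_inside are hypothetical helpers, and packagename is the same placeholder used above.

```python
import importlib.util
import os
import sys


def install_location(module_name):
    """Absolute path of the file a module would be loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    # Built-in/frozen modules and namespace packages have no usable file path
    if spec is None or not spec.origin or not os.path.isabs(spec.origin):
        return None
    return os.path.realpath(spec.origin)


def is_inside(path, prefix):
    """True if `path` lies under the directory `prefix`."""
    path, prefix = os.path.realpath(path), os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


if __name__ == "__main__":
    name = "packagename"  # placeholder: use your package's import name
    where = install_location(name)
    if where is None:
        print(name, "is not importable")
    elif is_inside(where, sys.prefix):
        print(name, "is inside the active env:", where)
    else:
        print(name, "is OUTSIDE the active env:", where)
```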
pip/pypi installation (outside of a conda env)
An alternative to using conda environments is to install a package into your home area using pip. This will make the package available every time you run python, but makes it difficult to separate packages and dependencies for different tasks. The above conda environments method is recommended.
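Previously, you needed to load the following modulefile before proceeding to allow access to the outside world. Since June 2023 the proxy is no longer available; instead, run pip in an interactive session on a compute node (qrsh -l short). The old instructions are kept for reference:
module load tools/env/proxy2
#
# Note: older versions of pip may complain about
# the proxy, in which case you should use
#   module load tools/env/proxy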
To install a package in to your home directory storage area:
# We are NOT using a conda env so here you should NOT be working in an active env
[username@login2 [csf3] ~]$ pip install --user package
where package is replaced with the name of the package you want. This will install the package to a hidden directory called .local in your home directory.
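With more recent anaconda versions you previously needed to pass the proxy to pip explicitly, as follows (the proxy is no longer available, so this is kept for reference only):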
[username@login2 [csf3] ~]$ pip3 install --user --proxy=http://proxy.man.ac.uk:3128 package
The package should be picked up automatically by python when that modulefile is loaded in the future. You can test this as follows (start python, then at its prompt):
python
import package
help(package)
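To see exactly where pip install --user puts packages for the python you have loaded, and whether that directory is being searched, you can ask Python's standard site module. The directory is specific to the Python version (for example ~/.local/lib/python3.9/site-packages), so packages installed there are also visible inside conda envs that use the same Python version; setting the PYTHONNOUSERSITE environment variable turns this off. user_site_report is a hypothetical helper.

```python
import site
import sys


def user_site_report():
    """Describe the per-user site-packages directory used by 'pip install --user'."""
    path = site.getusersitepackages()
    return {
        "path": path,
        # False if disabled, e.g. by PYTHONNOUSERSITE or 'python -s'
        "enabled": bool(site.ENABLE_USER_SITE),
        # Only True once the directory actually exists and is enabled
        "on_sys_path": path in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_report().items():
        print(f"{key}: {value}")
```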
Hints and Tips
Got a handy tip? Please send it in to its-ri-team@manchester.ac.uk …
conda activate vs source activate
After creating a new conda env (see above) and trying to activate it with:
conda create -y -n my_new_env
conda activate my_new_env
you may see a message:
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init

Currently supported shells are:
  - bash
  ...
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.
At this point, we recommend that you DO NOT run conda init bash. This will edit your ~/.bashrc script, which is run every time you log in to the CSF. It will add commands to automatically activate the base conda env upon login for the version of Anaconda python you are currently using. This can interfere with other applications, other versions of Anaconda python, or other conda environments.
You can tell whether you did at some point run conda init bash by looking at your login prompt:
(base) [username@login1 [csf3] ~]$
#
# If you see "(base)" in your prompt after logging in to the CSF then
# you must have run 'conda init' at some point. This fixes the version of
# anaconda python in use, which will make using newer versions provided
# by our modulefiles more difficult.
# See below for how to edit your ~/.bashrc file to remove this.
Instead, run:
# Use this INSTEAD of 'conda init ...'
source activate my_new_env
The source command will activate your environment without editing your ~/.bashrc file. You should also use source activate my_new_env in your jobscripts.
Undo changes made to your ~/.bashrc file
If you did run conda init bash then you can remove the following lines from your ~/.bashrc file (which is just a text file):
gedit ~/.bashrc
#
# Remove everything between the lines shown below, including those two lines.
# This block occurs at the bottom of your ~/.bashrc file.

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
... remove all of the script code and these surrounding lines ...
# <<< conda initialize <<<
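If you prefer not to edit the file by hand, a small script can remove the block for you. This is a sketch, not a supported tool: it assumes the block is delimited by exactly the two marker lines shown above, removes only the first such block, leaves the file untouched if it cannot find both markers, and keeps a backup named ~/.bashrc.bak. The names strip_conda_init and clean_bashrc, and the .bak suffix, are hypothetical choices.

```python
import shutil
from pathlib import Path

START = "# >>> conda initialize >>>"
END = "# <<< conda initialize <<<"


def strip_conda_init(text):
    """Return text with the first 'conda initialize' block (markers included) removed."""
    lines = text.splitlines(keepends=True)
    stripped = [line.strip() for line in lines]
    try:
        start = stripped.index(START)
        end = stripped.index(END, start + 1)
    except ValueError:
        return text  # No complete block found: change nothing
    return "".join(lines[:start] + lines[end + 1:])


def clean_bashrc(path=None):
    """Remove the block from a bashrc file, keeping a .bak copy. Returns True if changed."""
    path = Path(path) if path is not None else Path.home() / ".bashrc"
    original = path.read_text()
    cleaned = strip_conda_init(original)
    if cleaned == original:
        return False
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # backup first
    path.write_text(cleaned)
    return True


# To use it, run clean_bashrc() from python, then log out and back in.
```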
Plotting graphs with Pyplot
If you want to plot a graph to a PNG file, say, in batch, try the following (see this stackoverflow question and answer):
# Create a file named graph.py:
import matplotlib as mpl
# Agg backend will render without an X server on a compute node in batch
mpl.use('Agg')
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(range(10))
# Save the graph to a .png file (you cannot use plt.show() in a batch job)
fig.savefig('temp.png')
Then, to quickly test from the CSF login node, load the modulefile and submit a batch job (without writing a jobscript):
module load apps/anaconda3/5.2.0
qsub -b y -cwd -V -j y -l short python ./graph.py
When the job completes, view the image file on the login node using the Linux eog tool:
eog temp.png
Further info
Updates
None.