Anaconda on Rivanna

Overview

Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture.

Rivanna has Python 2 and 3 available as part of the Anaconda distribution. Anaconda comes with many packages suited for scientific computing, data processing, and data analysis, and it keeps deployment simple. Its package manager, conda, installs and updates Python packages and their dependencies, keeping different package versions isolated on a project-by-project basis. Anaconda is available as open source under the New BSD license. It also ships with pip, the common Python package manager.

Available Versions

The current installation of Anaconda incorporates the most popular packages. To find the available versions and learn how to load them, run:

module spider anaconda

The output of the command shows the available Anaconda module versions.

For detailed information about a particular Anaconda module, including how to load the module, run the module spider command with the module’s full version label. For example:

module spider anaconda/2019.10-py2.7

Module    Version        Module Load Command
anaconda  2019.10-py2.7  module load anaconda/2019.10-py2.7
anaconda  2020.11-py3.8  module load anaconda/2020.11-py3.8

Installing packages

Packages can be installed with either the pip or the conda package manager.

Using pip

Open the bash terminal, and type:

  1. module load anaconda
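  2. pip search package_name (search for a package by name; PyPI has since disabled its search API, so this command may fail, in which case search on the PyPI website instead)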
  3. pip install --user package_name (install a package)
  4. pip install --user --upgrade package_name (upgrade the package to the latest stable version)
  5. pip list (list all installed packages)

Do not upgrade pip. If you see the following message asking you to upgrade your pip version, it is usually safe to ignore it.

You are using pip version x.x.x, however version y.y.y is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Upgrading pip may result in broken dependencies.

(As of 01/10/2020, this message is suppressed.)

However, if you must upgrade pip, please do so inside a virtual environment, such as a conda environment.
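To see where pip install --user places packages and which pip version the loaded module provides, a short check from Python can help. The following is a minimal sketch that assumes Python 3.7 or later and uses only the standard library; the helper name describe_user_install is made up for illustration and is not part of pip or Anaconda.

```python
import site
import subprocess
import sys


def describe_user_install():
    """Return (user site-packages directory, pip version string).

    Hypothetical helper for illustration; not part of pip or Anaconda.
    """
    # Directory that `pip install --user` targets for this interpreter.
    user_site = site.getusersitepackages()
    # Ask the pip that belongs to this interpreter for its version.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    pip_version = result.stdout.strip() or "pip not available"
    return user_site, pip_version


if __name__ == "__main__":
    user_site, pip_version = describe_user_install()
    print("user site-packages:", user_site)
    print("pip:", pip_version)
```

Run it after module load anaconda. The reported directory is typically under your home directory, which is why --user installs do not require write access to the shared Anaconda installation.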

Using conda

You can specify which version of Python you want to run using conda. This is done on a project-by-project basis through what is called a “virtual environment”. A virtual environment is an isolated copy of Python in which you maintain your own set of files, directories, and packages, so that other projects are unaffected. Even when projects have similar dependencies, you can install different versions of the same package in two different virtual environments without conflict. To switch between environments, you simply activate or deactivate them. Follow the steps below:

  1. Set up your Virtual Environment:

    conda create -n your_env_name_goes_here (default Python version: use conda info to find out)

    OR

    conda create -n your_env_name_goes_here python=version_goes_here (This command will automatically upgrade pip to the latest version in the environment. To find specific Python versions, use conda search "^python$".)

  2. If conda asks for confirmation (y/n), enter y to proceed. The installation will then start.

  3. Activate your newly created environment: source activate your_env_name_goes_here

  4. Install a package in your activated environment

    conda install -n your_env_name_goes_here your_package_name_goes_here

    OR

    conda install -n your_env_name_goes_here your_package_name_goes_here=version_goes_here

    OR (even better)

    In your home directory or Conda installation folder, create a file called .condarc (if it does not already exist). Inside the file, write the following:

    create_default_packages:
        - your_package_name_goes_here
        - your_package_name_goes_here
        - your_package_name_goes_here
        ...

    Now every time you create a new environment, all the packages listed in .condarc will be installed.

  5. To end the current environment session: source deactivate

  6. Remove an environment: conda remove -n your_env_name_goes_here --all

To see all available environments, run conda env list.
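After activating an environment, it can be useful to confirm from inside Python that the interpreter really belongs to that environment. The sketch below is one way to do this with the standard library. It relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; the function name active_environment is hypothetical.

```python
import os
import sys


def active_environment():
    """Summarize which interpreter and conda environment are in use.

    Hypothetical helper for illustration only.
    """
    return {
        # Absolute path of the running interpreter.
        "executable": sys.executable,
        # Installation prefix; inside a conda environment this is the
        # environment's own directory.
        "prefix": sys.prefix,
        # Set by conda on activation; absent when no environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(none)"),
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in active_environment().items():
        print("{}: {}".format(key, value))
```

If the printed prefix does not end with the name of the environment you activated, the python on your PATH is probably still the one from the base Anaconda module.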

Python and MPI

On Rivanna, we provide mpi4py libraries via dedicated modules that are built using the GCC compiler and OpenMPI libraries.

Module  Version      Module Load Command
mpi4py  3.0.0-py2.7  module load gcc/7.1.0 openmpi/3.1.4 mpi4py/3.0.0-py2.7
mpi4py  3.0.3        module load gcc/9.2.0 openmpi/3.1.6 mpi4py/3.0.3

As long as an MPI toolchain (e.g. gcc + openmpi) is loaded, you can install mpi4py with any Python/Anaconda module via pip install --user mpi4py.

Example SLURM scripts

Non-MPI

#!/bin/bash
#SBATCH -A mygroup
#SBATCH -p standard
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -t 01:00:00
#SBATCH -o myprog.out

module purge
module load anaconda # or anaconda/2019.10-py2.7 for Python 2
# optional: uncomment next line to use your custom Conda environment; replace 'custom_env' with actual env name
# source activate custom_env

python myscript.py
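
The batch script above runs myscript.py, whose contents this page does not show. Purely as an illustration, a script might read the resources Slurm granted from environment variables instead of hard-coding them. The sketch below assumes the standard SLURM_JOB_ID and SLURM_CPUS_PER_TASK variables (the latter is set when -c is given); the function name slurm_job_info is hypothetical.

```python
import os


def slurm_job_info(environ=None):
    """Return the job ID and core count that Slurm exposes to the job.

    Hypothetical helper; falls back to defaults when run outside a job.
    """
    env = os.environ if environ is None else environ
    cpus = env.get("SLURM_CPUS_PER_TASK", "")
    return {
        "job_id": env.get("SLURM_JOB_ID", "(not in a job)"),
        # Use 1 core when the variable is missing or not a plain number.
        "cpus_per_task": int(cpus) if cpus.isdigit() else 1,
    }


if __name__ == "__main__":
    info = slurm_job_info()
    print("job {job_id} running with {cpus_per_task} core(s)".format(**info))
```

Passing the environment as an argument keeps the function easy to try on a login node, for example slurm_job_info({"SLURM_CPUS_PER_TASK": "4"}).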

MPI

#!/bin/bash
#SBATCH -A mygroup
#SBATCH -p standard
#SBATCH -N 1
#SBATCH --ntasks-per-node=10
#SBATCH -t 01:00:00
#SBATCH -o myprog.out

module purge
module load gcc openmpi
module load mpi4py

# If you installed mpi4py manually, comment out the previous line and uncomment the next two lines.
# Replace 'custom_env' with the actual env name.
#module load anaconda
#source activate custom_env

srun python myscript.py
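
For the MPI case, myscript.py could look like the following minimal sketch, which splits a range of integers across ranks and sums the pieces on rank 0. It is an illustration under assumptions, not the script used on Rivanna: block_range and main are hypothetical names, and the single-process fallback exists only so the file can be tried without mpi4py installed.

```python
try:
    from mpi4py import MPI
except ImportError:
    # Assumption for illustration: allow a serial dry run without mpi4py.
    MPI = None


def block_range(n_items, rank, size):
    """Return the half-open [start, stop) block of n_items owned by rank."""
    base, extra = divmod(n_items, size)
    # The first `extra` ranks each take one additional item.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def main(n_items=100):
    if MPI is None:
        print("mpi4py not found; running as a single process")
        rank, size = 0, 1
    else:
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

    start, stop = block_range(n_items, rank, size)
    local_sum = sum(range(start, stop))

    # Combine the partial sums on rank 0 (other ranks receive None).
    total = local_sum if MPI is None else comm.reduce(local_sum, op=MPI.SUM, root=0)
    if rank == 0:
        print("ranks:", size, "total:", total)
    return total


if __name__ == "__main__":
    main()
```

With the default n_items of 100, rank 0 should report a total of 4950 regardless of how many tasks srun launches. If the output instead shows several lines each reporting one rank, mpi4py was probably not loaded in the job.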

More Information

Please visit the official [Anaconda website](https://www.anaconda.com/distribution/).