Description

PyTorch is a deep learning framework that puts Python first. It provides tensors and dynamic neural networks in Python with strong GPU acceleration.
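As a minimal illustration of what "dynamic" means here (a sketch, not taken from the PyTorch documentation): the computation graph is recorded while ordinary Python code runs, so a loop can change the network's depth from one call to the next and autograd still follows the path that was taken. The class name DynamicNet and the layer sizes are arbitrary choices for this example. The code uses the GPU when PyTorch can see one and otherwise runs on the CPU.

```python
import torch
import torch.nn as nn

# Run on the GPU if PyTorch can see one (e.g. inside the container with --nv); otherwise use the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class DynamicNet(nn.Module):
    """Toy network whose depth is picked per call with ordinary Python control flow."""

    def __init__(self, width: int = 8):
        super().__init__()
        self.inp = nn.Linear(4, width)
        self.hidden = nn.Linear(width, width)
        self.out = nn.Linear(width, 1)

    def forward(self, x: torch.Tensor, depth: int) -> torch.Tensor:
        h = torch.relu(self.inp(x))
        # The graph is recorded as this loop runs, so the same layer can be
        # applied a different number of times on each call.
        for _ in range(depth):
            h = torch.relu(self.hidden(h))
        return self.out(h)


torch.manual_seed(0)
model = DynamicNet().to(device)
x = torch.randn(16, 4, device=device)  # a batch of 16 random inputs

for depth in (1, 3):
    model.zero_grad()
    loss = model(x, depth).pow(2).mean()
    loss.backward()  # autograd follows whichever path this call actually took
    print(f"depth={depth} loss={loss.item():.4f}")
```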

Software Category: data

For detailed information, visit the PyTorch website.

Available Versions

The current installation of PyTorch incorporates the most popular packages. To find the available versions and learn how to load them, run:

module spider pytorch

The output of the command shows the available PyTorch module versions.

For detailed information about a particular PyTorch module, including how to load the module, run the module spider command with the module’s full version label. For example:

module spider pytorch/1.10.0

Module    Version   Module Load Command
pytorch   1.10.0    module load singularity/3.7.1 pytorch/1.10.0
pytorch   1.12.0    module load singularity/3.7.1 pytorch/1.12.0
pytorch   1.8.1     module load singularity/3.7.1 pytorch/1.8.1

Compatibility Issues

A100

PyTorch versions 1.6 and older are not compatible with the A100 GPU. Containers for these deprecated versions are hosted in /share/resources/containers/singularity/archive. You may continue to use them on other GPUs by excluding the A100 nodes with the Slurm option:

-x udc-an28-[1,7]
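The argument to -x (long form --exclude) is a Slurm host list in bracket notation. On a login node, scontrol show hostnames 'udc-an28-[1,7]' prints the individual node names. The Python sketch below performs the same expansion offline, assuming only the simple forms used on this page (comma lists and numeric ranges inside brackets, with leading zeros preserved); expand_hostlist is a hypothetical helper, not part of Slurm.

```python
import re
from typing import List


def _split_top_level(expr: str) -> List[str]:
    """Split a host list on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p for p in parts if p]


def _expand_one(name: str) -> List[str]:
    """Expand the first [...] group in one host expression, recursing on the rest."""
    m = re.search(r"\[([^\]]+)\]", name)
    if m is None:
        return [name]
    prefix, suffix = name[: m.start()], name[m.end():]
    hosts = []
    for item in m.group(1).split(","):
        if "-" in item:
            lo, hi = item.split("-", 1)
            width = len(lo)  # keep zero padding, e.g. [01-03] -> 01, 02, 03
            values = [str(i).zfill(width) for i in range(int(lo), int(hi) + 1)]
        else:
            values = [item]
        for v in values:
            hosts.extend(_expand_one(prefix + v + suffix))
    return hosts


def expand_hostlist(expr: str) -> List[str]:
    """Expand a Slurm-style host list into individual node names."""
    hosts = []
    for part in _split_top_level(expr):
        hosts.extend(_expand_one(part))
    return hosts


print(expand_hostlist("udc-an28-[1,7]"))  # ['udc-an28-1', 'udc-an28-7']
```

Applied to the K80 exclusion list in the next subsection, the same function yields nine node names.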

K80

Version 1.8.1 is not compatible with the K80 GPU. You may use it on other GPUs by excluding all K80 nodes with the Slurm option:

-x udc-ba25-2[3,7,8],udc-ba26-2[3-6],udc-ba27-2[3-4]
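One way to check a container against the GPU you were assigned is to compare the device's CUDA compute capability (3.7 for the K80, 8.0 for the A100) with the GPU architectures the PyTorch build was compiled for. The sketch below is an assumption about how you might do this, not the procedure used to establish the incompatibilities above; gpu_arch_report is a hypothetical helper, and torch.cuda.get_arch_list() may be missing from older releases, so the code treats its absence as "unknown". A build can also ship PTX code (compute_XX entries) that the driver may JIT-compile for newer GPUs, so a missing sm_XX entry is a warning sign rather than proof of failure.

```python
import torch


def gpu_arch_report(device_index: int = 0):
    """Compare a GPU's compute capability with the architectures in this PyTorch build.

    Returns None when PyTorch cannot see a GPU; otherwise a dict with the device
    name, its architecture tag (e.g. 'sm_80'), the build's arch list, and whether
    the tag appears in that list (None if the list is unavailable).
    """
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(device_index)
    arch = f"sm_{major}{minor}"
    # get_arch_list() may not exist in older releases; treat that as unknown.
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    arch_list = get_arch_list() if get_arch_list is not None else []
    return {
        "name": torch.cuda.get_device_name(device_index),
        "arch": arch,
        "arch_list": arch_list,
        "listed": (arch in arch_list) if arch_list else None,
    }


if __name__ == "__main__":
    report = gpu_arch_report()
    if report is None:
        print("No GPU visible to PyTorch (CPU-only node, or --nv missing?)")
    else:
        print(f"torch {torch.__version__}: {report}")
```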

PyTorch Jupyter Notebooks

Jupyter Notebooks can be used for interactive code development and for executing Python scripts and other code. PyTorch Jupyter kernels are backed by the containers in the corresponding modules.

Accessing the JupyterLab Portal

  1. Open a web browser and go to: https://rivanna-portal.hpc.virginia.edu.
  2. Use your “NetBadge” credentials to log in.
  3. On the top right of the menu bar of the Open OnDemand dashboard, click on Interactive Apps.
  4. In the drop-down box, click on JupyterLab.

Requesting access to a GPU node

To start a JupyterLab session, fill out the resource request webform. To request access to a GPU, make sure the following parameters are set:

  1. Under Rivanna Partition, choose “GPU”.
  2. Under Optional GPU Type, choose “NVIDIA K80”, “NVIDIA P100”, “NVIDIA V100”, or “NVIDIA RTX2080”, or leave it as “default”.
  3. Click Launch to start the session.

Editing and Running the Notebook

Once the JupyterLab instance has started, you can edit and run your notebook as described here.

PyTorch Slurm jobs

The following is a Slurm script template. The numbered comments correspond to the notes that follow.

#!/bin/bash
#SBATCH -A mygroup
#SBATCH -p gpu          # 1
#SBATCH --gres=gpu:1    # 1
#SBATCH -c 1
#SBATCH -t 00:01:00
#SBATCH -J pytorchtest
#SBATCH -o pytorchtest-%A.out
#SBATCH -e pytorchtest-%A.err

module purge
module load singularity pytorch/1.8.1  # 2

singularity run --nv $CONTAINERDIR/pytorch-1.8.1.sif pytorch_example.py # 3

Notes:

  1. The Slurm script needs to include the #SBATCH -p gpu and #SBATCH --gres=gpu directives to request access to a GPU node and its GPU device. Please visit the Jobs Using a GPU section for details.

  2. To use the PyTorch container, load the singularity and pytorch modules. You may choose a different version (see module spider above); if you do, change the .sif file name in the singularity command to match the version you load.

    Do not load the cuda or cudnn modules, since these libraries are already included in the PyTorch container.

  3. The --nv flag sets up the container’s environment to use a GPU when running a GPU-enabled application. The run command executes the default command defined in the container, which in this case is python. Everything after the .sif file name is passed to that command as arguments. In summary, the singularity command can be read as: “Use the Python interpreter inside the PyTorch container to execute pytorch_example.py with the GPU enabled.”
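The page does not show pytorch_example.py itself. The following is a hypothetical stand-in, a minimal sketch assuming any small PyTorch script will do: it reports whether a GPU is visible (which depends on the --nv flag) and fits a one-variable linear model to synthetic data. It is deliberately tiny because the template requests only one minute (-t 00:01:00).

```python
# pytorch_example.py: hypothetical contents; substitute your own script.
import torch
import torch.nn as nn


def main() -> float:
    # With --nv the host GPU is visible; fall back to the CPU if it is not.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"PyTorch {torch.__version__} on {device}")
    if device.type == "cuda":
        print("GPU:", torch.cuda.get_device_name(0))

    torch.manual_seed(0)
    # Synthetic data for y = 3x + 2 with a little Gaussian noise.
    x = torch.randn(256, 1, device=device)
    y = 3 * x + 2 + 0.05 * torch.randn_like(x)

    model = nn.Linear(1, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    # Full-batch gradient descent; 200 steps is plenty for this problem.
    for _ in range(200):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    print(f"weight={model.weight.item():.3f} bias={model.bias.item():.3f} loss={loss.item():.4f}")
    return loss.item()


if __name__ == "__main__":
    main()
```

The fitted weight and bias should come out close to 3 and 2, which gives a quick sanity check that the job ran end to end.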

PyTorch Interactive Jobs (ijob)

Start an interactive job with ijob. Note the addition of -p gpu and --gres=gpu to request access to a GPU node and its GPU device.

ijob -A mygroup -p gpu --gres=gpu -c 1
module purge
module load singularity pytorch/1.8.1
singularity run --nv $CONTAINERDIR/pytorch-1.8.1.sif pytorch_example.py
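Once the interactive session is running on a GPU node, you can confirm what Slurm allocated before doing real work. This is a minimal standard-library sketch, assuming Slurm exports SLURM_JOB_ID and, for a GPU request, CUDA_VISIBLE_DEVICES into the session (typical for GPU GRES, but not stated on this page); slurm_gpu_summary is a hypothetical helper. Singularity passes the host environment into the container by default, so the check should give the same answer when run through singularity run.

```python
import os
from typing import Mapping, Optional


def slurm_gpu_summary(env: Optional[Mapping[str, str]] = None) -> dict:
    """Report the Slurm job ID and the GPU indices visible to this process."""
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES", "")
    return {
        "job_id": env.get("SLURM_JOB_ID"),  # None outside a Slurm job
        "gpus": [d for d in visible.split(",") if d],  # empty if no GPU is visible
    }


if __name__ == "__main__":
    print(slurm_gpu_summary())
```

An empty gpus list inside the ijob usually means the --gres=gpu request was left out.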

Interaction with the Host File System

The following user directories are overlaid onto each container by default on Rivanna:

  • /home
  • /scratch
  • /nv
  • /project

Due to the overlay, these directories are by default the same inside and outside the container, with the same read, write, and execute permissions. This means that files modified in these directories (e.g., in /home) by processes running inside the container persist after the container instance exits. The /nv and /project directories refer to leased storage locations that may not be available to all users.
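A quick way to see the overlay in action is to write a file from inside the container and look for it on the host afterwards. The Python sketch below is one such check, assuming it is run inside the container (for example with singularity run as above); the marker file name and both helper functions are hypothetical. available_overlays only reports which of the listed top-level directories exist; seeing /nv or /project does not mean you have a leased share there.

```python
import os
from pathlib import Path

# Top-level directories this page lists as overlaid onto containers on Rivanna.
OVERLAY_DIRS = ("/home", "/scratch", "/nv", "/project")


def available_overlays(candidates=OVERLAY_DIRS):
    """Return the candidate directories that exist in the current environment."""
    return [d for d in candidates if os.path.isdir(d)]


def write_marker(directory, name="container_write_test.txt"):
    """Write a small marker file into `directory` and return its path.

    When called inside the container on an overlaid directory such as your
    home directory, the file should still be there after the container exits.
    """
    path = Path(directory) / name
    path.write_text("written from inside the container\n")
    return path


if __name__ == "__main__":
    print("Overlaid directories present:", available_overlays())
```

For example, call write_marker(Path.home()) inside the container, exit, and list your home directory on the host; the marker file should still be there.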