Overview
Nvidia Clara Parabricks is a GPU-accelerated software suite for performing secondary analysis of next generation sequencing (NGS) DNA and RNA data. It contains GPU-enabled versions of popular bioinformatics tools such as the aligners BWA-Mem and STAR.
Loading the container
On the HPC system, Clara Parabricks is available as an Apptainer container. To load the clara-parabricks
container module, you can type:
module load apptainer clara-parabricks
The load command will load a default version of Clara Parabricks, unless another version is specified. To see the available versions, type:
module spider clara-parabricks
The Clara Parabricks container on the HPC system includes many bioinformatics tools for genomics and transcriptomics. Each tool must be accessed using the Apptainer run
command to activate the container, followed by the Clara Parabricks pbrun
command to call the designated tool, followed by arguments specific to each tool. See below for an example using the fq2bam
pipeline tool, which does a BWA-Mem
alignment, sorts reads by coordinates, marks duplicate reads with GATK MarkDuplicates
, and optionally generates a BQSR
report.
#!/bin/bash
#SBATCH -A <allocation> # allocation name
#SBATCH -p gpu # partition name
#SBATCH --gres=gpu:1 # request one gpu
#SBATCH -C "v100|a100" # constrain to a100 or v100 gpus
#SBATCH -N 1 # request 1 node
#SBATCH -c 8 # request 8 cores
#SBATCH -t 24:00:00 # set time limit of 24 hours
# prepare the environment
module purge
module load apptainer clara-parabricks
# run parabricks fq2bam pipeline
apptainer run --nv \
-B $PWD:/workdir \
-B $PWD:/outputdir \
$CONTAINERDIR/clara-parabricks-4.1.1.sif \
pbrun fq2bam \
--ref /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
--in-fq /workdir/parabricks_sample/Data/sample_1.fq.gz /workdir/parabricks_sample/Data/sample_2.fq.gz \
--out-bam /outputdir/fq2bam_output.bam
Notes on fq2bam
Slurm script:
- Replace
<allocation>
with your allocation name.
- The apptainer flag
--nv
enables Nvidia GPU support inside the container.
- The apptainer flag
-B
binds a directory into the container.
- In this case, we are binding the present working directory (
$PWD
) into both /workdir
and /outputdir
inside the container.
- The variable
$CONTAINERDIR
is defined by the container module - you do not need to assign it a value. This line in the script points the apptainer run
command to the appropriate .sif
file to call the desired container.
- The
pbrun
command tells Clara Parabricks you want to run the subsequent tool (in this case, fq2bam
).
- The arguments following
pbrun fq2bam
are specific to the Clara Parabricks tool being used. See the fq2bam
reference for more detailed information on these arguments.
- In this case, the reference fasta file (
Homo_sapiens_assembly38.fasta
) and fastq data files (sample_1.fq.gz
and sample_2.fq.gz
) were downloaded ahead of time and stored in the referenced subdirectories. You should change these paths and file names as needed to point to your specific reference fasta and data files.
- This script should be saved in a file, called (for example)
job.slurm
. To run your job, you would submit the script by typing sbatch job.slurm
.
|
HPC, software, bioinformatics
lang, programming