Overview

The MSU HPCC, managed by ICER, is a powerful environment for bioinformatics workflows. This guide explains how to run the nf-core/rnaseq pipeline for RNA-seq pre-processing and the nf-core/differentialabundance pipeline for differential expression analysis and gene set enrichment analysis (GSEA). If you already have a counts table, you can skip directly to Section 2, Differential Expression and GSEA.

Key Benefits

Prerequisites

Step-by-Step Guide

Note on Directory Variables

Note on Working Directory

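Intermediate files are stored in a working directory set with the -w option (the long form is -work-dir). Keeping intermediate data separate from the final output keeps results organized, and placing the working directory on scratch storage ($SCRATCH in the nf-core/rnaseq script below) keeps large temporary files out of your home directory.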

1. Pre-processing with nf-core/rnaseq

1.1 Load Nextflow Module

module load Nextflow

1.2 Create an Analysis Directory

mkdir $HOME/rnaseq_project
cd $HOME/rnaseq_project

1.3 Prepare Sample Sheet for Pre-processing

Create a samplesheet.csv for nf-core/rnaseq pre-processing:

sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,/path/to/CONTROL_REP1_R1.fastq.gz,/path/to/CONTROL_REP1_R2.fastq.gz,auto
CONTROL_REP2,/path/to/CONTROL_REP2_R1.fastq.gz,/path/to/CONTROL_REP2_R2.fastq.gz,auto
TREATMENT_REP1,/path/to/TREATMENT_REP1_R1.fastq.gz,/path/to/TREATMENT_REP1_R2.fastq.gz,auto
TREATMENT_REP2,/path/to/TREATMENT_REP2_R1.fastq.gz,/path/to/TREATMENT_REP2_R2.fastq.gz,auto

Ensure the paths to the FASTQ files are correct and that strandedness is set appropriately (auto, forward, reverse, or unstranded; auto lets the pipeline infer it).
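If your FASTQ files follow a consistent naming pattern, the sample sheet can be generated rather than typed by hand. The sketch below assumes paired-end files named <SAMPLE>_R1.fastq.gz and <SAMPLE>_R2.fastq.gz in a single directory; the function name make_rnaseq_samplesheet is hypothetical and not part of nf-core.

```shell
# Hypothetical helper (not part of nf-core): print an nf-core/rnaseq samplesheet for
# paired-end FASTQs named <SAMPLE>_R1.fastq.gz / <SAMPLE>_R2.fastq.gz in one directory.
make_rnaseq_samplesheet() {
    local fastq_dir="$1"
    local strandedness="${2:-auto}"   # auto, forward, reverse, or unstranded
    local r1 r2 sample

    echo "sample,fastq_1,fastq_2,strandedness"
    for r1 in "$fastq_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                      # glob matched nothing
        sample="$(basename "$r1" _R1.fastq.gz)"
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "warning: skipping $sample, no matching R2 file" >&2
            continue
        fi
        # Absolute paths so the sheet works regardless of where Nextflow is launched
        echo "$sample,$(realpath "$r1"),$(realpath "$r2"),$strandedness"
    done
}
```

Usage: make_rnaseq_samplesheet /path/to/fastqs > samplesheet.csv. Review the generated file before running the pipeline, since samples missing an R2 file are skipped with a warning rather than included.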

1.4 Configure ICER Environment

Create icer.config in the project directory so that Nextflow submits each pipeline task to SLURM:

process {
    executor = 'slurm'
}
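With executor = 'slurm', Nextflow submits each pipeline task as its own SLURM job, so the resources requested in the submission script in step 1.5 mainly cover the Nextflow head process. If you want to cap how many task jobs Nextflow keeps in the queue at once, Nextflow's executor.queueSize setting can be added. The sketch below writes the file from the shell (overwriting any existing icer.config); the value 50 is an arbitrary illustration, not an ICER recommendation.

```shell
# Write icer.config from the shell (overwrites an existing file).
# executor.queueSize limits how many tasks Nextflow submits to SLURM at once;
# 50 is an illustrative value only.
cat > icer.config <<'EOF'
process {
    executor = 'slurm'
}

executor {
    queueSize = 50
}
EOF
```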

1.5 Run nf-core/rnaseq

Example SLURM Submission Script (save it to a file, e.g. rnaseq.sb, and submit it with sbatch rnaseq.sb):
#!/bin/bash

#SBATCH --job-name=rnaseq_job
#SBATCH --time=48:00:00
#SBATCH --mem=64G
#SBATCH --cpus-per-task=16

cd $HOME/rnaseq_project
module load Nextflow/23.10.0

nextflow pull nf-core/rnaseq
nextflow run nf-core/rnaseq -r 3.14.0 \
    --input ./samplesheet.csv \
    --outdir ./rnaseq_results \
    --genome GRCh38 \
    -profile singularity \
    -work-dir "$SCRATCH/rnaseq_work" \
    -c ./icer.config

2. Differential Expression and GSEA

If you already have a counts table, you can begin from here.

2.1 Create a Differential Expression Project Directory

mkdir $HOME/differential_abundance_project
cd $HOME/differential_abundance_project
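If your counts come from Section 1, the run command in step 2.3 expects salmon.merged.gene_counts.tsv and salmon.merged.gene_lengths.tsv in this directory. The hypothetical helper below copies them over; it assumes nf-core/rnaseq was run with its default star_salmon aligner, which writes the merged tables to <outdir>/star_salmon/.

```shell
# Hypothetical helper (not part of nf-core): copy the merged Salmon gene counts and
# gene lengths from an nf-core/rnaseq results directory into a destination directory.
# Assumes the default star_salmon aligner, whose merged tables live in <outdir>/star_salmon/.
copy_rnaseq_counts() {
    local rnaseq_outdir="$1"
    local dest="${2:-.}"
    local src="$rnaseq_outdir/star_salmon"
    local f

    for f in salmon.merged.gene_counts.tsv salmon.merged.gene_lengths.tsv; do
        if [ ! -f "$src/$f" ]; then
            echo "error: $src/$f not found" >&2
            return 1
        fi
        cp "$src/$f" "$dest/"
    done
}
```

Usage, from this directory: copy_rnaseq_counts "$HOME/rnaseq_project/rnaseq_results"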

2.2 Prepare Input Data for Differential Abundance

Create a samplesheet.csv for nf-core/differentialabundance:

sample,condition,replicate,batch
CONTROL_REP1,control,1,A
CONTROL_REP2,control,2,B
TREATMENT_REP1,treated,1,A
TREATMENT_REP2,treated,2,B

Ensure the sample column matches the IDs in the counts table.
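A mismatch between the samplesheet and the counts table is easy to catch before submitting a job. The sketch below is a hypothetical pre-flight check, not part of nf-core; it assumes a comma-separated samplesheet with a header row and the sample ID in the first column, and a tab-separated counts table whose header row contains the sample IDs (as in salmon.merged.gene_counts.tsv).

```shell
# Hypothetical pre-flight check (not part of nf-core): print samplesheet IDs that
# do not appear in the header row of a tab-separated counts table.
# Assumes a comma-separated samplesheet with a header row and the ID in column 1.
check_sample_ids() {
    local samplesheet="$1"
    local counts="$2"

    awk -F',' -v header="$(head -n 1 "$counts")" '
        BEGIN {
            # Index every column name in the counts header
            n = split(header, cols, "\t")
            for (i = 1; i <= n; i++) present[cols[i]] = 1
        }
        FNR > 1 && NF && !($1 in present) {
            print "missing from counts table: " $1
            bad = 1
        }
        END { exit bad }
    ' "$samplesheet"
}
```

Usage: check_sample_ids samplesheet.csv salmon.merged.gene_counts.tsv prints nothing and returns 0 when every sample ID is present.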

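Additional input files:

Place the counts matrix and gene lengths matrix from nf-core/rnaseq (salmon.merged.gene_counts.tsv and salmon.merged.gene_lengths.tsv) in this directory, along with a contrasts.csv that defines the comparisons to test. In each row, reference is the baseline level of the variable column, target is the level compared against it, and blocking optionally lists covariates separated by semicolons. An example contrasts file: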

id,variable,reference,target,blocking
condition_control_treated,condition,control,treated,
condition_control_treated_blockrep,condition,control,treated,replicate;batch
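A typo in the contrasts file, such as a target level that never appears in the samplesheet, typically only surfaces once the pipeline is running. The sketch below is a hypothetical pre-flight check, not part of nf-core; it assumes contrasts.csv columns in the order shown above (id,variable,reference,target,blocking) and a comma-separated samplesheet with a header row.

```shell
# Hypothetical pre-flight check (not part of nf-core): verify that every contrast
# refers to an existing samplesheet column and to levels present in that column.
# Assumes contrasts.csv columns in the order id,variable,reference,target,blocking
# and a comma-separated samplesheet with a header row.
check_contrasts() {
    local samplesheet="$1"
    local contrasts="$2"

    awk -F',' '
        # First file (samplesheet): record column positions and observed values
        NR == FNR {
            if (FNR == 1) { for (i = 1; i <= NF; i++) col[$i] = i }
            else if (NF)  { for (name in col) seen[name, $(col[name])] = 1 }
            next
        }
        FNR == 1 { next }   # skip the contrasts header row
        NF {
            id = $1; var = $2; ref = $3; tgt = $4
            if (!(var in col)) {
                print id ": no column \"" var "\" in samplesheet"; bad = 1; next
            }
            if (!((var, ref) in seen)) { print id ": reference \"" ref "\" not found in " var; bad = 1 }
            if (!((var, tgt) in seen)) { print id ": target \"" tgt "\" not found in " var; bad = 1 }
            # Blocking variables are semicolon-separated and must be samplesheet columns
            n = split($5, blocks, ";")
            for (j = 1; j <= n; j++)
                if (blocks[j] != "" && !(blocks[j] in col)) {
                    print id ": no blocking column \"" blocks[j] "\""; bad = 1
                }
        }
        END { exit bad }
    ' "$samplesheet" "$contrasts"
}
```

Usage: check_contrasts samplesheet.csv contrasts.csv prints one line per problem and returns a nonzero status if any are found.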

2.3 Run nf-core/differentialabundance

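Example SLURM Submission Script (save it to a file, e.g. diff_abundance.sb, and submit it with sbatch diff_abundance.sb). The script expects icer.config from step 1.4 in this directory, so copy it here or create it as described there: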
#!/bin/bash

#SBATCH --job-name=diff_abundance_job
#SBATCH --time=24:00:00
#SBATCH --mem=32G
#SBATCH --cpus-per-task=8

cd $HOME/differential_abundance_project
module load Nextflow/23.10.0

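# contrasts.csv is a required input. GSEA is not run by default; see the pipeline's
# parameter documentation for release 1.1.0 to enable it and supply gene sets.
nextflow pull nf-core/differentialabundance
nextflow run nf-core/differentialabundance -r 1.1.0 \
    --input ./samplesheet.csv \
    --contrasts ./contrasts.csv \
    --matrix ./salmon.merged.gene_counts.tsv \
    --transcript_length_matrix ./salmon.merged.gene_lengths.tsv \
    -profile singularity \
    --outdir ./diff_abundance_results \
    -c ./icer.config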

Best Practices

Getting Help

For assistance:

Conclusion

Running nf-core/rnaseq and nf-core/differentialabundance on the MSU HPCC provides a streamlined path from raw FASTQ files to differential expression and GSEA results. Together, the two pipelines let you use the HPCC's scheduler and containers for a complete bulk RNA-seq analysis.

November 04, 2024   John Vusich