Overview

The MSU HPCC, managed by ICER, is well suited to running bioinformatics workflows. This guide explains how to run the nf-core/cutandrun pipeline for CUT&RUN and CUT&Tag analysis on the HPCC in a reproducible and efficient way.

Key Benefits of nf-core/cutandrun

nf-core/cutandrun offers:

Prerequisites

Step-by-Step Tutorial

Note on Directory Variables

On the MSU HPCC:

Note on Working Directory

The working directory, where Nextflow stores intermediate and temporary files, can be set with the -w flag (short for -work-dir) when running the pipeline. Pointing it at scratch space keeps temporary data separate from the final outputs.
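As a minimal sketch of the idea, the helper below creates a work directory and prints its path so it can be passed to -w. The function name make_work_dir and the cutandrun_work directory name are illustrative assumptions, as is the fallback to the current directory when $SCRATCH is not set.

```shell
# Hypothetical helper (not part of Nextflow or the pipeline).
# Creates <base>/cutandrun_work and prints the path.
# <base> is the first argument, else $SCRATCH, else the current directory.
make_work_dir() {
    local base="${1:-${SCRATCH:-$PWD}}"
    local dir="$base/cutandrun_work"
    mkdir -p "$dir" && echo "$dir"
}
```

It could then be used as WORK_DIR=$(make_work_dir), followed by adding -w "$WORK_DIR" to the nextflow run command.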

1. Load Nextflow Module

Ensure Nextflow is loaded:

module load Nextflow

2. Create an Analysis Directory

Set up a directory for your analysis (referred to as the Analysis Directory):

mkdir -p "$HOME/cutandrun_project"
cd "$HOME/cutandrun_project"

3. Prepare Sample Sheet

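Create a sample sheet (samplesheet.csv). nf-core/cutandrun expects the columns group, replicate, fastq_1, fastq_2, and control, where control names the group used as background (for example an IgG control) and is left empty on the control rows themselves:

group,replicate,fastq_1,fastq_2,control
h3k27me3,1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz,igg_ctrl
igg_ctrl,1,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz,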

Ensure all paths to the FASTQ files are accurate.
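One way to check the paths before launching is a small shell function like the sketch below. The name check_samplesheet is hypothetical. The function assumes the sheet has a header row containing fastq_1 and fastq_2 columns and that no field contains a quoted comma.

```shell
# Hypothetical helper: list FASTQ paths in a sample sheet that do not exist.
# Assumes a header row naming the fastq_1 and fastq_2 columns and
# plain comma-separated fields (no quoted commas).
check_samplesheet() {
    local sheet="$1"
    local missing
    [ -f "$sheet" ] || { echo "No such sample sheet: $sheet"; return 2; }
    # Find the FASTQ columns by header name, print their non-empty values,
    # and keep only the values that are not existing files.
    missing=$(awk -F',' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_1" || $i == "fastq_2") cols[i] = 1; next }
        { for (i in cols) if ($i != "") print $i }
    ' "$sheet" | while IFS= read -r path; do
        [ -f "$path" ] || echo "MISSING: $path"
    done)
    if [ -n "$missing" ]; then
        echo "$missing"
        return 1
    fi
    echo "All FASTQ paths exist"
}
```

Running check_samplesheet samplesheet.csv in the Analysis Directory prints one MISSING line per path that cannot be found and returns a non-zero status if any are missing.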

4. Configure ICER Environment

Create an icer.config file for SLURM:

process {
    executor = 'slurm'
}

5. Run nf-core/cutandrun

Example SLURM Job Submission Script

Below is a shell script for submitting an nf-core/cutandrun job to SLURM:

#!/bin/bash

#SBATCH --job-name=cutandrun_job
#SBATCH --time=48:00:00
#SBATCH --mem=48G
#SBATCH --cpus-per-task=12

cd "$HOME/cutandrun_project"
module load Nextflow/23.10.0

# Pin the pipeline release so that reruns use the same code.
nextflow pull nf-core/cutandrun -r 3.2.2
nextflow run nf-core/cutandrun -r 3.2.2 \
    -profile singularity \
    --input ./samplesheet.csv \
    --outdir ./cutandrun_results \
    --genome GRCh38 \
    -work-dir "$SCRATCH/cutandrun_work" \
    -c ./icer.config
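Before submitting a job script like the one above, it can help to confirm that the required commands and input files are in place. The sketch below is one possible pre-flight check. The function name preflight, its cmd:/file: argument convention, and the script name cutandrun.sb used in the usage note are all assumptions made for illustration.

```shell
# Hypothetical pre-flight check: each argument is either cmd:<name>
# (a command that must be on PATH) or file:<path> (a file that must exist).
# Prints one line per problem and returns non-zero if anything is missing.
preflight() {
    local rc=0
    local item
    for item in "$@"; do
        case "$item" in
            cmd:*)
                command -v "${item#cmd:}" >/dev/null 2>&1 \
                    || { echo "missing command: ${item#cmd:}"; rc=1; }
                ;;
            file:*)
                [ -f "${item#file:}" ] \
                    || { echo "missing file: ${item#file:}"; rc=1; }
                ;;
        esac
    done
    return "$rc"
}
```

For example, after loading the Nextflow module, preflight cmd:nextflow file:samplesheet.csv file:icer.config && sbatch cutandrun.sb would submit the job only if all three checks pass.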

Note on Reference Genomes

Common reference genomes are located in the research common-data space on the HPCC. Refer to the README file for details. For more guidance on downloading reference genomes from Ensembl, see this GitHub repository.

Execute the pipeline with the following command. The -w flag places the working directory on scratch space, separate from the results written to --outdir:

nextflow run nf-core/cutandrun -profile singularity --input samplesheet.csv --outdir cutandrun_results --genome GRCh38 -c icer.config -w $SCRATCH/cutandrun_work

6. Monitor and Manage the Run

Best Practices

Getting Help

If you encounter issues running nf-core/cutandrun on the HPCC, consider these resources:

Conclusion

Running nf-core/cutandrun on the MSU HPCC is straightforward with Nextflow and Singularity. Following this guide gives reproducible, efficient CUT&RUN and CUT&Tag analyses that make use of the HPCC’s computational resources.

November 04, 2024   John Vusich