Overview

The MSU HPCC, managed by ICER, is an excellent platform for bioinformatics workflows. This guide explains how to run the nf-core/methylseq pipeline for methylation (bisulfite-sequencing) analysis, ensuring efficiency and reproducibility.

Key Benefits of nf-core/methylseq

nf-core/methylseq provides:

Prerequisites

Step-by-Step Tutorial

Note on Directory Variables

On the MSU HPCC:

Note on Working Directory

The working directory, which stores intermediate and temporary files, can be specified using the -w flag when running the pipeline. This helps keep outputs and temporary data organized.

1. Load Nextflow Module

Ensure Nextflow is loaded:

module load Nextflow

2. Create an Analysis Directory

Set up a directory for your analysis (referred to as the Analysis Directory):

mkdir $HOME/methylseq_project
cd $HOME/methylseq_project

3. Prepare Sample Sheet

Create a sample sheet (samplesheet.csv) with the following format:

sample,fastq_1,fastq_2,replicate
sample1,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz,1
sample2,/path/to/sample2_R1.fastq.gz,/path/to/sample2_R2.fastq.gz,1

Ensure all paths to the FASTQ files are accurate.

4. Configure ICER Environment

Create an icer.config file for SLURM:

process {
    executor = 'slurm'
}

5. Run nf-core/methylseq

Example SLURM Job Submission Script

Below is a shell script for submitting an nf-core/methylseq job to SLURM:

#!/bin/bash

#SBATCH --job-name=methylseq_job
#SBATCH --time=48:00:00
#SBATCH --mem=48GB
#SBATCH --cpus-per-task=12

cd $HOME/methylseq_project
module load Nextflow/23.10.0

nextflow pull nf-core/methylseq
nextflow run nf-core/methylseq -r 3.14.0 --input ./samplesheet.csv -profile singularity --outdir ./methylseq_results --fasta ./Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz --bismark_index ./bismark_index/ -work-dir $SCRATCH/methylseq_work -c ./nextflow.config

Note on Reference Genomes

Common reference genomes are located in the research common-data space on the HPCC. Refer to the README file for details. For guidance on downloading reference genomes from Ensembl, see this GitHub repository.

Execute the pipeline with the following command, including the -w flag for a separate working directory:

nextflow run nf-core/methylseq -profile singularity --input samplesheet.csv --genome GRCh38 -c icer.config -w $SCRATCH/methylseq_project

6. Monitor and Manage the Run

Best Practices

Getting Help

If you encounter issues running nf-core/methylseq on the HPCC, consider these resources:

Conclusion

Running nf-core/methylseq on the MSU HPCC is simplified using Singularity and Nextflow. This guide ensures reproducible and efficient methylation analysis, leveraging the HPCC’s computational capabilities for bioinformatics research.

November 04, 2024   John Vusich