Overview

The MSU HPCC (High-Performance Computing Cluster), managed by ICER (Institute for Cyber-Enabled Research), provides an environment for running bioinformatics workflows efficiently. This guide covers a recommended way to run nf-core pipelines on the HPCC so that runs stay streamlined, reproducible, and user-friendly.

Why nf-core?

nf-core is a collection of high-quality bioinformatics pipelines built using Nextflow. These pipelines are designed to be:

Leveraging Singularity and Nextflow Modules

At MSU’s HPCC, Singularity and Nextflow are installed and accessible across all development and compute nodes. Because both are provided by the system, nf-core pipelines can be run without the added complexity of alternative package management systems like conda.

Benefits of Using Singularity

Nextflow Module Management

Nextflow can be loaded directly with the module load Nextflow command on the HPCC. This approach ensures:

Running an nf-core Pipeline on the MSU HPCC

Here’s a step-by-step guide to running an nf-core pipeline using Singularity and Nextflow:

  1. Load the Nextflow Module:

    module load Nextflow
    

    This command ensures Nextflow is available in your current environment.

  2. Set Up Your Directory: Navigate to or create a directory where you want to run the pipeline:

    mkdir ~/my_pipeline_run
    cd ~/my_pipeline_run
    
  3. Download the Pipeline: Pull the desired nf-core pipeline:

    nextflow pull nf-core/rnaseq
    
  4. Run the Pipeline with Singularity:

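    nextflow run nf-core/rnaseq -profile singularity -w "$SCRATCH/my_pipeline_run" -resume
    
    • The -profile singularity option runs each pipeline process inside a Singularity container.
    • The -w option sets the directory where Nextflow stores intermediate files, here a folder under $SCRATCH.
    • The -resume flag allows for continuation from the last checkpoint if a run is interrupted.
    • Pipeline-specific parameters (for example --input) are appended to this command, as in the example in the next section.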
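
As a minimal sketch, the command from step 4 can be assembled and printed before it is run, which makes it easy to check the work directory path. The function name build_run_cmd and the fallback to $HOME when $SCRATCH is unset are illustrative assumptions, not HPCC conventions.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the `nextflow run` command instead of executing it.
# build_run_cmd is a hypothetical helper, not part of Nextflow or nf-core.

build_run_cmd() {
    local pipeline="$1"   # e.g. nf-core/rnaseq
    local run_name="$2"   # name of the work directory
    # Assumption: fall back to $HOME if $SCRATCH is not set in this shell.
    local base="${SCRATCH:-$HOME}"
    printf 'nextflow run %s -profile singularity -w %s/%s -resume\n' \
        "$pipeline" "$base" "$run_name"
}

# Print the command for inspection; run it yourself once the path looks right.
build_run_cmd nf-core/rnaseq my_pipeline_run
```

Printing rather than executing keeps the sketch safe to try on a development node before any jobs are launched.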

Example Command for a Specific Pipeline

To run the nf-core/rnaseq pipeline with a custom SLURM configuration, first create an icer.config file in your working directory:

process {
    executor = 'slurm'
}

This configuration file ensures that Nextflow submits jobs to the SLURM scheduler on the HPCC.
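
One way to create this file from the command line is with a heredoc. The sketch below writes icer.config into the current directory and confirms that the executor line is present; it assumes you are already in the directory where the pipeline will be run.

```shell
#!/usr/bin/env bash
# Write the minimal SLURM configuration shown above to icer.config.
# The quoted 'EOF' keeps the shell from expanding anything inside the file.
cat > icer.config <<'EOF'
process {
    executor = 'slurm'
}
EOF

# Sanity check: confirm the executor setting landed in the file.
grep -q "executor = 'slurm'" icer.config && echo "icer.config written"
```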

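Then execute the following command, passing the file with -c so that Nextflow loads it. Recent nf-core/rnaseq releases also require --outdir; the results directory name here is only an example:

nextflow run nf-core/rnaseq -profile singularity -c icer.config --input samplesheet.csv --outdir results --genome GRCh38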
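
The --input option expects a CSV samplesheet. As a hedged sketch, the snippet below writes one in the column layout that nf-core/rnaseq documents (sample, fastq_1, fastq_2, strandedness). The sample names and FASTQ paths are hypothetical placeholders, and the exact columns should be checked against the documentation for the pipeline version you run.

```shell
#!/usr/bin/env bash
# Create an example samplesheet.csv for nf-core/rnaseq.
# Sample names and FASTQ paths are placeholders; replace them with your own.
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,strandedness
control_rep1,/path/to/control_rep1_R1.fastq.gz,/path/to/control_rep1_R2.fastq.gz,auto
treated_rep1,/path/to/treated_rep1_R1.fastq.gz,/path/to/treated_rep1_R2.fastq.gz,auto
EOF

# Show the header and count the sample rows (all lines except the header).
head -n 1 samplesheet.csv
echo "samples: $(tail -n +2 samplesheet.csv | wc -l | tr -d ' ')"
```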

Best Practices for Running Pipelines

Conclusion

The MSU HPCC, managed by ICER, is well-equipped for running nf-core pipelines using Singularity and the Nextflow module system. This setup simplifies pipeline execution, maximizes reproducibility, and avoids unnecessary complications from external dependency managers like conda.

With Singularity and Nextflow seamlessly integrated, researchers can focus on their analyses, confident in the consistency and reliability of their computational environment.


Getting Help

November 03, 2024   John Vusich