Overview

The MSU HPCC, managed by ICER, provides tools for transferring data for bioinformatics projects. The nf-core/fetchngs pipeline retrieves raw sequencing files and their metadata from public databases and downloads them directly to the HPCC. This guide walks through using nf-core/fetchngs step by step.

Key Benefits of nf-core/fetchngs

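nf-core/fetchngs is designed to download raw sequencing files and their metadata from public databases, starting from a list of accession IDs.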

Prerequisites

Step-by-Step Tutorial

1. Load Nextflow Module

Ensure Nextflow is available in your environment:

module load Nextflow
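
To confirm that the module loaded, you can check whether the nextflow command is on your PATH. The snippet below is a minimal sketch of that check; the variable name nextflow_status is only illustrative.

```shell
# Report whether the nextflow command is on PATH after loading the module.
if command -v nextflow >/dev/null 2>&1; then
    nextflow_status="available"
    # Print the installed Nextflow version
    nextflow -version
else
    nextflow_status="missing"
    echo "nextflow not found; try 'module load Nextflow' again" >&2
fi
```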

2. Create a Directory

Set up a dedicated directory for your data transfer tasks:

mkdir ~/fetchngs_project
cd ~/fetchngs_project

3. Configure ICER Environment

Create an icer.config file that tells Nextflow to submit each pipeline process as a SLURM job:

process {
    executor = 'slurm'
}
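
If you prefer to create the file from the command line rather than in an editor, a heredoc is one option. This sketch writes exactly the configuration shown above into icer.config in the current directory and prints it back for a quick check.

```shell
# Write the SLURM executor settings shown above to icer.config.
# The quoted 'EOF' keeps the shell from expanding anything inside the block.
cat > icer.config <<'EOF'
process {
    executor = 'slurm'
}
EOF

# Print the file back to confirm its contents
cat icer.config
```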

4. Run nf-core/fetchngs

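Download the sequencing files and metadata for the accessions listed in accession_list.csv (see the example file below). Recent nf-core pipeline releases require an output directory to be set with --outdir; results is used here as an example name:

nextflow run nf-core/fetchngs -profile singularity --input accession_list.csv --outdir results -c icer.config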

Example Accession List File

Create an accession_list.csv file with the following format:

SRR1234567
SRR1234568
SRR1234569

Each line should contain a unique accession ID for the data you wish to download.
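
The example file can also be created and checked from the shell. The sketch below writes the three example IDs from above and then looks for duplicates, since each line is expected to be unique; the variable name duplicates is only illustrative.

```shell
# Write the example accession IDs, one per line
cat > accession_list.csv <<'EOF'
SRR1234567
SRR1234568
SRR1234569
EOF

# Any ID printed by uniq -d appears more than once in the list
duplicates=$(sort accession_list.csv | uniq -d)
if [ -n "$duplicates" ]; then
    echo "Duplicate accession IDs found:" >&2
    echo "$duplicates" >&2
else
    echo "All $(wc -l < accession_list.csv | tr -d ' ') accession IDs are unique"
fi
```

Replace the example IDs with the accessions you actually want to download before running the pipeline.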

5. Review and Manage Downloads

Once the run finishes, check that the transferred files and metadata are stored and organized as expected.
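
One way to check the downloads is to confirm that every accession in your list has at least one FASTQ file. The helper below is a hypothetical sketch, not part of the pipeline. It assumes that FASTQ files are written to a fastq subdirectory of the output directory and that each file name contains the run accession ID. Both assumptions may not hold for your pipeline version or for non-run accession types, so adjust the path and pattern to match what you see on disk.

```shell
# Hypothetical helper: report accession IDs with no matching FASTQ file.
# Assumes FASTQ files are under <outdir>/fastq and that each file name
# contains the run accession ID; adjust both to match your run.
check_downloads() {
    list_file="$1"
    outdir="$2"
    missing=0
    # The "|| [ -n ... ]" part also handles a last line with no trailing newline
    while IFS= read -r id || [ -n "$id" ]; do
        [ -z "$id" ] && continue   # skip blank lines
        if ! ls "$outdir"/fastq/*"$id"*.fastq.gz >/dev/null 2>&1; then
            echo "No FASTQ files found for $id"
            missing=$((missing + 1))
        fi
    done < "$list_file"
    echo "$missing accession(s) missing"
}
```

For example, if the pipeline output was written to a directory named results (an assumed name), you could run check_downloads accession_list.csv results from the project directory.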

Best Practices


Conclusion

Using nf-core/fetchngs on the MSU HPCC simplifies the transfer of raw sequencing data and metadata, streamlining data acquisition for bioinformatics projects. Running the pipeline through Nextflow with the Singularity profile keeps the workflow reproducible and well suited to high-performance computing environments.


Getting Help

November 03, 2024   John Vusich, Leah Terrian