Two New Snakemake Pipelines for Bacteriophage Assembly and QC: Illumina and Nanopore

A New Step for Reproducible Phage Genomics

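I recently released two complementary Snakemake pipelines for bacteriophage genome assembly and quality control: phage_assembly_snakemake for Illumina paired-end reads and phage-nanopore-assembly-snakemake for Oxford Nanopore long reads.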

Together, they provide a reproducible, modular framework for transforming raw sequencing reads into biologically interpretable phage genome quality reports.

Both workflows are released under the MIT license and built for transparent, reproducible analysis using per-rule conda environments in Snakemake.

If you use these pipelines in your research, I’d love to hear about it! Feedback and contributions are always welcome.

Why These Pipelines Matter

Bacteriophage projects often involve heterogeneous sequencing strategies. Some datasets consist of high-accuracy short reads, others come from long-read platforms, and many groups now work with both across their studies.

These pipelines are designed to make that reality easier to manage:

One workflow optimized for Illumina paired-end reads
One workflow optimized for Oxford Nanopore long reads
Consistent reporting principles across both
Scalable execution from laptop to HPC
Reproducibility by design through Snakemake and isolated software environments

The goal is simple: spend less time stitching tools together, and more time interpreting phage biology.

Pipeline 1: Illumina Phage Assembly and QC

Repository: phage_assembly_snakemake

This workflow starts from paired-end Illumina FASTQ files and performs:

Read QC and filtering with fastp
De novo assembly with Shovill (SPAdes backend)
Assembly metrics with QUAST
Viral contig identification with VirSorter2
Completeness and contamination assessment with CheckV
Tool version capture for provenance
Automated HTML and PDF reporting through R Markdown and WeasyPrint

Key strengths:

End-to-end assembly plus biological QC in one run
Automatic database handling for VirSorter2 and CheckV
Unified summary output for multi-sample projects
Built-in low-coverage flagging in the final report
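
To illustrate the idea behind low-coverage flagging, here is a minimal Python sketch. It is not taken from the pipeline: the function names, the 30x default threshold, and the approximation of depth as sequenced bases divided by assembly length are all assumptions made for this example.

```python
def mean_coverage(total_bases: int, assembly_length: int) -> float:
    """Approximate mean depth as sequenced bases divided by assembly length."""
    if assembly_length <= 0:
        return 0.0  # no assembly: report zero depth instead of dividing by zero
    return total_bases / assembly_length


def flag_low_coverage(samples: dict, min_depth: float = 30.0) -> dict:
    """Map each sample name to True when its depth is below min_depth.

    `samples` maps a sample name to a (total_bases, assembly_length) tuple.
    The 30x default is a hypothetical threshold, not the pipeline's setting.
    """
    flags = {}
    for name, (total_bases, assembly_length) in samples.items():
        flags[name] = mean_coverage(total_bases, assembly_length) < min_depth
    return flags


if __name__ == "__main__":
    # Two made-up samples with a 40 kb assembly each.
    example = {"phageA": (8_000_000, 40_000), "phageB": (600_000, 40_000)}
    print(flag_low_coverage(example))
```

In this made-up example, phageA has roughly 200x depth and passes, while phageB has roughly 15x and would be flagged.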

Pipeline 2: Nanopore Phage Assembly and QC

Repository: phage-nanopore-assembly-snakemake

This workflow is optimized for Oxford Nanopore long reads and includes:

Raw read QC with NanoPlot
Read filtering with Filtlong
Adapter trimming with Porechop_ABI
Post-filter QC with NanoPlot
Long-read assembly with Flye
Assembly graph visualization with Bandage
Consensus polishing with Medaka
Viral identification with VirSorter2
Completeness and contamination profiling with CheckV
Assembly metrics with QUAST
Integrated HTML and PDF report generation

Key strengths:

Long-read native assembly strategy
Explicit assembly graph output for structural interpretation
Configurable Medaka model support for modern ONT chemistries
Robust handling of no-hit viral classification cases
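
Handling the no-hit case usually means treating an empty or header-only result table as a valid outcome rather than an error. The Python sketch below shows one way to do that. It is an illustration only, not the pipeline's code: the function names, the column names in the usage comment, and the summary wording are assumptions.

```python
import csv
import io


def read_viral_hits(tsv_text: str) -> list:
    """Parse a tab-separated score table; return [] when there are no hits."""
    if not tsv_text.strip():
        return []  # empty output is treated as "no hits", not as a failure
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)  # a header-only table also yields an empty list


def summarize(sample: str, tsv_text: str) -> str:
    """Build a one-line summary that stays well-defined without any hits."""
    hits = read_viral_hits(tsv_text)
    if not hits:
        return f"{sample}: no viral contigs detected"
    return f"{sample}: {len(hits)} viral contig(s) detected"


if __name__ == "__main__":
    # Hypothetical table contents; real column names may differ.
    print(summarize("sample1", "seqname\tmax_score\n"))
    print(summarize("sample2", "seqname\tmax_score\ncontig_1\t0.98\n"))
```

Downstream steps can then branch on the empty list and still produce a report for that sample.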

One Philosophy, Two Data Types

Although each pipeline is tuned for a different sequencing technology, both follow the same design philosophy:

Modular, readable Snakemake rules
Deterministic directory structure and outputs
Automated dependency management with conda
Traceable software versions
Practical reports that summarize QC and biological relevance

This makes it easier to compare results across projects, collaborate between teams, and maintain consistent analytical standards.
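
Traceable software versions can be collected with very little code. The following Python sketch records the first line printed by each tool's version command. It is an assumption-laden example rather than the pipelines' implementation: the function names are hypothetical, and the Python interpreter stands in for the bioinformatics tools, whose version commands are not shown here.

```python
import subprocess
import sys


def capture_version(command: list) -> str:
    """Run a version command and return its first output line, or 'unavailable'."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return "unavailable"  # tool missing or not responding
    # Some tools print their version to stderr, so fall back to it.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "unavailable"


def collect_versions(tools: dict) -> dict:
    """Map each tool name to the version string reported by its command."""
    return {name: capture_version(cmd) for name, cmd in tools.items()}


if __name__ == "__main__":
    # The interpreter is used as a stand-in so the example runs anywhere.
    tools = {"python": [sys.executable, "--version"]}
    for name, version in collect_versions(tools).items():
        print(f"{name}\t{version}")
```

Writing the resulting table next to the other outputs keeps the provenance record with the results it describes.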

Typical Usage

Illumina workflow (example):

snakemake --use-conda --cores 24 --configfile config.yaml

Nanopore workflow (example):

snakemake --use-conda -j 24 --configfile config.yaml

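In both workflows, a dry run is recommended before the first execution. The -n flag lists the planned jobs without running them, and -p prints the shell commands each job would execute: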

snakemake -n -p --use-conda --configfile config.yaml
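
The dry-run-then-run habit can also be scripted. Below is a minimal Python sketch of a wrapper that uses only the flags shown in the commands above. The function names are hypothetical, and the wrapper is not part of either repository.

```python
import subprocess


def build_command(configfile: str, cores: int, dry_run: bool = False) -> list:
    """Assemble a snakemake command line from the flags used in this post."""
    cmd = ["snakemake", "--use-conda", "--configfile", configfile]
    if dry_run:
        cmd += ["-n", "-p"]  # plan only, and print the shell commands
    else:
        cmd += ["--cores", str(cores)]
    return cmd


def run_workflow(configfile: str = "config.yaml", cores: int = 24) -> None:
    """Run a dry run first; start the real run only if the dry run succeeds."""
    # check=True raises CalledProcessError on a non-zero exit code,
    # so a failing dry run prevents the real run from starting.
    subprocess.run(build_command(configfile, cores, dry_run=True), check=True)
    subprocess.run(build_command(configfile, cores), check=True)


if __name__ == "__main__":
    # Print the two commands instead of executing them.
    print(" ".join(build_command("config.yaml", 24, dry_run=True)))
    print(" ".join(build_command("config.yaml", 24)))
```

Calling run_workflow() from the pipeline directory would execute both steps, assuming snakemake is available on the PATH.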

Final Thoughts

Reliable phage genomics depends on more than assembly alone. It requires clear quality metrics, transparent methods, and workflows that remain reproducible as projects scale.

These two pipelines were built to support that standard in day-to-day research: from raw reads to actionable, documented results.

If you work on phage genomics with Illumina or Nanopore data, I hope these workflows help you move faster with more confidence.

Repositories

Nanopore pipeline: https://gitlab.ilvo.be/stevebaeyen/phage-nanopore-assembly-snakemake
Illumina pipeline: https://gitlab.ilvo.be/stevebaeyen/phage_assembly_snakemake