Two New Snakemake Pipelines for Bacteriophage Assembly and QC: Illumina and Nanopore
A New Step for Reproducible Phage Genomics
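I recently released two complementary Snakemake pipelines for bacteriophage genome assembly and quality control:
- phage_assembly_snakemake, for Illumina paired-end reads
- phage-nanopore-assembly-snakemake, for Oxford Nanopore long reads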
Together, they provide a reproducible, modular framework for transforming raw sequencing reads into biologically interpretable phage genome quality reports.
Both workflows are released under the MIT license and built for transparent, reproducible analysis using per-rule conda environments in Snakemake.
If you use these pipelines in your research, I’d love to hear about it! Feedback and contributions are always welcome.
Why These Pipelines Matter
Bacteriophage projects often involve heterogeneous sequencing strategies. Some datasets are generated with high-accuracy short reads, others with long-read platforms, and many projects now combine both across multiple studies.
These pipelines are designed to make that reality easier to manage:
- One workflow optimized for Illumina paired-end reads
- One workflow optimized for Oxford Nanopore long reads
- Consistent reporting principles across both
- Scalable execution from laptop to HPC
- Reproducibility by design through Snakemake and isolated software environments
The goal is simple: spend less time stitching tools together, and more time interpreting phage biology.
Pipeline 1: Illumina Phage Assembly and QC
Repository: phage_assembly_snakemake
This workflow starts from paired-end Illumina FASTQ files and performs:
- Read QC and filtering with fastp
- De novo assembly with Shovill (SPAdes backend)
- Assembly metrics with QUAST
- Viral contig identification with VirSorter2
- Completeness and contamination assessment with CheckV
- Tool version capture for provenance
- Automated HTML and PDF reporting through R Markdown and WeasyPrint
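To illustrate what a tool-version capture step can look like, here is a minimal Python sketch. It is not the pipeline's own code: the function name `capture_versions`, the two-column TSV output, and the idea of passing one version command per tool are all assumptions made for this example. Version flags differ between tools, and some tools print their version to stderr, so the sketch accepts either stream and records a tool as "unavailable" rather than failing.

```python
import subprocess
from pathlib import Path


def capture_versions(commands: dict[str, list[str]], out_path: Path) -> dict[str, str]:
    """Run each tool's version command and write a two-column TSV (tool, version).

    Hypothetical helper: `commands` maps a tool name to the command that prints
    its version. Missing or failing tools are recorded as "unavailable" so the
    provenance file is always written.
    """
    versions: dict[str, str] = {}
    for tool, cmd in commands.items():
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, check=True, timeout=60
            )
            # Some tools print their version to stderr instead of stdout.
            text = (result.stdout or result.stderr).strip()
            versions[tool] = text.splitlines()[0] if text else "unknown"
        except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
            # OSError covers a binary that is not on PATH.
            versions[tool] = "unavailable"

    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w") as handle:
        handle.write("tool\tversion\n")
        for tool, version in versions.items():
            handle.write(f"{tool}\t{version}\n")
    return versions
```

A call such as `capture_versions({"fastp": ["fastp", "--version"]}, Path("versions.tsv"))` would record whatever fastp reports, or "unavailable" if it is not installed in the active environment.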
Key strengths:
- End-to-end assembly plus biological QC in one run
- Automatic database handling for VirSorter2 and CheckV
- Unified summary output for multi-sample projects
- Built-in low-coverage flagging in the final report
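As a rough illustration of low-coverage flagging, the Python sketch below reads contig coverage from FASTA headers and marks contigs under a threshold. It assumes Shovill-style renamed headers carrying `len=` and `cov=` fields; the 10x cut-off and the function names are hypothetical choices for this example, not the pipeline's actual settings.

```python
import re

# Matches key=value tokens such as "len=41234" or "cov=87.3" in a FASTA header.
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_headers(fasta_text: str) -> list[dict[str, str]]:
    """Collect the contig ID and key=value fields from every FASTA header line."""
    records = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            name, _, rest = line[1:].partition(" ")
            fields = dict(FIELD_RE.findall(rest))
            fields["id"] = name
            records.append(fields)
    return records


def flag_low_coverage(fasta_text: str, min_cov: float = 10.0) -> list[tuple[str, int, float, bool]]:
    """Return (contig id, length, coverage, is_low) for each contig.

    min_cov is a hypothetical threshold chosen for illustration. A contig whose
    header lacks a cov= field gets NaN coverage and is flagged as low, since
    NaN >= min_cov is always False.
    """
    rows = []
    for rec in parse_headers(fasta_text):
        length = int(rec.get("len", 0))
        cov = float(rec.get("cov", "nan"))
        rows.append((rec["id"], length, cov, not cov >= min_cov))
    return rows
```

Flagging rather than discarding keeps low-coverage contigs visible in a report, which leaves the final judgement to the reader.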
Pipeline 2: Nanopore Phage Assembly and QC
Repository: phage-nanopore-assembly-snakemake
This workflow is optimized for Oxford Nanopore long reads and includes:
- Raw read QC with NanoPlot
- Read filtering with Filtlong
- Adapter trimming with Porechop_ABI
- Post-filter QC with NanoPlot
- Long-read assembly with Flye
- Assembly graph visualization with Bandage
- Consensus polishing with Medaka
- Viral identification with VirSorter2
- Completeness and contamination profiling with CheckV
- Assembly metrics with QUAST
- Integrated HTML and PDF report generation
Key strengths:
- Long-read native assembly strategy
- Explicit assembly graph output for structural interpretation
- Configurable Medaka model support for modern ONT chemistries
- Robust handling of samples in which viral classification returns no hits
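One way to handle the no-hit case is to treat a missing, empty, or header-only result table as "no viral contigs" instead of an error. The Python sketch below assumes a tab-separated score table with a `seqname` column, as in VirSorter2's `final-viral-score.tsv`; the function names and the report wording are hypothetical. In a Snakemake rule this usually goes together with always writing every declared output file, even an empty one, because Snakemake treats a missing output as a failed job.

```python
import csv
from pathlib import Path


def load_viral_hits(score_tsv: Path) -> list[str]:
    """Return viral contig names, or an empty list when there are no hits.

    A missing file, an empty file, and a header-only file are all treated as
    "no hits", so downstream steps can still run and report that explicitly.
    """
    if not score_tsv.exists() or score_tsv.stat().st_size == 0:
        return []
    with score_tsv.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row["seqname"] for row in reader if row.get("seqname")]


def viral_status_line(sample: str, hits: list[str]) -> str:
    """One human-readable line for the report, covering the no-hit case."""
    if not hits:
        return f"{sample}: no contigs classified as viral"
    return f"{sample}: {len(hits)} viral contig(s): {', '.join(hits)}"
```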
One Philosophy, Two Data Types
Although each pipeline is tuned for a different sequencing technology, both follow the same design philosophy:
- Modular, readable Snakemake rules
- Deterministic directory structure and outputs
- Automated dependency management with conda
- Traceable software versions
- Practical reports that summarize QC and biological relevance
This makes it easier to compare results across projects, collaborate between teams, and maintain consistent analytical standards.
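A deterministic directory structure is what makes cross-sample summaries simple to build. The sketch below assumes a hypothetical layout `results/{sample}/checkv/quality_summary.tsv` (not necessarily the one these pipelines use) and concatenates CheckV's per-sample quality tables into one TSV with a leading `sample` column. Paths are sorted so the merged file comes out in the same order on every run.

```python
import csv
from pathlib import Path


def merge_checkv_summaries(results_dir: Path, out_tsv: Path) -> int:
    """Concatenate per-sample CheckV tables into one TSV; return the row count.

    Assumes the hypothetical layout results/<sample>/checkv/quality_summary.tsv.
    Columns missing from a given table are filled with "NA".
    """
    paths = sorted(results_dir.glob("*/checkv/quality_summary.tsv"))
    fieldnames: list[str] = []
    rows: list[dict[str, str]] = []
    for path in paths:
        sample = path.parent.parent.name  # the <sample> directory name
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                rows.append({"sample": sample, **row})

    out_tsv.parent.mkdir(parents=True, exist_ok=True)
    with out_tsv.open("w", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=["sample", *fieldnames],
            delimiter="\t",
            lineterminator="\n",
            restval="NA",
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```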
Typical Usage
Illumina workflow (example):
snakemake --use-conda --cores 24 --configfile config.yaml
Nanopore workflow (example):
snakemake --use-conda -j 24 --configfile config.yaml
In both workflows, a dry run is recommended before the first execution:
snakemake -n -p --use-conda --configfile config.yaml
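Before a dry run of the Illumina workflow, it helps to confirm that every sample has both mates. The Python sketch below groups FASTQ filenames into R1/R2 pairs and fails early when a mate is missing. The accepted naming patterns (`_R1`/`_R2` or `_1`/`_2`, an optional `_001` suffix, `.fastq` or `.fq`, optionally gzipped) are assumptions for illustration, not the pipeline's documented input convention, and the resulting dictionary would still need to be mapped onto whatever keys the pipeline's `config.yaml` expects. Inside a Snakefile, Snakemake's own `glob_wildcards` function serves a similar purpose.

```python
import re
from pathlib import Path

# Hypothetical naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz,
# also accepting "_1"/"_2", an optional "_001" suffix, and .fq/.fastq (gzipped or not).
PAIR_RE = re.compile(r"^(?P<sample>.+?)_R?(?P<read>[12])(?:_001)?\.f(?:ast)?q(?:\.gz)?$")


def pair_fastqs(filenames: list[str]) -> dict[str, dict[str, str]]:
    """Group FASTQ paths into {sample: {"r1": path, "r2": path}}.

    Raises ValueError if any sample lacks a mate, so the problem surfaces
    before Snakemake builds the job graph.
    """
    samples: dict[str, dict[str, str]] = {}
    for name in sorted(filenames):
        match = PAIR_RE.match(Path(name).name)
        if match is None:
            continue  # not a FASTQ file in the expected naming scheme
        samples.setdefault(match["sample"], {})[f"r{match['read']}"] = name
    incomplete = [s for s, reads in samples.items() if set(reads) != {"r1", "r2"}]
    if incomplete:
        raise ValueError(f"missing mate file for: {', '.join(incomplete)}")
    return samples
```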
Final Thoughts
Reliable phage genomics depends on more than assembly alone. It requires clear quality metrics, transparent methods, and workflows that remain reproducible as projects scale.
These two pipelines were built to support that standard in day-to-day research: from raw reads to actionable, documented results.
If you work on phage genomics with Illumina or Nanopore data, I hope these workflows help you move faster with more confidence.
Repositories
- Nanopore pipeline: https://gitlab.ilvo.be/stevebaeyen/phage-nanopore-assembly-snakemake
- Illumina pipeline: https://gitlab.ilvo.be/stevebaeyen/phage_assembly_snakemake