Adventures in Bacterial Genome Assembly with Snakemake
🧬 Adventures in Bacterial Genome Assembly: My Illumina Snakemake Pipelines on GitLab
Inspired by Dr. Ryan Wick’s and Dr. Torsten Seemann’s pioneering work in bacterial genome assembly, I developed a series of Snakemake workflows to streamline and scale genome assembly tasks. These pipelines are hosted on GitLab and designed for reproducibility, modularity, and ease of use.
🚀 Why Snakemake?
Snakemake enables reproducible and scalable workflows. It supports:
- Complex rule chaining for Illumina, Nanopore or hybrid data
- Integrated QC, taxonomic read and/or assembly classification, and polishing
- Version control via GitLab for transparency and collaboration
- Resource management: parallel jobs, per-rule CPU allocation, and run monitoring
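To make the idea of rule chaining concrete, here is a toy sketch in plain Python of how a workflow engine can work backward from a requested target file to an execution order. It is not Snakemake's implementation and not taken from my pipelines: the rule names and file names are hypothetical, there are no wildcards, and it ignores timestamps, resources, and parallelism.

```python
# Toy model of rule chaining: each rule declares the files it needs and the
# files it produces. All rule and file names below are hypothetical.
RULES = [
    {"name": "trim",
     "inputs": ["reads_R1.fastq.gz", "reads_R2.fastq.gz"],
     "outputs": ["trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz"]},
    {"name": "assemble",
     "inputs": ["trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz"],
     "outputs": ["assembly.fasta"]},
    {"name": "quast",
     "inputs": ["assembly.fasta"],
     "outputs": ["quast_report.tsv"]},
    {"name": "checkm2",
     "inputs": ["assembly.fasta"],
     "outputs": ["checkm2_report.tsv"]},
    {"name": "report",
     "inputs": ["quast_report.tsv", "checkm2_report.tsv"],
     "outputs": ["report.html"]},
]


def plan(target, rules, existing):
    """Return the rule names needed to build `target`, dependencies first."""
    # Map every output file to the rule that produces it.
    producers = {out: rule for rule in rules for out in rule["outputs"]}
    order = []
    scheduled = set()

    def visit(path):
        if path in existing:
            return  # file is already there, nothing to run
        rule = producers.get(path)
        if rule is None:
            raise FileNotFoundError(f"no rule produces {path!r}")
        if rule["name"] in scheduled:
            return  # this rule is already part of the plan
        scheduled.add(rule["name"])
        for dependency in rule["inputs"]:
            visit(dependency)
        order.append(rule["name"])  # run only after its inputs are planned

    visit(target)
    return order


if __name__ == "__main__":
    raw_reads = {"reads_R1.fastq.gz", "reads_R2.fastq.gz"}
    print(plan("report.html", RULES, raw_reads))
```

If the assembly already exists, asking for a QC output schedules only the QC rule, which is the same reasoning that lets a QC-only workflow reuse rules from a full assembly workflow.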
🧪 The Pipelines
1. Illumina Bacterial Assembly + QC
This pipeline assembles genomes from Illumina paired-end reads and includes QC tools like QUAST, CheckM2, and BUSCO, plus taxonomic classification via Kraken2 and GTDB-Tk. It also produces comprehensive HTML and PDF reports.
2. Assembly QC Pipeline
This pipeline runs the same QC tools as the previous one but skips the assembly step, which makes it useful for doing QC on downloaded assemblies. It generates summary plots (e.g., BUSCO, N50 beeswarm) and Excel reports for easy interpretation.
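For readers unfamiliar with N50, the metric behind those beeswarm plots, here is a minimal Python sketch of the usual definition: the contig length at which contigs of that length or longer cover at least half of the total assembly. The helper names are my own for illustration; in the pipelines the value comes from the QC tools themselves, which may also filter out short contigs first.

```python
def contig_lengths(fasta_lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)  # a header starts a new contig
        elif line and lengths:
            lengths[-1] += len(line)  # sequence may span several lines
    return lengths


def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("cannot compute N50 of an empty assembly")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Tiny made-up FASTA, for illustration only.
    example = [">contig_1", "ACGTACGT", "ACGT", ">contig_2", "ACGT"]
    print(n50(contig_lengths(example)))
```

A single summary number hides a lot, which is why plotting the per-assembly values side by side across a batch is often more informative than reading them from a table.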
3. Nanopore-only Pipeline
On my to-do list
- Updating my Nanopore-only Snakemake assembly and QC pipeline to use Autocycler instead of Flye for assembly.
- Adding HTML and PDF reporting to the Assembly QC and Nanopore-only Snakemake pipelines.
- Replacing skANI identification with the Brooklin API for batch identification of assemblies against the Genomerxiv database.
🌱 Final Thoughts
These pipelines reflect a journey of technical growth and community engagement. Making a good assembly is an art that evolves continuously with new insights and tools.
Explore the repositories, fork them, and let’s collaborate to advance microbial genomics—one Snakemake rule at a time.