Adventures in Bacterial Genome Assembly with Snakemake

🧬 Adventures in Bacterial Genome Assembly: My Illumina Snakemake Pipelines on GitLab

Inspired by Dr. Ryan Wick’s and Dr. Torsten Seemann’s pioneering work in bacterial genome assembly, I developed a series of Snakemake workflows to streamline and scale genome assembly tasks. These pipelines are hosted on GitLab and designed for reproducibility, modularity, and ease of use.

🚀 Why Snakemake?

Snakemake enables reproducible and scalable workflows. It supports:

Complex rule chaining for Illumina, Nanopore, or hybrid data
Integrated QC, taxonomic classification of reads and/or assemblies, and polishing
Workflows kept under version control (here via GitLab) for transparency and collaboration
Better resource management than ad hoc scripts (job scheduling, CPU allocation, and monitoring)
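Snakemake chains rules by matching the files one rule needs to the files another rule produces. The Python sketch below is a simplified model of that idea. It uses the standard library's `graphlib.TopologicalSorter` and a set of hypothetical rule names and file paths that are not taken from the repositories. Real Snakemake works backwards from the requested target files and resolves wildcard patterns, so this only illustrates how an execution order falls out of input/output dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: name -> (input files, output files).
# Names and paths are illustrative only, not the pipelines' actual rules.
RULES = {
    "trim":     ({"reads_R1.fq.gz", "reads_R2.fq.gz"}, {"trimmed_R1.fq.gz", "trimmed_R2.fq.gz"}),
    "assemble": ({"trimmed_R1.fq.gz", "trimmed_R2.fq.gz"}, {"assembly.fasta"}),
    "quast":    ({"assembly.fasta"}, {"quast/report.tsv"}),
    "checkm2":  ({"assembly.fasta"}, {"checkm2/quality_report.tsv"}),
    "report":   ({"quast/report.tsv", "checkm2/quality_report.tsv"}, {"report.html"}),
}


def execution_order(rules):
    """Link each rule to the rules producing its inputs, then sort topologically."""
    # Map every output file to the rule that creates it.
    producer = {out: name for name, (_, outs) in rules.items() for out in outs}
    # Each rule depends on the producers of its inputs; raw inputs have no producer.
    graph = {
        name: {producer[f] for f in ins if f in producer}
        for name, (ins, _) in rules.items()
    }
    # static_order() yields every rule after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(RULES))
```

Independent rules such as `quast` and `checkm2` have no ordering constraint between them. This is what lets a workflow engine run them in parallel once `assemble` has finished.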

🧪 The Pipelines

1. Illumina Bacterial Assembly + QC

GitLab Repository

This pipeline assembles genomes from Illumina paired-end reads and runs QC tools such as QUAST, CheckM2, and BUSCO, plus taxonomic classification via Kraken2 and GTDB-Tk. It also generates comprehensive HTML and PDF reports.

[Figure: Snakemake rule DAG of the Illumina assembly pipeline]
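Kraken2's standard `--report` output is a tab-separated table. Its columns are:

- percentage of reads in the clade
- clade read count
- directly assigned read count
- rank code
- NCBI taxid
- indented scientific name

As a rough illustration of how a QC step might pull the dominant species out of that file, here is a minimal Python sketch. The function name `top_species` is hypothetical, this is not necessarily how the pipeline handles Kraken2 output, and the toy report lines in the usage example are made up.

```python
def top_species(report_lines, n=3):
    """Return the n species-level taxa with the highest read percentage
    from the lines of a Kraken2 report (sketch, hypothetical helper)."""
    hits = []
    for line in report_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue  # skip blank or malformed lines
        # Rank code, taxid and the indented name are the last three columns;
        # indexing from the end also tolerates the extra columns that
        # --report-minimizer-data inserts before them.
        rank, name = cols[-3], cols[-1].strip()
        if rank == "S":  # keep species level only (not S1, S2, ...)
            hits.append((float(cols[0]), name))
    hits.sort(reverse=True)  # highest percentage first
    return hits[:n]


# Made-up toy report, for illustration only.
toy_report = [
    " 12.50\t125\t125\tU\t0\tunclassified",
    " 87.50\t875\t0\tR\t1\troot",
    " 80.00\t800\t790\tS\t562\t              Escherichia coli",
    "  5.00\t50\t50\tS\t28901\t              Salmonella enterica",
]
print(top_species(toy_report, n=1))  # [(80.0, 'Escherichia coli')]
```

In practice you would pass an open file handle, for example `top_species(open("sample.kreport"))`, where the file name is hypothetical. A second species with a high percentage is a quick hint of contamination worth checking with CheckM2.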

2. Assembly QC Pipeline

GitLab Repository

This pipeline runs the same QC tools as the previous one but skips assembly, which makes it useful for checking downloaded assemblies. It generates summary plots (e.g., BUSCO results and an N50 beeswarm) and Excel reports for easy interpretation.
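N50 is the contig length L such that contigs of length L or longer together contain at least half of the total assembly size. QUAST already reports it, so the pipeline presumably takes the value from there. The Python sketch below only shows what the number means. The function names are illustrative, not part of the repositories.

```python
def contig_lengths(fasta_lines):
    """Yield the length of each sequence in FASTA-formatted lines."""
    length = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):  # a header starts a new record
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may be wrapped over several lines
    if length is not None:
        yield length


def n50(lengths):
    """Smallest length among the longest contigs covering >= 50% of the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


print(n50([100, 200, 300, 400]))  # 300: 400 + 300 = 700 >= 500
```

For a real file, something like `with open("assembly.fasta") as fh: print(n50(contig_lengths(fh)))` would work, with `assembly.fasta` as a placeholder path. Plotting one N50 value per assembly gives exactly the kind of beeswarm the pipeline produces.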

3. Nanopore-only Pipeline

GitLab Repository

📝 On My To-Do List

Update my Nanopore-only Snakemake assembly and QC pipeline to use Autocycler instead of a single Flye assembly
Add HTML and PDF reporting to the Assembly QC and Nanopore-only Snakemake pipelines
Replace skANI-based identification with the Brooklin API for batch identification of assemblies against the Genomerxiv database
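HTML reporting for the remaining pipelines could start as simply as rendering per-assembly QC metrics into a table. The Python sketch below uses only the standard library's `html.escape`. The function name and the metric keys (`sample`, `N50`, `completeness`) are hypothetical placeholders, not the format the existing Illumina pipeline uses.

```python
import html


def qc_table_html(rows, columns):
    """Render QC metrics (one dict per assembly) as a minimal HTML table.

    Missing metrics become empty cells; all values are HTML-escaped.
    """
    head = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(row.get(c, '')))}</td>" for c in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


# Hypothetical metrics for two assemblies.
rows = [
    {"sample": "isolate_A", "N50": 250000, "completeness": 99.1},
    {"sample": "isolate_B", "N50": 48000},
]
print(qc_table_html(rows, ["sample", "N50", "completeness"]))
```

A PDF version could then be produced from the same HTML with a converter of your choice. Keeping one table builder for both formats avoids the two reports drifting apart.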

🌱 Final Thoughts

These pipelines reflect a journey of technical growth and community engagement. Making a good assembly is an art, and the craft evolves continuously with new insights and tools.

Explore the repositories, fork them, and let’s collaborate to advance microbial genomics—one Snakemake rule at a time.