Adventures in Bacterial Genome Assembly with Snakemake

🧬 Adventures in Bacterial Genome Assembly: My Illumina Snakemake Pipelines on GitLab

Inspired by Dr. Ryan Wick’s and Dr. Torsten Seemann’s pioneering work in bacterial genome assembly, I developed a series of Snakemake workflows to streamline and scale genome assembly tasks. These pipelines are hosted on GitLab and designed for reproducibility, modularity, and ease of use.

🚀 Why Snakemake?

Snakemake enables reproducible and scalable workflows. It supports:

  • Complex rule chaining for Illumina, Nanopore or hybrid data
  • Integrated QC, taxonomic classification of reads and/or assemblies, and polishing
  • Version control via GitLab for transparency and collaboration
  • Better resource management (jobs, CPU, monitoring)
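
To illustrate the rule chaining mentioned above: Snakemake works backwards from a requested target file, finding the rule whose output produces it and then recursing on that rule's inputs. The toy Python sketch below mimics that idea with plain dictionaries. The rule names and file paths are hypothetical and not taken from the repositories, and real Snakemake matches wildcard patterns rather than fixed file names.

```python
# Toy model of how a workflow engine chains rules: each rule lists the
# files it needs (input) and the files it creates (output).
# All rule names and paths are hypothetical, for illustration only.
RULES = {
    "trim": {"input": ["reads_R1.fastq.gz", "reads_R2.fastq.gz"],
             "output": ["trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz"]},
    "assemble": {"input": ["trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz"],
                 "output": ["assembly.fasta"]},
    "qc": {"input": ["assembly.fasta"], "output": ["qc_report.html"]},
}


def plan(target, rules=RULES):
    """Return the rule names to run, in order, to create `target`."""
    order = []

    def visit(path):
        # Find the rule that produces this file; raw inputs have none.
        producer = next((name for name, r in rules.items()
                         if path in r["output"]), None)
        if producer is None or producer in order:
            return
        # Schedule everything the producer depends on first.
        for dependency in rules[producer]["input"]:
            visit(dependency)
        order.append(producer)

    visit(target)
    return order


print(plan("qc_report.html"))  # ['trim', 'assemble', 'qc']
```

Asking for the final report is enough to pull in trimming and assembly, which is why a Snakemake workflow is usually driven by naming the desired end products rather than the steps.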

🧪 The Pipelines

1. Illumina Bacterial Assembly + QC

GitLab Repository

This pipeline assembles genomes from Illumina paired-end reads and runs QC tools such as QUAST, CheckM2, and BUSCO, plus taxonomic classification via Kraken2 and GTDB-Tk. It also produces comprehensive HTML and PDF reports.

[Figure: DAG of the pipeline]

2. Assembly QC Pipeline

GitLab Repository

This pipeline runs the same QC tools as the previous one, but skips the assembly step, which makes it useful for QC on downloaded assemblies. It generates summary plots (e.g., BUSCO, N50 beeswarm) and Excel reports for easy interpretation.
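
For readers unfamiliar with the N50 shown in those plots: it is the contig length at which the longest contigs, added up from largest to smallest, first cover at least half of the total assembly size. The sketch below is a minimal standalone Python illustration of that definition, not code from the pipeline (which relies on the QC tools for its metrics); the contig lengths are made up.

```python
def n50(contig_lengths):
    """Return the N50: the length of the shortest contig among the longest
    contigs that together cover at least half of the assembly."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (bp), not real data.
print(n50([100, 200, 300, 400, 500]))  # 400
```

Here the total is 1500 bp; the 500 bp contig alone covers less than half, while 500 + 400 = 900 bp crosses the 750 bp mark, so the N50 is 400.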

3. Nanopore-only Pipeline

GitLab Repository

On my to-do list

🌱 Final Thoughts

These pipelines reflect a journey of technical growth and community engagement. Making a good assembly is an art, one that evolves continuously with new insights and tools.

Explore the repositories, fork them, and let’s collaborate to advance microbial genomics—one Snakemake rule at a time.



