PhD Defense: Reference-guided assembly of Metagenomes
Victoria Cepeda Espinoza
Wednesday, August 5, 2020, 2:00-4:00 pm
Microorganisms play an important role in all of the Earth's ecosystems and are critical for the health of humans [1], plants, and animals. Most microbes are not easily cultured [2]; however, metagenomics, the analysis of organismal DNA sequences obtained directly from an environmental sample, enables the study of these microorganisms. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. The two main paradigms for this process are de novo assembly (i.e., reconstructing genomes directly from the read data) and reference-guided assembly (i.e., reconstructing genomes using the sequences of closely related organisms as references). Because the latter paradigm has a high computational cost, due to the mapping of tens of millions of reads to thousands of full genome sequences, metagenomic studies have primarily relied on the former paradigm.

However, the increased availability of high-throughput sequencing technologies has generated thousands of bacterial genomes, making reference-guided assembly a valuable resource regardless of its computational cost. This study therefore describes a novel metagenome assembly approach, called MetaCompass, that combines reference-guided and de novo assembly. It is organized in the following stages: (i) selecting reference genomes from a database using metagenomic taxonomic classification software that combines gene and genome comparison methods, achieving species- and strain-level resolution; (ii) performing reference-guided assembly in a new manner, which uses the minimum set cover principle to remove redundancy in a metagenome read mapping while performing consensus calling; and (iii) performing de novo assembly using the reads that have not been mapped to any reference genome.
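
As a rough illustration of the minimum set cover idea in stage (ii), the sketch below greedily picks reference genomes until every mapped read is explained by at least one chosen genome. It is not the MetaCompass implementation: the function name `greedy_set_cover`, the `read_hits` input layout, and the genome and read IDs are all hypothetical, and the greedy heuristic is a standard approximation swapped in for illustration only.

```python
from typing import Dict, List, Set


def greedy_set_cover(read_hits: Dict[str, Set[str]]) -> List[str]:
    """Choose genomes so that every mapped read is covered by at least one.

    read_hits maps a (hypothetical) reference genome ID to the set of read
    IDs that align to it. Greedy selection approximates minimum set cover;
    it is not guaranteed to return the smallest possible set.
    """
    uncovered: Set[str] = set().union(*read_hits.values())
    chosen: List[str] = []
    while uncovered:
        # Take the genome that explains the most still-uncovered reads;
        # iterating in sorted order makes ties resolve to the smallest ID.
        best = max(sorted(read_hits), key=lambda g: len(read_hits[g] & uncovered))
        chosen.append(best)
        uncovered -= read_hits[best]
    return chosen


# Toy input (assumed, not real data): genome_B's reads are all explained
# by genome_A, so genome_B is redundant and is never selected.
read_hits = {
    "genome_A": {"r1", "r2", "r3"},
    "genome_B": {"r2", "r3"},
    "genome_C": {"r4"},
}
selected = greedy_set_cover(read_hits)
print(selected)
```

In this toy case the redundant genome is dropped, which is the effect the abstract attributes to the set cover step; how MetaCompass combines this with consensus calling is not specified here.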

We show that MetaCompass improves the most common metrics used to evaluate assembly quality (contiguity, consistency, and reference-based metrics) for both synthetic and real datasets, such as those gathered in the Human Microbiome Project (HMP) [3]. It also facilitates the assembly of low-abundance microorganisms retrieved with the reference-guided approach. Lastly, we used our HMP assembly results to characterize the relative advantages and limitations of de novo and reference-guided assembly approaches, thereby providing guidance on analytical strategies for characterizing the human-associated microbiota.
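
To make "contiguity" concrete, the sketch below computes N50, a widely used contiguity statistic: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The abstract does not say which contiguity metrics were used, so treating N50 as one of them is an assumption, and the contig lengths are made up.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (0 for an empty assembly)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the cumulative length reaches
        # at least half of the total assembly length.
        if 2 * running >= total:
            return length
    return 0


# Made-up contig lengths: total is 300, and 100 + 80 = 180 >= 150.
example_n50 = n50([100, 80, 50, 30, 20, 20])
print(example_n50)
```

A higher N50 means more of the assembly sits in long contigs, but it says nothing about correctness, which is why it is typically reported alongside consistency and reference-based measures.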

Examining Committee:
    Chair:        Dr. Mihai Pop
    Dean's rep:   Dr. Stephanie A Yarwood
    Members:      Dr. Hector Corrada-Bravo
                  Dr. Robert Patro
                  Dr. Abhinav Bhatele

Victoria Cepeda-Espinoza is a Ph.D. student in the Department of Computer Science, working under the supervision of Prof. Mihai Pop. Her research interests span several areas of computational biology, with a special focus on analyzing metagenomic datasets.

This talk is organized by Tom Hurst