However, the increased availability of high-throughput sequencing technologies has generated thousands of bacterial genomes, making reference-guided assembly a valuable resource regardless of its computational cost. This study therefore describes a novel metagenome assembly approach, called MetaCompass, that combines reference-guided and de novo assembly. It is organized in three stages: (i) selecting reference genomes from a database using metagenomic taxonomic classification software that combines gene and genome comparison methods, achieving species- and strain-level resolution; (ii) performing reference-guided assembly with a new method that applies the minimum set cover principle to remove redundancy from the metagenomic read mapping while performing consensus calling; and (iii) performing de novo assembly of the reads that did not map to any reference genome.
We show that MetaCompass improves the most common metrics used to evaluate assembly quality (contiguity, consistency, and reference-based metrics) for both synthetic and real datasets, such as those gathered in the Human Microbiome Project (HMP), and that it facilitates the assembly of low-abundance microorganisms retrieved with the reference-guided approach. Lastly, we used our HMP assembly results to characterize the relative advantages and limitations of de novo and reference-guided assembly approaches, thereby providing guidance on analytical strategies for characterizing the human-associated microbiota.
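The minimum set cover idea in stage (ii) can be illustrated with a small sketch. This is not MetaCompass's implementation: it is the standard greedy approximation to set cover, and the function name `greedy_reference_cover`, the `read_hits` input layout, and the toy reference and read identifiers are all assumptions made for illustration. Given which reads map to which candidate reference, it repeatedly keeps the reference that explains the most still-unexplained reads, so that references adding no new reads are dropped as redundant.

```python
def greedy_reference_cover(read_hits):
    """Greedy approximation to minimum set cover (illustrative only).

    read_hits: dict mapping a reference id to the set of read ids that
    map to it (hypothetical layout, not MetaCompass's actual format).
    Returns the chosen reference ids in the order they were selected.
    """
    # Every read that maps to at least one reference must end up covered.
    uncovered = set().union(*read_hits.values())
    chosen = []
    while uncovered:
        # Pick the reference covering the most uncovered reads;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(read_hits),
                   key=lambda ref: len(read_hits[ref] & uncovered))
        gained = read_hits[best] & uncovered
        if not gained:
            break
        chosen.append(best)
        uncovered -= gained
    return chosen


# Toy input: refB and refD add nothing once refA and refC are kept.
hits = {
    "refA": {1, 2, 3, 4},
    "refB": {3, 4},
    "refC": {5, 6},
    "refD": {4, 5},
}
print(greedy_reference_cover(hits))
```

The greedy choice does not guarantee the smallest possible cover, but it is a common approximation for this NP-hard problem and is enough to show how redundant references fall out of the selection.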
Dean's rep: Dr. Stephanie A Yarwood
Members: Dr. Hector Corrada-Bravo
Dr. Abhinav Bhatele
Victoria Cepeda-Espinoza is a Ph.D. student in the Department of Computer Science, working under the supervision of Prof. Mihai Pop. Her research interests span several areas of computational biology, with a special focus on analyzing metagenomic datasets.