Recent high-throughput sequencing technologies have advanced the study of microbial communities; nonetheless, analyzing the resulting large datasets still poses challenges. This dissertation focuses on developing and validating computational algorithms that address these challenges in the assembly, clustering, and taxonomic classification of sequence data from microbial communities.
We first introduce a novel reference-guided metagenomic assembly approach that leverages existing sequenced genomes to improve on traditional assembly methods. Next, we propose SCRAPT, an iterative sampling-based algorithm designed to efficiently cluster 16S rRNA gene sequences from large datasets. Finally, we validate a comprehensive set of genome assembly pipelines for Oxford Nanopore sequencing data, achieving near-perfect accuracy by combining long-read and short-read polishing tools.
Our research improves the accuracy and efficiency of analyzing complex microbial communities, providing insights into their diversity and dynamics, with potential implications for human, animal, and plant health.
Tu Luan is a Ph.D. candidate in Computer Science at the University of Maryland, College Park, where she is advised by Dr. Mihai Pop. Her research focuses on developing computational methods for gene sequence clustering and improving algorithms for metagenomic assembly.