log in  |  register  |  feedback?  |  help  |  web accessibility
PhD Proposal: Accurate and Scalable Phylogenetic Tree Estimation via Graph Cuts
Yunheng Han
IRB 3137
Thursday, May 9, 2024, 2:00-3:00 pm
  • You are subscribed to this talk through .
  • You are watching this talk through .
  • You are subscribed to this talk. (unsubscribe, watch)
  • You are watching this talk. (unwatch, subscribe)
  • You are not subscribed to this talk. (watch, subscribe)
Abstract

An evolutionary tree, commonly referred to as a phylogeny, depicts the relationships among biological entities, like species or organisms. Phylogenetic trees provide insight into key questions related to biodiversity, evolution, ecology, and genetics. However, they are difficult to estimate accurately because of complex evolutionary processes, the hardness of the related optimization problems, and incomplete and inaccurate data in practice. To address these challenges, I introduce TREE-QMC, a new method for estimating a species tree from gene trees based on four-leaf trees, called quartets. TREE-QMC is based on the divide-and-conquer approach proposed by Sagi Snir and Satish Rao (2010). My major contribution is showing that a mathematical object called “the quartet graph” can be directly built from gene trees (without explicitly enumerating quartets). This enables TREE-QMC to run in cubic time, making it the first method to break the quartic time barrier (without down-sampling the input) since the divide-and-conquer approach was proposed in 2010. Moreover, TREE-QMC improves accuracy under challenging scenarios by normalizing the quartet weights to account for “artificial taxa”, which are introduced during the divide phase so subproblem solutions can be combined during the conquer phase. I recently showed how to extend the algorithm to incorporate extra information including branch lengths and support values in weighting quartets, further improving robustness to error-prone inputs, including homology errors. Going forward, I propose future directions to extend the TREE-QMC approach to reconstruct phylogenetic networks rather than trees because tree representation is insufficient for many evolutionary scenarios, like hybridization or horizontal gene transfers.

Bio

Yunheng Han is a fifth year PhD student at the University of Maryland, College Park, advised by Dr. Erin Molloy. His research focuses on developing fast and accurate algorithms for estimating phylogenetic trees and networks from genomic data.

This talk is organized by Migo Gui