Theory and practice for phylogenomic analyses of SNP-like markers
Erin Molloy
IRB 0318
Friday, September 3, 2021, 11:00 am-12:00 pm
Abstract

Also on Zoom: https://umd.zoom.us/j/96718034173?pwd=clNJRks5SzNUcGVxYmxkcVJGNDB4dz09

An "unlimited thirst for genome sequencing" is driving research in many domains. Evolutionary genomic biology is no exception, as demonstrated by the 10,000 Plant Genomes Project, the (60,000) Vertebrate Genomes Project, and the Earth BioGenome Project, which aims to assemble 1.5 million eukaryotic genomes in the next 10 years. A goal for these ultra-large datasets is to enable researchers to address fundamental questions, such as how species evolve and adapt to their environments and how biodiversity is created and maintained. Estimating evolutionary histories is a key step in many such studies. In this talk, we will focus on recent methodological advances for estimating evolutionary trees and networks (admixture graphs) from SNP-like markers, that is, markers that can be modeled under the neutral Wright-Fischer + infinite-sites model. In the first half of this talk, I will present two new quartet-based methods for species tree estimation. These methods are statistically consistent and outperform traditional parsimony-based methods, especially when the species tree is in the anomaly zone. Furthermore, working with quartets enables efficient estimation of branch lengths and support values. In the second half of this talk, we will turn our attention to admixture graphs, specifically the popular estimation method TreeMix, which first computes an evolutionary tree and then iteratively augments it with admixture (or gene flow) edges. As I will show, TreeMix and related methods are guaranteed to get stuck in a local optimum and return an incorrect network topology even for a simple model with one admixed population incident to a leaf. This motivates the introduction (and evaluation) of a new graph search strategy, referred to as maximum likelihood network orientation (MLNO). Overall, these results provide insights into the performance of existing methods and suggest future directions for research.
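To make the point about quartets and branch lengths concrete, here is a minimal sketch of one well-known quartet-based estimator. It is not the speaker's SNP-based method. It assumes the input is a list of unrooted gene-tree quartet topologies for four taxa (strings such as "ab|cd") and uses the standard multispecies coalescent result, also used by ASTRAL: the species-tree quartet topology appears with probability p = 1 - (2/3)e^(-t), where t is the internal branch length in coalescent units. Inverting this formula gives t = -ln(1.5(1 - p)). The function name and input format are hypothetical.

```python
import math
from collections import Counter


def quartet_topology_and_length(quartets):
    """Hypothetical helper: return the most frequent unrooted quartet topology
    and an estimate of its internal branch length (coalescent units).

    Assumes `quartets` lists gene-tree quartet topologies such as "ab|cd"
    under the multispecies coalescent, where the species-tree topology has
    probability p = 1 - (2/3) * exp(-t).
    """
    counts = Counter(quartets)
    n = sum(counts.values())
    # Dominant topology; ties are broken by first occurrence (an assumption).
    topology, k = counts.most_common(1)[0]
    p = k / n
    if p <= 1 / 3:
        # No excess over the 1/3 expected from a star tree: length is 0.
        return topology, 0.0
    if p == 1.0:
        # Every gene tree agrees, so the estimate is unbounded.
        return topology, math.inf
    # Invert p = 1 - (2/3) e^(-t) to get t = -ln(1.5 * (1 - p)).
    return topology, -math.log(1.5 * (1 - p))


if __name__ == "__main__":
    # Made-up counts for illustration: 70 concordant and 15 + 15 discordant.
    observed = ["ab|cd"] * 70 + ["ac|bd"] * 15 + ["ad|bc"] * 15
    print(quartet_topology_and_length(observed))
```

For the illustrative counts above, p = 0.7 and the estimate is about 0.80 coalescent units. Because a short branch pushes p toward 1/3, the same frequencies also suggest how much support the dominant topology has.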

Bio
Erin Molloy is an assistant professor in Computer Science at the University of Maryland, College Park, and is affiliated with the University of Maryland Institute for Advanced Computer Studies (UMIACS) and the Center for Bioinformatics and Computational Biology (CBCB). Her research combines discrete optimization, graph algorithms, statistics, and parallel computing to make sense of large genomic datasets. It often focuses on the statistical and computational challenges that arise when estimating evolutionary trees and networks (admixture graphs). Molloy received her PhD from the University of Illinois at Urbana-Champaign. Her dissertation research, advised by Profs. Tandy Warnow and Bill Gropp, was supported by the NSF Graduate Research Fellowship, the Cohen Graduate Fellowship in Computer Science, two exploratory allocations on the Blue Waters supercomputer, and a residency at the Institute for Pure and Applied Mathematics' long program "Science at Extreme Scales: Where Big Data Meets Large Scale Computing". Before coming to Maryland, Molloy was a postdoctoral scholar in Prof. Sriram Sankararaman's Machine Learning and Genomics Lab at the University of California, Los Angeles.


This talk is organized by Richa Mathur