Quartet-based species tree methods enable fast and consistent tree of blobs reconstruction under the network multispecies coalescent
Thursday, March 12, 2026, 2:00-3:00 pm
Abstract

Gene flow between species or populations is an important force in evolution, modeled by the network multispecies coalescent (NMSC). Reconstructing evolutionary histories, called species networks, under this model is notoriously challenging, with the leading methods scaling to just tens of species. Divide-and-conquer is a promising path forward; however, methods with statistical consistency guarantees require the tree of blobs (TOB), which displays only the tree-like parts of the network. TOB reconstruction under the NMSC is challenging in its own right: the only available method, TINNiK, has time complexity O(n^5 + n^4 k), where k is the number of input gene trees and n is the number of species. Here, we present a framework for TOB reconstruction that operates by (1) seeking a refinement of the TOB and then (2) contracting edges in it. For step (1), we show that an optimal solution to Weighted Quartet Consensus is a TOB refinement almost surely as the number of gene trees increases, motivating the use of fast quartet-based methods for species tree estimation such as ASTRAL or TREE-QMC. For step (2), we contract edges in the refinement tree based on the same hypothesis tests as TINNiK, which are applicable to subsets of four taxa. We show that sampling just O(n) four-taxon subsets around each edge enables statistically consistent TOB estimation, with asymptotic runtime dominated by tree reconstruction. Leveraging TREE-QMC for tree reconstruction gives our method a time complexity of O(n^3 k) and its name, TOB-QMC. On simulated data sets, TOB-QMC is typically at least as accurate as, and often more accurate than, TINNiK. Moreover, TOB-QMC scales to larger data sets and enables fast and interpretable exploration of the hyperparameters used in hypothesis testing. We demonstrate the importance of this feature on phylogenomic data sets. Lastly, our framework is related to analyses that biologists perform in practice because network methods do not scale. Our theoretical results provide justification for these analyses and guide interpretations of quartet-based species trees in the presence of gene flow; this context is critical given the recent result that tree-based network inference with ASTRAL can be positively misleading.
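
As a rough illustration of the two complexity bounds quoted in the abstract, the sketch below compares their leading terms with all constant factors ignored. The function names and the example values of n and k are assumptions chosen for illustration only; the outputs are ratios of asymptotic bounds, not measured or reported runtimes.

```python
def tinnik_bound(n: int, k: int) -> int:
    """Leading terms of the O(n^5 + n^4 k) bound, constants ignored."""
    return n**5 + n**4 * k


def tobqmc_bound(n: int, k: int) -> int:
    """Leading term of the O(n^3 k) bound, constants ignored."""
    return n**3 * k


def bound_ratio(n: int, k: int) -> float:
    """Ratio of the two bounds; algebraically equal to n + n^2 / k."""
    return tinnik_bound(n, k) / tobqmc_bound(n, k)


if __name__ == "__main__":
    # Hypothetical problem sizes (n species, k gene trees), not from the talk.
    for n, k in [(20, 1000), (100, 1000), (200, 1000)]:
        print(f"n={n:>3} k={k}: bound ratio ~ {bound_ratio(n, k):.1f}")
```

Because the ratio simplifies to n + n^2/k, the gap between the two bounds grows at least linearly in the number of species for any fixed number of gene trees.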

This talk is organized by Marcus Fedarko