From single cells to thousands of genomes: computational challenges and algorithmic solutions in high-throughput genomics
Thursday, February 28, 2019, 11:00 am-12:00 pm

The plummeting cost of high-throughput sequencing and the astounding variety of available sequencing assays have transformed much of biological research and enabled many fundamental discoveries. Unfortunately, they have also created a scientific regime in which the bottleneck in many experiments is no longer our ability to acquire data, but rather the difficulty of modeling and solving the computational challenges posed by these large, high-dimensional measurements. At the same time, we have been building sequencing data archives that hold immense potential, in which latent discoveries wait to be uncovered. However, these resources remain essentially inert because we cannot efficiently index and query "raw" experimental data.

In this talk, I will discuss some of the methods my lab has been developing to address these challenges as they arise in different contexts. In particular, I will describe our work on Mantis, an indexing approach that enables sequence search over large collections of raw, unassembled read data. I will discuss recent progress showing how the colored de Bruijn graph enables efficient neighborhood queries in the high-dimensional space of sequencing experiments, and how this leads to a new scheme for encoding k-mer membership across sets of experiments in vastly less space than previous state-of-the-art approaches. I will also discuss our recent work on alevin, a novel method for quantifying gene abundance from tagged-end, single-cell sequencing experiments (e.g., scRNA-seq). Alevin introduces a new, graph-based model describing how the evidence from tagged sequencing reads is related to expressed genes, and proposes a new, parsimony-based approach for resolving this evidence to arrive at accurate estimates of gene expression. Crucially, alevin is the first approach that resolves, rather than discards, gene-ambiguous reads in this type of scRNA-seq data.
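To give a flavor of the k-mer membership idea mentioned above, here is a toy sketch (not the Mantis implementation, which uses counting quotient filters and compressed color encodings): each k-mer maps to a "color class," the set of experiments containing it, and because many k-mers share the same color class, storing each distinct class only once saves substantial space. All function and variable names here are illustrative.

```python
def kmers(seq, k):
    """Yield all k-mers (length-k substrings) of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_color_index(experiments, k):
    """Map each k-mer to a color-class ID, storing each distinct class once.

    experiments: dict mapping experiment ID -> sequence (stand-in for read sets).
    Returns (kmer_to_class, classes) where classes[cid] is a frozenset of
    experiment IDs sharing that color class.
    """
    membership = {}  # k-mer -> set of experiment IDs containing it
    for exp_id, seq in experiments.items():
        for km in kmers(seq, k):
            membership.setdefault(km, set()).add(exp_id)

    class_ids = {}      # frozenset of experiment IDs -> class ID
    classes = []        # class ID -> frozenset of experiment IDs
    kmer_to_class = {}  # k-mer -> class ID
    for km, exps in membership.items():
        key = frozenset(exps)
        if key not in class_ids:          # deduplicate shared color classes
            class_ids[key] = len(classes)
            classes.append(key)
        kmer_to_class[km] = class_ids[key]
    return kmer_to_class, classes

def query(kmer_to_class, classes, seq, k, theta=0.8):
    """Return experiments containing at least a theta fraction of query k-mers."""
    counts, total = {}, 0
    for km in kmers(seq, k):
        total += 1
        cid = kmer_to_class.get(km)
        if cid is not None:
            for exp in classes[cid]:
                counts[exp] = counts.get(exp, 0) + 1
    return {e for e, c in counts.items() if c >= theta * total}

if __name__ == "__main__":
    exps = {"E1": "ACGTACGT", "E2": "ACGTTTTT", "E3": "GGGGGGGG"}
    idx, classes = build_color_index(exps, k=4)
    print(query(idx, classes, "ACGTACGT", k=4))  # only E1 passes the threshold
```

The deduplication step is the point: here eight distinct 4-mers collapse into just four color classes, and in real collections of thousands of experiments the compression is far more dramatic.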


Rob Patro is an Assistant Professor of Computer Science at Stony Brook University, where he leads the Computational Biology and Network Evolution (COMBINE) lab. Prior to joining Stony Brook in 2014, Rob obtained his Ph.D. in Computer Science from the University of Maryland in 2012. From 2012 until 2014, he was a postdoctoral research associate in the Kingsford Group at the Lane Center for Computational Biology (now the Department of Computational Biology) at Carnegie Mellon University. His research interests are in the design of algorithms and data structures for processing, organizing, indexing, and querying high-throughput genomics data. He is also interested in the intersection between efficient algorithms and statistical inference. His group develops and maintains many open-source bioinformatics tools, some of which are widely used in the genomics community (https://github.com/COMBINE-lab). He received an NSF CAREER award in 2018.

This talk is organized by Brandi Adams.