PhD Proposal: Effective integration of genome-scale data across species and samples
Jason Fan
Wednesday, June 15, 2022, 1:00-3:00 pm
Recent advances in genome-scale assays and high-throughput sequencing have made measurements in model organisms both accessible and abundant. As a result, novel algorithms that exploit similarities across multiple samples and/or multiple organisms have been designed to improve analyses and yield new insights. However, these models can be difficult to optimize in practice because of the large number of interactions that must be modeled between multiple genes, across multiple samples, and across multiple organisms. Furthermore, simultaneous analysis of high-throughput sequencing data from multiple samples and organisms can be prohibitively costly in terms of space. This PhD proposal will present prior, ongoing, and future work that addresses these challenges, with emphasis on techniques that make analyses work well in practice.

First, I will discuss prior work that integrates data across model organisms. We present a novel matrix factorization framework for predicting synthetic-lethal genetic interactions that is orders of magnitude faster to train than the state-of-the-art deep-learning-based approach. Here, fast training and careful application of hyper-parameter tuning techniques are key to achieving state-of-the-art performance. Second, I will discuss a recently published metric and tool that is the first to enable model selection for transcript abundance estimation algorithms on experimental RNA-Seq data, where "ground truth" is rarely available. Finally, I will discuss ongoing and future work on a new tool that enables space-efficient indexing of huge reference sequence collections.

Examining Committee:
Department Representative:
Dr. Rob Patro
Dr. Jordan Boyd-Graber
Dr. Erin Molloy

Dr. Mihai Pop
Dr. Max Leiserson
Jason Fan is a PhD student working on algorithms for computational biology.


This talk is organized by Tom Hurst