As genome sequencing technologies continue to produce large datasets, developing scalable algorithms to analyze them has become crucial. In this talk, I will discuss several recent advances, with a focus on the problem of reconstructing a genome from a set of reads (genome assembly). I will describe low-memory, scalable algorithms for automatic parameter selection and de Bruijn graph compaction, recently implemented in two tools, KmerGenie and bcalm. I will also present recent advances in the theoretical foundations of genome assemblers.
Paul Medvedev is an Assistant Professor at the Pennsylvania State University in the departments of "Computer Science and Engineering" and "Biochemistry and Molecular Biology." He heads the Center for Computational Biology and Bioinformatics at Penn State. Prior to joining Penn State, he was a postdoc with Pavel Pevzner at UC San Diego. He received his Ph.D. in Computer Science from the University of Toronto under the supervision of Michael Brudno and Allan Borodin.