The DNA data deluge is upon us. Hundreds of thousands of draft and complete genomes comprising closely related pathogen strains are now available from public databases. These data provide us with an unprecedented opportunity to track genome evolution across all domains of life and trace the spread of infectious disease. Multiple sequence alignment has proven to be a versatile tool for global and local comparison of DNA sequences. However, multiple sequence alignment under practical scoring schemes requires O(n^k) time for k sequences of length n, making it an impossible task for both large n and large k using traditional techniques. This talk will highlight computational strategies and new data structures that circumvent this bottleneck in pursuit of single-nucleotide resolution for multiple genome comparison and ab initio repeat family detection. I will conclude with emerging research opportunities in genomics framed by the contributions highlighted in this talk.
Dr. Todd J. Treangen is an Assistant Research Scientist at the Center for Bioinformatics and Computational Biology (CBCB) and the Assistant Director of the Center for Health-related Informatics and Bioimaging (CHIB) at the University of Maryland, College Park. Prior to joining CBCB and CHIB, he was a Principal Investigator within the Genomics group at the National Biodefense Analysis and Countermeasures Center (NBACC). He received his Ph.D. in Computer Science from the Technical University of Catalonia (Barcelona, Spain). His research lies at the intersection of computer science and genomics and focuses on the development of novel algorithms, methods, and software for the analysis of genomes and metagenomes.