THE PRELIMINARY ORAL EXAMINATION FOR THE DEGREE OF Ph.D. IN COMPUTER SCIENCE FOR
Derrick Wood
Within microbial gene finding, the prediction of translation initiation sites (TISs) is a difficult problem because a single gene often has multiple possible start sites. While detection of features such as ribosomal binding sites (RBSs) has enabled gene finding programs to achieve over 90% accuracy in TIS prediction, this figure is still considerably lower than the 99% sensitivity that many gene finders achieve on the easier problem of simply finding genes. I have already begun work to improve TIS prediction by exploiting the sequence conservation found between the genomes of distant species, and this work has revealed a considerable number of errors in the genome annotations in our public genomic databases. These errors appear to be due to incorrect use of existing gene finding programs to find rare and interesting genomic features at the beginning of genes; I propose to extend my previous approach to find such features while demanding a high standard of evidence for each discovery, so that these features can be predicted with high precision. In addition, through the use of RNA-seq technology, a small number of bacterial transcriptomes have been sequenced within the past three years; this number should grow considerably in the coming months and years. Through an examination of existing transcriptome data, I have found that the distances between the beginning of transcripts and their respective genes follow a distinctly non-uniform distribution. I propose a method that incorporates transcriptome data to estimate this distribution for a given genome and uses the estimate to improve TIS prediction. Finally, recent research indicates that a ribosomal binding site may be absent from nearly half of all genes, and that for such genes, a lack of RNA secondary structure allows translation in spite of the absence of an RBS. I propose the use of secondary structure prediction to augment existing TIS selection methods and improve TIS prediction for those genes where existing RBS-based methods are less effective.
Examining Committee:
Dr. Steven Salzberg - Chair
Dr. Alan Sussman - Dept’s Representative
Dr. Mihai Pop - Committee Member
EVERYBODY IS INVITED TO ATTEND THE PRESENTATION