Biological databases are growing rapidly in size, making it impossible for scientists to manually verify the data and correct errors. The impact of database errors on the conclusions of analytic workflows that rely on these databases is not currently well understood. Given the increasing reliance on computational analytics in both biomedical research and clinical practice, it is important to develop a better understanding of how data and software interact. I will describe new results from my lab demonstrating that some classifiers can be influenced by even small errors in the data, and that using computationally inferred labels in databases can skew the classification output. These results underscore the need for deeper research into the interaction between software and data in biomedical applications.
Mihai Pop is a professor of computer science and co-director of the University of Maryland Center of Excellence in Microbiome Sciences. He develops computational approaches for analyzing microbial communities, particularly for characterizing their strain-level diversity. His other interests include biological databases, antibiotic resistance, and software testing. His lab has developed several widely used open-source software tools for the analysis of genomic and metagenomic data, including software for sequence alignment, genome and metagenome assembly, sequence clustering, and assessing associations between metagenomes and phenotypes. Pop teaches at all academic levels and is a strong advocate for inclusion and diversity within the scientific community. He has a particular interest in developing open educational resources for computer science and bioinformatics, and he seeks new ways to engage students and promote critical thinking and learning in his classes. Pop holds a B.S. (1994, Politehnica University of Bucharest, Romania) and a Ph.D. in Computer Science (2000, The Johns Hopkins University), and joined the University of Maryland in 2005. He is an MPower Professor (University of Maryland Strategic Partnership: MPowering the State) and a fellow of the Association for Computing Machinery and of the International Society for Computational Biology.

