In this talk, I try to answer the question: what is a data scientist? I will argue that data science can be organized around five main research areas: (1) data exploration and preparation; (2) data representation and transformation; (3) computing with data; (4) data modeling; (5) data visualization and presentation. I will emphasize applications of machine learning in each of these areas. I will present several examples of recent work in bioinformatics and computational biology that aims to make progress in each area, taking you on a whirlwind tour of data science in biology. I will end the talk by highlighting some ongoing data science projects in my group. I will argue that biology is one of the most exciting fields in which data science will have a positive societal impact, helping to cure disease and improve human health and well-being.