The emergence of deep learning–based methods for search poses several challenges and opportunities, not just for modeling but also for benchmarking and measuring progress in the field. Some of these challenges are new, while others have evolved from existing challenges in IR benchmarking, exacerbated by the scale at which deep learning models operate. Evaluation efforts such as the TREC Deep Learning track and the MS MARCO public leaderboard are intended to encourage research and track our progress, addressing big questions in our field. The goal is not simply to identify which run is "best" but to move the field forward by developing new robust techniques that work in many different settings and are adopted in research and practice. This entails a wider conversation in the IR community about what constitutes meaningful progress, how benchmark design can encourage or discourage certain outcomes, and the validity of our findings. In this talk, I will present a brief overview of what we have learned from our work on MS MARCO and the TREC Deep Learning track, and reflect on the state of the field and the road ahead.
Bhaskar Mitra is a Principal Applied Scientist at Bing in Montreal, Canada. He joined Microsoft in 2006 and Bing—then called Live Search—in 2007. Before moving to Montreal, he was part of the Microsoft labs in Hyderabad (India), Bellevue (USA), and Cambridge (UK). His research interests include machine learning and information retrieval, in particular the topic of neural information retrieval. He has co-organized multiple workshops and tutorials, served as a guest editor for a special issue of the Information Retrieval Journal, and co-authored a book on neural information retrieval. He is currently a doctoral candidate at University College London under the supervision of Dr. Emine Yilmaz and Dr. David Barber.