Customer statistics collected in several real-world systems have reflected that users often prefer eliciting their liking for a given pair of items, say (A,B), in terms of relative queries like: "Do you prefer Item A over B?", rather than their absolute counterparts: ``How much do you score items A and B on a scale of [0-10]?". Drawing inspirations, in the search for a more effective feedback collection mechanism, led to the famous formulation of Dueling Bandits (DB), which is a widely studied online learning framework for efficient information aggregation from relative/comparative feedback. However despite the novel objective, unfortunately, most of the existing DB techniques were limited only to simpler settings of finite decision spaces, and stochastic environments, which are unrealistic in practice.
In this talk, we will start with the basic problem formulations for DB and familiarize ourselves with some of the breakthrough results. Following this, will dive deeper into a more practical framework of contextual dueling bandits (C-DB) where the goal of the learner is to make personalized predictions based on the user contexts. We will see a new algorithmic approach that can efficiently achieve the optimal O(\sqrt T) regret performance for this problem, resolving an open problem from DudÃk et al. [COLT, 2015]. In the last part of the talk, we will extend the aforementioned models to a federated framework, which entails developing preference-driven prediction models for distributed environments for creating large-scale personalized systems, including recommender systems and chatbot interactions. Apart from exploiting the limited preference feedback model, the challenge lies in ensuring user privacy and reducing communication complexity in the federated setting. We will conclude the talk with some interesting open problems.
Aadirupa is currently a research scientist at Apple ML research, broadly working in the area of Machine Learning theory. She did a short-term research visit at Toyota Technological Institute, Chicago (TTIC), after finishing her postdoc at Microsoft Research New York City. She obtained her Ph.D. from IISc Bangalore with Aditya Gopalan and Chiranjib Bhattacharyya.
Research interests: Broadly anything related to designing Efficient Human Aligned Prediction Models. A few specific research areas include Online learning theory, Bandits & RL, Federated Optimization, and Differential Privacy. Of late, she has also been working on some problems at the intersection of Mechanism Design, Game Theory, and Algorithmic Fairness.
Website: https://aadirupa.github.io/