PhD Proposal: Efficient Learning for Mobile Robots

Varun Suryan

Remote

Abstract

There has been a surge of interest recently in incorporating learning in robotics. This is not surprising: given the advancements in the Machine Learning (ML) community, there have been constant efforts to use powerful learning-based methods to give robots more autonomy. However, robots present unique challenges. For example, while state-of-the-art ML techniques can achieve super-human performance on certain tasks, these algorithms often require billions of training samples obtained by interacting with the environment. Acquiring this many samples with mobile robots may not be feasible. Mobile robots are costly to operate endlessly due to energy requirements, hardware depreciation, and failures, and maintaining robots over long periods of time can be labor- and cost-intensive. As a result, when applying ML techniques in robotics that require physical interaction with the environment, minimizing the number of such interactions becomes key. The recent progress in machine learning has been spurred, in part, by access to large datasets. When it comes to applying these techniques in robotics, however, acquiring such a dataset is itself a challenge, since it requires physical interaction. While there is work on reducing the sample complexity of ML algorithms, reducing the number of samples alone may not be sufficient for physical agents, because obtaining a sample may require the mobile robot to travel to a new location. This challenge is typically not addressed in the general ML community.

This work aims to answer the following question: how do we make robots learn as efficiently as possible with a minimal amount of physical interaction? We approach this question along two fronts: extrinsic learning and intrinsic learning. In extrinsic learning, we want the robot to learn about the external environment in which it is operating; this problem is known as Informative Path Planning (IPP). In intrinsic learning, our focus is on having the robot learn a skill, such as navigating an environment. Here, we focus on Reinforcement Learning (RL) approaches.

We study two types of problems under extrinsic learning. We start with the problem of efficiently learning a spatially varying field modeled by a Gaussian Process (GP). Our goal is to ensure that the GP posterior variance, which equals the mean squared error between the learned and actual fields, is below a predefined threshold. By exploiting the underlying properties of GPs, we present a series of constant-factor approximation algorithms for minimizing the number of stationary sensors to place, minimizing the total time taken by a single robot, and minimizing the time taken by a team of robots to learn the field. Here, we assume that the GP hyperparameters are known. We then study a variant where the hyperparameters are unknown, but the goal is only to find the maxima of the spatial field. For this problem, we present Upper Confidence Bound (UCB) and Monte Carlo Tree Search (MCTS) based algorithms and validate their performance empirically, including on a real-world dataset.
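To make the extrinsic-learning criterion above concrete, the following is a minimal sketch (not the proposal's algorithms) of the GP posterior variance at query points, assuming an illustrative squared-exponential kernel with made-up hyperparameters. It highlights the property the placement algorithms exploit: the posterior variance depends only on where measurements are taken, not on the measured values.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def posterior_variance(X_train, X_query, noise_var=0.1):
    """GP posterior variance at X_query given measurement sites X_train.

    The variance depends only on the measurement locations, so a sensor
    placement can be certified against a variance threshold in advance.
    """
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    k_star = rbf_kernel(X_train, X_query)
    prior = rbf_kernel(X_query, X_query).diagonal()
    reduction = np.einsum("ij,ij->j", k_star, np.linalg.solve(K, k_star))
    return prior - reduction

# Two measurement sites; one query near a site, one far from both.
X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
X_query = np.array([[0.1, 0.1], [5.0, 5.0]])
var = posterior_variance(X_train, X_query)
# Variance is small near a measurement and close to the prior far away,
# so a coverage-style placement can drive it below a threshold everywhere.
```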

For intrinsic learning, our aim is to reduce the number of physical interactions by leveraging simulations, an approach known as Multi-Fidelity Reinforcement Learning (MFRL). In the MFRL framework, an agent uses multiple simulators of the real environment to perform actions. We present two versions of the MFRL framework, model-based and model-free, that leverage GPs to learn the optimal policy in a real-world environment. By incorporating GPs in the MFRL framework, we empirically observe up to a 40% reduction in the number of samples for model-based RL and a 60% reduction for the model-free version. Our proposed work will use Proximal Policy Optimization (PPO) and sim-to-real approaches for environments where multiple robots are operating.
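The multi-fidelity idea can be sketched as follows. This is an illustrative toy example, not the GP-based algorithms of the proposal: a tabular Q-learner is trained first in a cheap, approximate simulator and then fine-tuned in the "real" environment, so that most interactions are spent in the simulator. The chain environment and all hyperparameters here are invented for illustration.

```python
import random

def q_learning(step_fn, n_states, n_actions, episodes, q=None,
               alpha=0.5, gamma=0.95, eps=0.2, horizon=50):
    """Tabular Q-learning; step_fn(s, a) -> (next_state, reward, done)."""
    if q is None:
        q = [[0.0] * n_actions for _ in range(n_states)]
    interactions = 0
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # Epsilon-greedy action selection.
            a = (random.randrange(n_actions) if random.random() < eps
                 else max(range(n_actions), key=lambda a: q[s][a]))
            s2, r, done = step_fn(s, a)
            interactions += 1
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
            if done:
                break
    return q, interactions

def make_chain(slip):
    """5-state chain: action 1 moves right, action 0 left; goal = state 4.
    The slip probability stands in for the fidelity gap between simulators."""
    def step(s, a):
        if random.random() < slip:
            a = 1 - a
        s2 = min(4, s + 1) if a == 1 else max(0, s - 1)
        return s2, (1.0 if s2 == 4 else 0.0), s2 == 4
    return step

random.seed(0)
sim, real = make_chain(slip=0.0), make_chain(slip=0.1)
q, _ = q_learning(sim, 5, 2, episodes=200)                # cheap simulator
q, real_steps = q_learning(real, 5, 2, episodes=20, q=q)  # fine-tune on "real"
# Warm-starting from the simulator means only a few expensive real-world
# interactions are needed -- the effect MFRL formalizes across fidelities.
```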

Examining Committee:

Chair: Dr. Pratap Tokekar

Dept rep: Dr. Jordan Lee Boyd-Graber

Members: Dr. Dinesh Manocha

Bio

Varun Suryan received his B.Tech. degree in mechanical engineering from the Indian Institute of Technology Jodhpur, India, in 2016 and his M.S. degree in computer engineering from Virginia Tech, Blacksburg, VA, USA, in 2019. He is pursuing his Ph.D. in computer science at the University of Maryland, College Park, MD, USA. His research interests include algorithmic robotics, Gaussian processes, and reinforcement learning.

This talk is organized by Tom Hurst.