Robots are increasingly being deployed in everyday environments: helping with household chores such as folding laundry, fetching items in structured settings such as hospitals and warehouses, and even physically assisting humans in rehabilitation or assistive feeding tasks. However, enabling robots to perform these manipulation tasks reliably poses fundamental challenges in how robots learn effective policies. Traditional approaches to robot control fall into several categories. Classical planning and control methods require accurate models of objects and environments, which are difficult to obtain for diverse real-world scenarios. Imitation learning requires thousands of expert demonstrations, each demanding significant human time, specialized hardware, and careful setup. Reinforcement learning requires millions of environment interactions, along with reward engineering that calls for expert knowledge and iterative refinement. Recent work suggests that structured human guidance, such as rough sketches of a plan or a task decomposition, can make learning more data-efficient by aligning robot training with how humans naturally teach. The central question this dissertation addresses is: Can we reduce both the quantity of data needed and the effort required to collect it by learning from structured human guidance?
We investigate three types of structured guidance: spatial sketches, hierarchical task decomposition, and comparative preferences. For spatial guidance, we develop methods that translate 2D trajectory sketches into 3D robot motions, so that a single sketch can generate multiple trajectory variations. For hierarchical guidance, we decompose long-horizon tasks into discrete action primitives, continuous parameters, and motor execution, allowing primitives to transfer across scenarios through parameter adaptation; we also learn, via mode classification, when to invoke classical motion planning versus learned policies. For preferential guidance, we use vision-language models that compare trajectory options overlaid on visual observations, with agent-aware reward regularization adapting the feedback as robot capabilities improve. We further extend guidance to multisensory demonstrations, in which learned policies integrate vision, touch, audio, and proprioception. To fuse these modalities, we present methods based on implicit maximum likelihood estimation with batch-global rejection sampling, capturing multimodal action distributions while enabling real-time control. Across healthcare feeding tasks, warehouse manipulation, and household environments, our methods achieve strong performance with substantially lower data requirements and collection burden than traditional approaches.
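To make the multisensory policy component more concrete, the sketch below illustrates the general idea of an implicit maximum likelihood estimation (IMLE) objective for an action head that conditions on a fused multisensory feature. This is a minimal, hedged illustration and not the dissertation's exact formulation: the class and function names (IMLEActionHead, imle_loss), the PyTorch framing, and all dimensions are assumptions, and the per-pair nearest-candidate selection shown here is a simplification of the batch-global rejection sampling step described above.

```python
import torch
import torch.nn as nn


class IMLEActionHead(nn.Module):
    """Maps a fused multisensory feature and a latent noise vector to an action.

    The latent input lets the head represent multimodal action distributions:
    different noise samples can decode to different, equally valid actions.
    (Illustrative sketch; architecture and dimensions are assumed.)
    """

    def __init__(self, feat_dim, latent_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, feat, z):
        return self.net(torch.cat([feat, z], dim=-1))


def imle_loss(head, feats, expert_actions, latent_dim, num_samples=16):
    """Generic IMLE objective (simplified): for each (observation, expert action)
    pair, draw several latent samples, decode candidate actions, keep only the
    candidate closest to the expert action, and penalize its distance.
    Rejected (non-nearest) candidates receive no gradient.
    """
    batch_size = feats.shape[0]
    # Draw num_samples latents per observation: (B, S, latent_dim).
    z = torch.randn(batch_size, num_samples, latent_dim, device=feats.device)
    feats_rep = feats.unsqueeze(1).expand(-1, num_samples, -1)
    candidates = head(feats_rep, z)                                    # (B, S, action_dim)
    dists = (candidates - expert_actions.unsqueeze(1)).pow(2).sum(-1)  # (B, S)
    nearest = dists.min(dim=1).values                                  # best candidate per pair
    return nearest.mean()
```

A training step under these assumptions would compute `imle_loss(head, fused_feats, expert_actions, latent_dim=8)` on a batch of fused features and demonstrated actions and backpropagate it; at deployment, sampling several latents and decoding them yields a set of candidate actions that reflects the multimodality of the demonstrations.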
Amisha Bhaskar is a PhD student in Computer Science at the University of Maryland, advised by Professor Pratap Tokekar in the Robotics Algorithms and Autonomous Systems (RAAS) Lab. Her research focuses on robot learning for manipulation and mobile manipulation, particularly integrating reinforcement learning and imitation learning to enable robots to learn complex tasks in a generalizable and data-efficient manner in real-world domains such as healthcare and assistive robotics.

