Mechanical and Civil Engineering Seminar
PhD Thesis Defense
Systems that intelligently interact with humans could improve people's lives in numerous ways and in numerous settings, such as households, hospitals, and workplaces. Yet, developing algorithms that reliably and efficiently personalize their interactions with people in real-world environments remains challenging. In particular, one major difficulty lies in adapting to human-in-the-loop feedback, in which an algorithm makes sequential decisions while receiving online feedback from humans; throughout this interaction, the algorithm seeks to optimize its decision-making quality, as measured by the utility of its performance to the human users. Such algorithms must balance exploration and exploitation: on one hand, the algorithm must try uncertain strategies to explore the environment and the interacting human's preferences, while on the other hand, it must exploit the empirically best-performing strategies to maximize its cumulative performance.
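The exploration-exploitation balance described above can be illustrated with a minimal posterior-sampling (Thompson sampling) sketch on a two-armed Bernoulli bandit; this is a generic textbook example, not code from the thesis, and the arm means and round counts below are arbitrary illustrative choices.

```python
import random

def thompson_sampling(true_means, n_rounds, seed=0):
    """Posterior sampling on a Bernoulli bandit with Beta(1, 1) priors.

    Sampling a plausible mean from each arm's posterior naturally balances
    exploration (uncertain arms can draw optimistic values) with
    exploitation (empirically better arms are selected more often).
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # posterior success pseudo-counts
    beta = [1] * k   # posterior failure pseudo-counts
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta posterior, then act greedily
        # with respect to the samples.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.3, 0.7], n_rounds=2000)
print(pulls)  # the better arm accumulates most of the pulls
```

Early on, both arms receive optimistic posterior draws and are tried; as evidence accumulates, the posterior concentrates and the empirically better arm dominates the selections.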
Learning from human feedback can be difficult, as people are often unreliable in specifying numerical scores. In contrast, humans can often provide qualitative feedback, such as pairwise preferences, more accurately. Yet, sample efficiency is a significant concern in human-in-the-loop settings, as qualitative feedback is less informative than absolute metrics, and algorithms can typically pose only a limited number of queries to human users. Thus, there is a need for theoretically grounded online learning algorithms that efficiently, reliably, and robustly optimize their interactions with humans while learning from online qualitative feedback.
This dissertation makes several contributions to algorithm design for human-in-the-loop learning. First, this work develops the Dueling Posterior Sampling (DPS) algorithmic framework, a model-based, Bayesian approach to online learning in the settings of preference-based reinforcement learning and generalized linear dueling bandits. DPS is developed together with a theoretical regret analysis framework and yields competitive empirical performance in a range of simulations. Additionally, this thesis presents the CoSpar and LineCoSpar algorithms for sample-efficient, mixed-initiative learning from pairwise preferences and coactive feedback. CoSpar and LineCoSpar are both deployed in human-subject experiments with a lower-body exoskeleton to identify optimal, user-preferred exoskeleton walking gaits. This work presents the first demonstration of preference-based learning for optimizing dynamic crutchless exoskeleton walking for user comfort, and makes progress toward customizing exoskeletons and other assistive devices for individual users.
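The dueling-bandit setting above, where the learner poses pairwise queries rather than requesting numerical scores, can be sketched in a few lines. This is a simplified illustration in the spirit of posterior sampling for dueling bandits, not the DPS implementation from the thesis: each arm keeps a Beta posterior over its probability of winning a comparison, two independent posterior samples select the pair to duel, and the simulated preference is a hypothetical stand-in for human feedback.

```python
import random

def dueling_loop(qualities, n_duels, seed=1):
    """Toy posterior-sampling loop for pairwise-preference feedback.

    Each of the k arms keeps Beta pseudo-counts over its duel-win rate.
    Two independent posterior samples pick the two arms to compare, so the
    query pair itself reflects the learner's current uncertainty.
    """
    rng = random.Random(seed)
    k = len(qualities)
    wins = [1] * k    # Beta prior pseudo-counts (wins)
    losses = [1] * k  # Beta prior pseudo-counts (losses)
    for _ in range(n_duels):
        s1 = [rng.betavariate(wins[i], losses[i]) for i in range(k)]
        s2 = [rng.betavariate(wins[i], losses[i]) for i in range(k)]
        a = max(range(k), key=lambda i: s1[i])
        b = max(range(k), key=lambda i: s2[i])
        # Simulated preference: arm a is preferred with probability
        # proportional to relative quality (a stand-in for a human user).
        if rng.random() < qualities[a] / (qualities[a] + qualities[b]):
            wins[a] += 1
            losses[b] += 1
        else:
            wins[b] += 1
            losses[a] += 1
    return wins

wins = dueling_loop([0.2, 0.5, 0.8], n_duels=3000)
print(wins)  # the highest-quality arm wins the most duels
```

Note that the learner never observes a numerical score, only which of the two queried arms was preferred, which is what makes sample efficiency the central concern in this setting.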
Please virtually attend this thesis defense:
Zoom Link: https://caltech.zoom.us/j/81867382120