Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2309.03839v1
- Date: Thu, 7 Sep 2023 16:52:27 GMT
- Title: Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning
- Authors: Jensen Gao, Siddharth Reddy, Glen Berseth, Anca D. Dragan, Sergey Levine
- Abstract summary: Adaptive interfaces can help users perform sequential decision-making tasks.
Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users.
We propose a reinforcement learning algorithm to train an interface to map raw command signals to actions.
- Score: 82.91837418721182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive interfaces can help users perform sequential decision-making tasks
like robotic teleoperation given noisy, high-dimensional command signals (e.g.,
from a brain-computer interface). Recent advances in human-in-the-loop machine
learning enable such systems to improve by interacting with users, but tend to
be limited by the amount of data that they can collect from individual users in
practice. In this paper, we propose a reinforcement learning algorithm to
address this by training an interface to map raw command signals to actions
using a combination of offline pre-training and online fine-tuning. To address
the challenges posed by noisy command signals and sparse rewards, we develop a
novel method for representing and inferring the user's long-term intent for a
given trajectory. We primarily evaluate our method's ability to assist users
who can only communicate through noisy, high-dimensional input channels through
a user study in which 12 participants performed a simulated navigation task by
using their eye gaze to modulate a 128-dimensional command signal from their
webcam. The results show that our method enables successful goal navigation
more often than a baseline directional interface, by learning to denoise user
command signals and provide shared autonomy assistance. We further evaluate on
a simulated Sawyer pushing task with eye gaze control, and the Lunar Lander
game with simulated user commands, and find that our method improves over
baseline interfaces in these domains as well. Extensive ablation experiments
with simulated user commands empirically motivate each component of our method.
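The two-stage pipeline described in the abstract (offline pre-training on logged command data, then online fine-tuning from sparse rewards) can be sketched as follows. This is an illustrative toy, not the authors' implementation: the dimensions are shrunk from the paper's 128-dimensional signal, the paper's intent-inference method is replaced by synthetic relabeled actions, and the interface is a simple linear softmax policy; all names and hyperparameters here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-ins for the paper's 128-dimensional webcam command signal.
N_DIM, N_ACTIONS = 16, 4

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pretrain(signals, actions, lr=0.5, epochs=300):
    """Offline phase: supervised learning on logged (signal, action) pairs,
    where actions stand in for labels produced by intent inference."""
    W = np.zeros((N_DIM, N_ACTIONS))
    for _ in range(epochs):
        probs = softmax(signals @ W)
        onehot = np.eye(N_ACTIONS)[actions]
        # Full-batch cross-entropy gradient step.
        W += lr * signals.T @ (onehot - probs) / len(signals)
    return W

def finetune_step(W, signal, reward, lr=0.1):
    """Online phase: a REINFORCE-style update from a sparse scalar reward."""
    probs = softmax(signal @ W)
    a = rng.choice(N_ACTIONS, p=probs)
    grad = np.outer(signal, np.eye(N_ACTIONS)[a] - probs)
    W += lr * reward * grad  # no-op when reward is 0 (sparse feedback)
    return W, a

# Synthetic offline dataset: noisy signals with linearly decodable intent.
true_W = rng.normal(size=(N_DIM, N_ACTIONS))
X = rng.normal(size=(500, N_DIM))
y = (X @ true_W).argmax(axis=1)  # relabeled "intended" actions
W = pretrain(X, y)
train_acc = ((X @ W).argmax(axis=1) == y).mean()
```

After pre-training, `finetune_step` would be called once per online interaction; the key design point, mirrored from the abstract, is that the same policy parameters are first fit offline and then refined from sparse online rewards rather than learned from scratch per user.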
Related papers
- I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data [4.487146086221174]
We present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings.
Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations.
arXiv Detail & Related papers (2024-06-10T13:08:31Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning must be near-optimal, and enables the algorithm to learn behaviors that improve over those of a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- SimCURL: Simple Contrastive User Representation Learning from Command Sequences [22.92215383896495]
We propose SimCURL, a contrastive self-supervised deep learning framework that learns user representations from unlabeled command sequences.
We train and evaluate our method on a real-world command sequence dataset of more than half a billion commands.
arXiv Detail & Related papers (2022-07-29T16:06:03Z)
- First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization [112.40598205054994]
We formalize this idea as a completely unsupervised objective for optimizing interfaces.
We conduct an observational study on 540K examples of users operating various keyboard and eye gaze interfaces for typing, controlling simulated robots, and playing video games.
The results show that our mutual information scores are predictive of the ground-truth task completion metrics in a variety of domains.
arXiv Detail & Related papers (2022-05-24T21:57:18Z)
- X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback [83.95599156217945]
We focus on assistive typing applications in which a user cannot operate a keyboard, but can supply other inputs.
Standard methods train a model on a fixed dataset of user inputs, then deploy a static interface that does not learn from its mistakes.
We investigate a simple idea that would enable such interfaces to improve over time, with minimal additional effort from the user.
arXiv Detail & Related papers (2022-03-04T00:07:20Z)
- ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning [91.58711082348293]
Reinforcement learning from online user feedback on the system's performance presents a natural solution to this problem.
However, this approach tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse.
We propose a hierarchical solution that learns efficiently from sparse user feedback.
arXiv Detail & Related papers (2022-02-05T02:01:19Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternative interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.