ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement
Learning
- URL: http://arxiv.org/abs/2202.02465v1
- Date: Sat, 5 Feb 2022 02:01:19 GMT
- Title: ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement
Learning
- Authors: Sean Chen, Jensen Gao, Siddharth Reddy, Glen Berseth, Anca D. Dragan,
Sergey Levine
- Abstract summary: Reinforcement learning from online user feedback on the system's performance presents a natural solution to this problem.
This approach tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse.
We propose a hierarchical solution that learns efficiently from sparse user feedback.
- Score: 91.58711082348293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building assistive interfaces for controlling robots through arbitrary,
high-dimensional, noisy inputs (e.g., webcam images of eye gaze) can be
challenging, especially when it involves inferring the user's desired action in
the absence of a natural 'default' interface. Reinforcement learning from
online user feedback on the system's performance presents a natural solution to
this problem, and enables the interface to adapt to individual users. However,
this approach tends to require a large amount of human-in-the-loop training
data, especially when feedback is sparse. We propose a hierarchical solution
that learns efficiently from sparse user feedback: we use offline pre-training
to acquire a latent embedding space of useful, high-level robot behaviors,
which, in turn, enables the system to focus on using online user feedback to
learn a mapping from user inputs to desired high-level behaviors. The key
insight is that access to a pre-trained policy enables the system to learn more
from sparse rewards than a naïve RL algorithm: using the pre-trained policy,
the system can make use of successful task executions to relabel, in hindsight,
what the user actually meant to do during unsuccessful executions. We evaluate
our method primarily through a user study with 12 participants who perform
tasks in three simulated robotic manipulation domains using a webcam and their
eye gaze: flipping light switches, opening a shelf door to reach objects
inside, and rotating a valve. The results show that our method successfully
learns to map 128-dimensional gaze features to 7-dimensional joint torques from
sparse rewards in under 10 minutes of online training, and seamlessly helps
users who employ different gaze strategies, while adapting to distributional
shift in webcam inputs, tasks, and environments.
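To make the hierarchy concrete, the sketch below illustrates the scheme the abstract describes: a frozen, offline pre-trained low-level policy decodes latent behaviors into joint torques, an online-learned interface maps gaze features to latents, and successful episodes are used to relabel, in hindsight, the latents that failed episodes were meant to produce. This is a minimal illustration under stated assumptions (module sizes, the MSE objective, the nearest-neighbor relabeling rule, and the simulated feedback are all hypothetical), not the authors' implementation.

```python
# Minimal, hypothetical sketch of the hierarchy described in the abstract.
import torch
import torch.nn as nn

GAZE_DIM, LATENT_DIM = 128, 8  # 128-d gaze features per the abstract; latent size assumed

# Online-learned interface: maps raw user inputs (gaze features) to a latent
# high-level behavior z. Only this part is trained from user feedback.
interface = nn.Sequential(
    nn.Linear(GAZE_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM)
)
opt = torch.optim.Adam(interface.parameters(), lr=1e-3)

def pretrained_policy(state, z):
    """Stand-in for the frozen low-level policy pi(a | s, z) acquired by
    offline pre-training; it decodes a latent behavior into 7-d joint torques."""
    return torch.tanh(torch.randn(7))  # dummy output

replay = []  # (user_input, latent_executed, success_flag)

def hindsight_relabel(replay):
    """Key idea from the abstract: successful executions reveal which latents
    accomplish tasks, so inputs from *unsuccessful* episodes can be relabeled
    with the latent the user likely meant."""
    successes = [(x, z) for x, z, ok in replay if ok]
    batch = list(successes)
    for x, _z, ok in replay:
        if not ok and successes:
            # Assumed rule: reuse the successful latent whose input was nearest,
            # as a crude proxy for inferring user intent.
            nearest = min(successes, key=lambda s: torch.dist(s[0], x).item())
            batch.append((x, nearest[1]))
    return batch

# One sparse-feedback loop: roll out, record success, relabel, regress.
for _ in range(10):
    x = torch.randn(GAZE_DIM)             # simulated gaze features
    z = interface(x).detach()
    pretrained_policy(None, z)            # frozen policy executes the behavior
    success = bool(torch.rand(()) < 0.3)  # sparse user feedback (simulated)
    replay.append((x, z, success))
    batch = hindsight_relabel(replay)
    if batch:
        xs = torch.stack([b[0] for b in batch])
        zs = torch.stack([b[1] for b in batch])
        loss = ((interface(xs) - zs) ** 2).mean()  # regress onto relabeled latents
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The point of the relabeling step is data efficiency: a failed episode still yields a supervised (input, intended-latent) pair, so sparse successes go further than they would for a naïve RL learner.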
Related papers
- Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning [82.91837418721182]
Adaptive interfaces can help users perform sequential decision-making tasks.
Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users.
We propose a reinforcement learning algorithm to train an interface to map raw command signals to actions.
arXiv Detail & Related papers (2023-09-07T16:52:27Z)
- Dexterous Manipulation from Images: Autonomous Real-World RL via Substep Guidance [71.36749876465618]
We describe a system for vision-based dexterous manipulation that provides a "programming-free" approach for users to define new tasks.
Our system includes a framework for users to define a final task and intermediate sub-tasks with image examples.
We present experimental results with a four-finger robotic hand learning multi-stage object manipulation tasks directly in the real world.
arXiv Detail & Related papers (2022-12-19T22:50:40Z)
- First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization [112.40598205054994]
We formalize this idea as a completely unsupervised objective for optimizing interfaces.
We conduct an observational study on 540K examples of users operating various keyboard and eye gaze interfaces for typing, controlling simulated robots, and playing video games.
The results show that our mutual information scores are predictive of the ground-truth task completion metrics in a variety of domains.
arXiv Detail & Related papers (2022-05-24T21:57:18Z)
- X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback [83.95599156217945]
We focus on assistive typing applications in which a user cannot operate a keyboard, but can supply other inputs.
Standard methods train a model on a fixed dataset of user inputs, then deploy a static interface that does not learn from its mistakes.
We investigate a simple idea that would enable such interfaces to improve over time, with minimal additional effort from the user.
arXiv Detail & Related papers (2022-03-04T00:07:20Z)
- Inducing Structure in Reward Learning by Learning Features [31.413656752926208]
We introduce a novel type of human input for teaching features and an algorithm that utilizes it to learn complex features from the raw state space.
We demonstrate our method in settings where all features have to be learned from scratch, as well as where some of the features are known.
arXiv Detail & Related papers (2022-01-18T16:02:29Z)
- Human-in-the-Loop Imitation Learning using Remote Teleoperation [72.2847988686463]
We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
arXiv Detail & Related papers (2020-12-12T05:30:35Z)
- Interactive Search Based on Deep Reinforcement Learning [4.353144350714567]
The project establishes a virtual user environment for offline training.
We also improve a bi-clustering-based reinforcement learning algorithm to expand the action space and recommendation path space of the recommendation agent.
arXiv Detail & Related papers (2020-12-09T15:23:53Z)
- Human-guided Robot Behavior Learning: A GAN-assisted Preference-based Reinforcement Learning Approach [2.9764834057085716]
We propose a new GAN-assisted, human preference-based reinforcement learning approach.
It uses a generative adversarial network (GAN) to actively learn human preferences and then stand in for the human when assigning preference labels.
Our method can achieve a reduction of about 99.8% in human time without sacrificing performance (a rough sketch of this idea appears below).
arXiv Detail & Related papers (2020-10-15T01:44:06Z)
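The last entry's core mechanism, learning a model of human preferences so it can take over labeling, can be sketched roughly as follows. This uses a plain preference model trained with a Bradley-Terry style loss rather than the paper's GAN, and every name, shape, and dataset here is an assumption for illustration:

```python
# Rough illustration only: a learned preference model stands in for the human.
import torch
import torch.nn as nn

TRAJ_DIM = 16  # assumed fixed-size trajectory encoding

# Score network standing in for the learned preference model.
scorer = nn.Sequential(nn.Linear(TRAJ_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)

# A small batch of human-labeled pairs: (traj_a, traj_b, a_preferred).
human_pairs = [
    (torch.randn(TRAJ_DIM), torch.randn(TRAJ_DIM), bool(torch.rand(()) < 0.5))
    for _ in range(32)
]

# Fit the preference model: P(a preferred over b) = sigmoid(score(a) - score(b)).
for _ in range(200):
    a = torch.stack([p[0] for p in human_pairs])
    b = torch.stack([p[1] for p in human_pairs])
    y = torch.tensor([[1.0] if p[2] else [0.0] for p in human_pairs])
    logits = scorer(a) - scorer(b)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()

def auto_label(traj_a, traj_b):
    """Stand in for the human labeler; routing new queries here instead of to a
    person is where the reported ~99.8% reduction in human time would come from."""
    with torch.no_grad():
        return bool(scorer(traj_a) > scorer(traj_b))
```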
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.