Feature Expansive Reward Learning: Rethinking Human Input
- URL: http://arxiv.org/abs/2006.13208v2
- Date: Tue, 12 Jan 2021 18:59:50 GMT
- Title: Feature Expansive Reward Learning: Rethinking Human Input
- Authors: Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca D. Dragan
- Abstract summary: We introduce a new type of human input in which the person guides the robot from states where the feature being taught is highly expressed to states where it is not.
We propose an algorithm for learning the feature from the raw state space and integrating it into the reward function.
- Score: 31.413656752926208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When a person is not satisfied with how a robot performs a task, they can
intervene to correct it. Reward learning methods enable the robot to adapt its
reward function online based on such human input, but they rely on handcrafted
features. When the correction cannot be explained by these features, recent
work in deep Inverse Reinforcement Learning (IRL) suggests that the robot could
ask for task demonstrations and recover a reward defined over the raw state
space. Our insight is that rather than implicitly learning about the missing
feature(s) from demonstrations, the robot should instead ask for data that
explicitly teaches it about what it is missing. We introduce a new type of
human input in which the person guides the robot from states where the feature
being taught is highly expressed to states where it is not. We propose an
algorithm for learning the feature from the raw state space and integrating it
into the reward function. By focusing the human input on the missing feature,
our method decreases sample complexity and improves generalization of the
learned reward over the above deep IRL baseline. We show this in experiments
with a physical 7DOF robot manipulator, as well as in a user study conducted in
a simulated environment.
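The idea in the abstract can be sketched in code: collect "feature traces" that move from states where the missing feature is highly expressed to states where it is not, then fit a feature function that decreases along each trace. This is a minimal illustration only, using a linear feature and a logistic pairwise-ranking loss on ordered states; the toy data and all names are invented for the example and are not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_traces(n_traces=20, length=10, dim=3):
    """Simulated human input: each trace moves from a state where the
    hidden feature (here, the magnitude of coordinate 0) is highly
    expressed to a state where it is barely expressed."""
    traces = []
    for _ in range(n_traces):
        start = rng.uniform(2.0, 3.0)               # feature highly expressed
        end = rng.uniform(0.0, 0.5)                 # feature barely expressed
        xs = np.linspace(start, end, length)
        rest = rng.normal(size=(length, dim - 1))   # irrelevant coordinates
        traces.append(np.column_stack([xs, rest]))
    return traces

def train_feature(traces, dim=3, lr=0.1, epochs=200):
    """Fit a linear feature phi(s) = w . s so that phi decreases along
    every trace, by minimizing a logistic ranking loss -log sigmoid(w . d)
    over each ordered pair of consecutive states (d = earlier - later)."""
    w = np.zeros(dim)
    for _ in range(epochs):
        grad = np.zeros(dim)
        n_pairs = 0
        for tr in traces:
            for i in range(len(tr) - 1):
                d = tr[i] - tr[i + 1]               # want w . d > 0
                p = 1.0 / (1.0 + np.exp(-w @ d))    # sigmoid(w . d)
                grad += (p - 1.0) * d               # gradient of -log p
                n_pairs += 1
        w -= lr * grad / n_pairs
    return w

traces = feature_traces()
w = train_feature(traces)
phi = lambda s: w @ s

# The learned feature should separate high- and low-expression states.
high, low = traces[0][0], traces[0][-1]
print(phi(high) > phi(low))
```

Once phi is learned, it can be appended to the robot's existing feature vector and the reward re-estimated over the augmented features, which is the integration step the abstract describes.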
Related papers
- Adaptive Language-Guided Abstraction from Contrastive Explanations [53.48583372522492]
It is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward.
End-to-end methods for joint feature and reward learning often yield brittle reward functions that are sensitive to spurious state features.
This paper describes a method named ALGAE, which alternates between using language models to iteratively identify human-meaningful features and learning a reward over those features.
arXiv Detail & Related papers (2024-09-12T16:51:58Z)
- Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback [27.223725464754853]
GEAR enables robots to be placed in real-world environments and left to train autonomously without interruption.
The system streams robot experience to a web interface, requiring only occasional asynchronous feedback from remote, crowdsourced, non-expert humans.
arXiv Detail & Related papers (2023-10-31T16:43:56Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Aligning Robot and Human Representations [50.070982136315784]
We argue that current representation learning approaches in robotics should be studied from the perspective of how well they accomplish the objective of representation alignment.
We mathematically define the problem, identify its key desiderata, and situate current methods within this formalism.
arXiv Detail & Related papers (2023-02-03T18:59:55Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- Learning Preferences for Interactive Autonomy [1.90365714903665]
This thesis is an attempt to learn reward functions from human users by using other, more reliable data modalities.
We first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, scaled comparisons; and describe how a robot can use these various forms of human feedback to infer a reward function.
arXiv Detail & Related papers (2022-10-19T21:34:51Z)
- Human-to-Robot Imitation in the Wild [50.49660984318492]
We propose an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective.
We show one-shot generalization and success in real-world settings, including 20 different manipulation tasks in the wild.
arXiv Detail & Related papers (2022-07-19T17:59:59Z)
- Reasoning about Counterfactuals to Improve Human Inverse Reinforcement Learning [5.072077366588174]
Humans naturally infer other agents' beliefs and desires by reasoning about their observable behavior.
We propose to incorporate the learner's current understanding of the robot's decision making into our model of human IRL.
We also propose a novel measure for estimating the difficulty for a human to predict instances of a robot's behavior in unseen environments.
arXiv Detail & Related papers (2022-03-03T17:06:37Z)
- Inducing Structure in Reward Learning by Learning Features [31.413656752926208]
We introduce a novel type of human input for teaching features and an algorithm that utilizes it to learn complex features from the raw state space.
We demonstrate our method in settings where all features have to be learned from scratch, as well as where some of the features are known.
arXiv Detail & Related papers (2022-01-18T16:02:29Z)
- Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
arXiv Detail & Related papers (2021-03-31T05:25:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.