Learning Preferences for Interactive Autonomy
- URL: http://arxiv.org/abs/2210.10899v1
- Date: Wed, 19 Oct 2022 21:34:51 GMT
- Title: Learning Preferences for Interactive Autonomy
- Authors: Erdem Bıyık
- Abstract summary: This thesis is an attempt to learn reward functions from human users by using other, more reliable data modalities.
We first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, and scaled comparisons, and describe how a robot can use these forms of human feedback to infer a reward function.
- Score: 1.90365714903665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When robots enter everyday human environments, they need to understand their
tasks and how they should perform those tasks. To encode these, reward
functions, which specify the objective of a robot, are employed. However,
designing reward functions can be extremely challenging for complex tasks and
environments. A promising approach is to learn reward functions from humans.
Recently, several robot learning works have embraced this approach and leveraged human
demonstrations to learn reward functions. Known as inverse reinforcement
learning, this approach relies on a fundamental assumption: humans can provide
near-optimal demonstrations to the robot. Unfortunately, this is rarely the
case: human demonstrations to the robot are often suboptimal for various
reasons, e.g., the difficulty of teleoperation, the robot's high degrees of
freedom, or humans' cognitive limitations.
This thesis is an attempt to learn reward functions from human users
by using other, more reliable data modalities. Specifically, we study how
reward functions can be learned using comparative feedback, in which the human
user compares multiple robot trajectories instead of (or in addition to)
providing demonstrations. To this end, we first propose various forms of
comparative feedback, e.g., pairwise comparisons, best-of-many choices,
rankings, and scaled comparisons, and describe how a robot can use these various
forms of human feedback to infer a reward function, which may be parametric or
non-parametric. Next, we propose active learning techniques that enable the robot
to ask comparison queries that maximize the expected information gained
from the user's feedback. Finally, we demonstrate the
applicability of our methods in a wide variety of domains, ranging from
autonomous driving simulations to home robotics, from standard reinforcement
learning benchmarks to lower-body exoskeletons.
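A minimal sketch may help make the comparative-feedback idea concrete. The Bradley-Terry-style likelihood over trajectory feature differences, the linear reward model, and the crude Metropolis-Hastings posterior sampler below are all illustrative assumptions for this sketch, not a transcription of the thesis's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def preference_loglik(w, queries, answers):
    """Log-likelihood of pairwise answers under a linear reward r = w . phi.

    queries: (N, 2, d) feature vectors phi for trajectory pairs (A, B).
    answers: (N,) with 1 if the user preferred A, 0 if they preferred B.
    """
    diff = queries[:, 0, :] - queries[:, 1, :]        # phi(A) - phi(B)
    p_a = 1.0 / (1.0 + np.exp(-diff @ w))             # P(user prefers A)
    p = np.where(answers == 1, p_a, 1.0 - p_a)
    return np.sum(np.log(np.clip(p, 1e-12, None)))

def sample_posterior(queries, answers, d, n_keep=100):
    """Crude Metropolis-Hastings over reward weights w with ||w|| <= 1."""
    w, samples = np.zeros(d), []
    for _ in range(n_keep * 10):
        w_new = w + 0.2 * rng.normal(size=d)
        w_new /= max(np.linalg.norm(w_new), 1.0)      # project into unit ball
        if np.log(rng.random()) < (preference_loglik(w_new, queries, answers)
                                   - preference_loglik(w, queries, answers)):
            w = w_new
        samples.append(w.copy())
    return np.array(samples[::10])                    # thinned chain

# Stand-in data: 5 answered queries over 4-dimensional trajectory features.
d = 4
queries = rng.normal(size=(5, 2, d))
answers = rng.integers(0, 2, size=5)
w_samples = sample_posterior(queries, answers, d)
print(w_samples.mean(axis=0))                        # posterior mean weights
```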
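For the active querying step, a common recipe (again an assumption used for illustration, not the thesis's precise algorithm) is to pick the candidate pair whose answer has maximal mutual information with the reward weights, estimated from posterior samples such as those produced by the sketch above:

```python
import numpy as np

rng = np.random.default_rng(1)

def binary_entropy(p):
    """Entropy of a Bernoulli(p); works elementwise on arrays."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def information_gain(pair, w_samples):
    """Mutual information between the user's answer and w for one pair.

    pair: (2, d) features phi(A), phi(B); w_samples: (M, d) posterior draws.
    """
    p_a = 1.0 / (1.0 + np.exp(-w_samples @ (pair[0] - pair[1])))
    return binary_entropy(p_a.mean()) - binary_entropy(p_a).mean()

# Stand-in posterior draws and candidate queries (in a real loop these
# would come from the preference-learning sketch above).
d = 4
w_samples = 0.3 * rng.normal(size=(100, d))
candidates = rng.normal(size=(50, 2, d))
gains = [information_gain(c, w_samples) for c in candidates]
best_query = candidates[int(np.argmax(gains))]        # ask this pair next
```

In a full interaction loop, the robot would show best_query to the user, record the answer, re-sample the posterior, and repeat.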
Related papers
- HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation [50.616995671367704]
We present a high-dimensional, simulated robot learning benchmark, HumanoidBench, featuring a humanoid robot equipped with dexterous hands.
Our findings reveal that state-of-the-art reinforcement learning algorithms struggle with most tasks, whereas a hierarchical learning approach achieves superior performance when supported by robust low-level policies.
arXiv Detail & Related papers (2024-03-15T17:45:44Z) - Affordances from Human Videos as a Versatile Representation for Robotics [31.248842798600606]
We train a visual affordance model that estimates where and how in the scene a human is likely to interact.
The structure of these behavioral affordances directly enables the robot to perform many complex tasks.
We show the efficacy of our approach, which we call VRB, across 4 real-world environments, over 10 different tasks, and 2 robotic platforms operating in the wild.
arXiv Detail & Related papers (2023-04-17T17:59:34Z) - Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement
Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z) - HERD: Continuous Human-to-Robot Evolution for Learning from Human
Demonstration [57.045140028275036]
We show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning.
We propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy.
arXiv Detail & Related papers (2022-12-08T15:56:13Z) - Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned with a time-contrastive objective (see the sketch after this list).
arXiv Detail & Related papers (2022-11-16T16:26:48Z) - Model Predictive Control for Fluid Human-to-Robot Handovers [50.72520769938633]
Planning motions that take human comfort into account is typically not part of the human-to-robot handover process.
We propose to generate smooth motions via an efficient model-predictive control framework.
We conduct human-to-robot handover experiments on a diverse set of objects with several users.
arXiv Detail & Related papers (2022-03-31T23:08:20Z) - Reasoning about Counterfactuals to Improve Human Inverse Reinforcement
Learning [5.072077366588174]
Humans naturally infer other agents' beliefs and desires by reasoning about their observable behavior.
We propose to incorporate the learner's current understanding of the robot's decision making into our model of human IRL.
We also propose a novel measure for estimating the difficulty for a human to predict instances of a robot's behavior in unseen environments.
arXiv Detail & Related papers (2022-03-03T17:06:37Z) - Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human
Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
arXiv Detail & Related papers (2021-03-31T05:25:05Z) - Feature Expansive Reward Learning: Rethinking Human Input [31.413656752926208]
We introduce a new type of human input in which the person guides the robot from states where the feature being taught is highly expressed to states where it is not.
We propose an algorithm for learning the feature from the raw state space and integrating it into the reward function.
arXiv Detail & Related papers (2020-06-23T17:59:34Z)