Reasoning about Counterfactuals to Improve Human Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2203.01855v1
- Date: Thu, 3 Mar 2022 17:06:37 GMT
- Title: Reasoning about Counterfactuals to Improve Human Inverse Reinforcement Learning
- Authors: Michael S. Lee, Henny Admoni, Reid Simmons
- Abstract summary: Humans naturally infer other agents' beliefs and desires by reasoning about their observable behavior.
We propose to incorporate the learner's current understanding of the robot's decision making into our model of human IRL.
We also propose a novel measure for estimating the difficulty for a human to predict instances of a robot's behavior in unseen environments.
- Score: 5.072077366588174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To collaborate well with robots, we must be able to understand their decision
making. Humans naturally infer other agents' beliefs and desires by reasoning
about their observable behavior in a way that resembles inverse reinforcement
learning (IRL). Thus, robots can convey their beliefs and desires by providing
demonstrations that are informative for a human's IRL. An informative
demonstration is one that differs strongly from the learner's expectations of
what the robot will do given their current understanding of the robot's
decision making. However, standard IRL does not model the learner's existing
expectations, and thus cannot do this counterfactual reasoning. We propose to
incorporate the learner's current understanding of the robot's decision making
into our model of human IRL, so that our robot can select demonstrations that
maximize the human's understanding. We also propose a novel measure for
estimating the difficulty for a human to predict instances of a robot's
behavior in unseen environments. A user study finds that our test difficulty
measure correlates well with human performance and confidence. Interestingly,
considering human beliefs and counterfactuals when selecting demonstrations
decreases human performance on easy tests, but increases performance on
difficult tests, providing insight into how best to use such models.
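As a rough illustration of the selection criterion described in the abstract, the sketch below scores candidate environments by how much the robot's planned demonstration diverges from the trajectory the learner would predict under their current belief about the robot's reward. The helpers `plan` and `divergence` are hypothetical stand-ins for the paper's planner and behavior-comparison measure; they are not defined in the source.

```python
import numpy as np

def select_informative_demo(envs, robot_weights, learner_weights, plan, divergence):
    """Choose the environment whose optimal robot demonstration departs most
    from the learner's counterfactual prediction of the robot's behavior.

    `plan(env, weights)` returns a trajectory; `divergence(a, b)` compares two
    trajectories. Both are assumed helpers, not the paper's exact interfaces.
    """
    best_env, best_gap = None, -np.inf
    for env in envs:
        robot_demo = plan(env, robot_weights)    # what the robot will actually do
        predicted = plan(env, learner_weights)   # what the learner expects (counterfactual)
        gap = divergence(robot_demo, predicted)  # informativeness of this demonstration
        if gap > best_gap:
            best_env, best_gap = env, gap
    return best_env
```

A demonstration with a large gap violates the learner's expectations and so, under the paper's model, carries the most information for their IRL.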
Related papers
- HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation [50.616995671367704]
We present a high-dimensional, simulated robot learning benchmark, HumanoidBench, featuring a humanoid robot equipped with dexterous hands.
Our findings reveal that state-of-the-art reinforcement learning algorithms struggle with most tasks, whereas a hierarchical learning approach achieves superior performance when supported by robust low-level policies.
arXiv Detail & Related papers (2024-03-15T17:45:44Z) - SACSoN: Scalable Autonomous Control for Social Navigation [62.59274275261392]
We develop methods for training policies for socially unobtrusive navigation.
By minimizing the counterfactual perturbation (the change the robot induces in a human's path relative to what they would have done had the robot been absent), we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space.
We collect a large dataset where an indoor mobile robot interacts with human bystanders.
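A minimal sketch of the counterfactual-perturbation quantity, assuming trajectories are arrays of planar positions and that some learned predictor supplies the robot-absent counterfactual (the predictor itself is not shown):

```python
import numpy as np

def counterfactual_perturbation(observed_traj, counterfactual_traj):
    """Mean deviation between the human's observed path and the path a
    predictor says they would have taken had the robot been absent.
    Both arguments are (T, 2) arrays of planar positions; the
    counterfactual predictor itself is an assumed component."""
    diff = np.asarray(observed_traj) - np.asarray(counterfactual_traj)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))
```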
arXiv Detail & Related papers (2023-06-02T19:07:52Z) - Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement
Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z) - Aligning Robot and Human Representations [50.070982136315784]
We argue that current representation learning approaches in robotics should be studied from the perspective of how well they accomplish the objective of representation alignment.
We mathematically define the problem, identify its key desiderata, and situate current methods within this formalism.
arXiv Detail & Related papers (2023-02-03T18:59:55Z) - Towards Modeling and Influencing the Dynamics of Human Learning [26.961274302321343]
We take a step towards enabling robots to understand the influence they have, leverage it to better assist people, and help human models more quickly align with reality.
Our key idea is to model the human's learning as a nonlinear dynamical system which evolves the human's internal model given new observations.
We then formalize how robots can influence human learning by embedding the human's learning dynamics model into the robot planning problem.
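A minimal sketch of this dynamical-systems view, assuming the update f is a gradient step on the log-likelihood of each new observation (the paper's actual model may differ):

```python
import numpy as np

def human_update(theta, observation, grad_loglik, lr=0.1):
    """One step of the human's internal-model dynamics,
    theta_{t+1} = f(theta_t, o_t), with f taken here to be a
    hypothetical gradient step on the observation's log-likelihood."""
    return theta + lr * grad_loglik(theta, observation)

def simulate_human_learning(theta0, robot_behaviors, grad_loglik, lr=0.1):
    """Roll the dynamics forward so the robot can plan which behaviors
    to show: each candidate plan induces a different final belief."""
    theta = np.asarray(theta0, dtype=float)
    for obs in robot_behaviors:
        theta = human_update(theta, obs, grad_loglik, lr)
    return theta
```

Embedding such a rollout inside the robot's planner is what lets it choose behaviors that steer the human's belief toward the true parameters.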
arXiv Detail & Related papers (2023-01-02T23:59:45Z) - Learning Latent Representations to Co-Adapt to Humans [12.71953776723672]
Non-stationary humans are challenging for robot learners.
In this paper we introduce an algorithmic formalism that enables robots to co-adapt alongside dynamic humans.
arXiv Detail & Related papers (2022-12-19T16:19:24Z) - HERD: Continuous Human-to-Robot Evolution for Learning from Human
Demonstration [57.045140028275036]
We show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning.
We propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy.
arXiv Detail & Related papers (2022-12-08T15:56:13Z) - Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
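A minimal sketch of such an embedding-distance reward, where `phi` stands in for the time-contrastively trained encoder (any callable mapping a frame to a vector works for this illustration):

```python
import numpy as np

def embedding_reward(phi, frame, goal_frame):
    """Reward as negative distance to the goal in embedding space.
    `phi` is a stand-in for a network trained with a time-contrastive
    objective on human videos; here it is just an assumed callable."""
    return -float(np.linalg.norm(phi(frame) - phi(goal_frame)))
```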
arXiv Detail & Related papers (2022-11-16T16:26:48Z) - Learning Preferences for Interactive Autonomy [1.90365714903665]
This thesis is an attempt to learn reward functions from human users using other, more reliable data modalities.
We first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, scaled comparisons; and describe how a robot can use these various forms of human feedback to infer a reward function.
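For the pairwise-comparison case, a common likelihood is the Boltzmann-rational (Bradley-Terry-style) model sketched below; this is a standard preference-learning formulation given for illustration, not necessarily the thesis's exact model:

```python
import numpy as np

def pairwise_loglik(w, feat_a, feat_b, chose_a):
    """Log-likelihood of one pairwise comparison under a
    Boltzmann-rational user: P(a preferred over b) = sigmoid(w . (f_a - f_b)).
    `w` are reward weights; `feat_a`, `feat_b` are trajectory features."""
    logit = float(np.dot(w, np.asarray(feat_a) - np.asarray(feat_b)))
    p_a = 1.0 / (1.0 + np.exp(-logit))
    p_a = np.clip(p_a, 1e-12, 1.0 - 1e-12)  # guard against log(0)
    return float(np.log(p_a if chose_a else 1.0 - p_a))
```

Summing this log-likelihood over many comparisons and maximizing over `w` yields a reward estimate from comparative feedback alone.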
arXiv Detail & Related papers (2022-10-19T21:34:51Z) - Dynamically Switching Human Prediction Models for Efficient Planning [32.180808286226075]
We give the robot access to a suite of human models and enable it to assess the performance-computation trade-off online.
Our experiments in a driving simulator showcase how the robot can achieve performance comparable to always using the best human model.
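A minimal sketch of the online trade-off, assuming hypothetical per-model estimates of utility and compute cost (the scoring rule is an assumption for illustration):

```python
def select_human_model(models, predicted_utility, compute_cost, tradeoff=1.0):
    """Pick the human-prediction model with the best performance-computation
    trade-off: estimated planning utility minus weighted compute cost.
    `predicted_utility` and `compute_cost` are assumed per-model estimators."""
    return max(models, key=lambda m: predicted_utility(m) - tradeoff * compute_cost(m))
```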
arXiv Detail & Related papers (2021-03-13T23:48:09Z) - Quantifying Hypothesis Space Misspecification in Learning from
Human-Robot Demonstrations and Physical Corrections [34.53709602861176]
Recent work focuses on how robots can use human input, such as demonstrations and physical corrections, to learn intended objectives.
We demonstrate our method on a 7-degree-of-freedom robot manipulator learning from two important types of human input: demonstrations and physical corrections.
arXiv Detail & Related papers (2020-02-03T18:59:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.