XIRL: Cross-embodiment Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2106.03911v1
- Date: Mon, 7 Jun 2021 18:45:07 GMT
- Title: XIRL: Cross-embodiment Inverse Reinforcement Learning
- Authors: Kevin Zakka, Andy Zeng, Pete Florence, Jonathan Tompson, Jeannette Bohg, Debidatta Dwibedi
- Abstract summary: We show that it is possible to automatically learn vision-based reward functions from cross-embodiment demonstration videos.
Specifically, we present a self-supervised method for Cross-embodiment Inverse Reinforcement Learning.
We find our learned reward function not only works for embodiments seen during training, but also generalizes to entirely new embodiments.
- Score: 25.793366206387827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the visual cross-embodiment imitation setting, in which agents
learn policies from videos of other agents (such as humans) demonstrating the
same task, but with stark differences in their embodiments -- shape, actions,
end-effector dynamics, etc. In this work, we demonstrate that it is possible to
automatically discover and learn vision-based reward functions from
cross-embodiment demonstration videos that are robust to these differences.
Specifically, we present a self-supervised method for Cross-embodiment Inverse
Reinforcement Learning (XIRL) that leverages temporal cycle-consistency
constraints to learn deep visual embeddings that capture task progression from
offline videos of demonstrations across multiple expert agents, each performing
the same task differently due to embodiment differences. Prior to our work,
producing rewards from self-supervised embeddings has typically required
alignment with a reference trajectory, which may be difficult to acquire. We
show empirically that if the embeddings are aware of task-progress, simply
taking the negative distance between the current state and goal state in the
learned embedding space is useful as a reward for training policies with
reinforcement learning. We find our learned reward function not only works for
embodiments seen during training, but also generalizes to entirely new
embodiments. We also find that XIRL policies are more sample efficient than
baselines, and in some cases exceed the sample efficiency of the same agent
trained with ground truth sparse rewards.
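The reward construction described in the abstract is easy to state concretely: embed the current observation and a goal observation with the learned encoder, and use the negative distance between them in embedding space as a dense reward. Below is a minimal, hypothetical sketch of that idea in PyTorch; the encoder architecture, the `TCCEncoder` and `embedding_space_reward` names, the image sizes, and the choice of averaging the final demonstration frames to form the goal embedding are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class TCCEncoder(nn.Module):
    """Hypothetical stand-in for a visual encoder trained with temporal
    cycle-consistency (TCC) on cross-embodiment demonstration videos."""
    def __init__(self, embedding_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embedding_dim),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, H, W) image observations in [0, 1].
        return self.net(frames)

def embedding_space_reward(encoder: nn.Module,
                           observation: torch.Tensor,
                           goal_embedding: torch.Tensor) -> float:
    """Dense reward: negative distance to the goal in the learned embedding space."""
    with torch.no_grad():
        z = encoder(observation.unsqueeze(0)).squeeze(0)
        return -torch.linalg.vector_norm(z - goal_embedding).item()

# Usage sketch: form the goal embedding from the last frames of the
# demonstration videos (an assumption about goal-frame selection).
encoder = TCCEncoder()
goal_frames = torch.rand(8, 3, 64, 64)   # final frames of 8 demos (dummy data)
with torch.no_grad():
    goal_embedding = encoder(goal_frames).mean(dim=0)

current_obs = torch.rand(3, 64, 64)      # current camera observation (dummy data)
reward = embedding_space_reward(encoder, current_obs, goal_embedding)
print(reward)  # scalar reward to feed to any standard RL algorithm
```

The sketch only illustrates the reward readout from an already-trained encoder; it does not cover the temporal cycle-consistency training that produces the embeddings.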
Related papers
- Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning from Mixed-Quality Demonstrations [8.71931996488953]
We study the problem of cross-embodiment inverse reinforcement learning, where we wish to learn a reward function from video demonstrations in one or more embodiments.
We analyze several techniques that leverage human feedback for representation learning and alignment to enable effective cross-embodiment learning.
arXiv Detail & Related papers (2024-08-10T18:24:14Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Imitation from Observation With Bootstrapped Contrastive Learning [12.048166025000976]
Imitation from observation (IfO) is a learning paradigm in which autonomous agents are trained in a Markov Decision Process by observing expert demonstrations, without access to the expert's actions.
We present BootIfOL, an IfO algorithm that aims to learn a reward function that takes an agent trajectory and compares it to an expert.
We evaluate our approach on a variety of control tasks showing that we can train effective policies using a limited number of demonstrative trajectories.
arXiv Detail & Related papers (2023-02-13T17:32:17Z)
- Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward [7.51772160511614]
Reinforcement learning often suffers from the sparse reward issue in real-world robotics problems.
Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task.
In this paper, we consider the case where the target task is mismatched from, but similar to, that of the expert.
Existing learning-from-demonstration (LfD) methods cannot effectively guide learning in mismatched new tasks with sparse rewards.
arXiv Detail & Related papers (2022-12-03T02:24:59Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent by observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)