Learning to Discern: Imitating Heterogeneous Human Demonstrations with
Preference and Representation Learning
- URL: http://arxiv.org/abs/2310.14196v1
- Date: Sun, 22 Oct 2023 06:08:55 GMT
- Title: Learning to Discern: Imitating Heterogeneous Human Demonstrations with
Preference and Representation Learning
- Authors: Sachit Kuhar and Shuo Cheng and Shivang Chopra and Matthew Bronars and
Danfei Xu
- Abstract summary: This paper introduces Learning to Discern (L2D), an offline imitation learning framework for learning from demonstrations with diverse quality and style.
We show that L2D can effectively assess and learn from varying demonstrations, thereby leading to improved policy performance across a range of tasks in both simulations and on a physical robot.
- Score: 12.4468604987226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Practical Imitation Learning (IL) systems rely on large human demonstration
datasets for successful policy learning. However, challenges lie in maintaining
the quality of collected data and addressing the suboptimal nature of some
demonstrations, which can compromise the overall dataset quality and hence the
learning outcome. Furthermore, the intrinsic heterogeneity in human behavior
can produce equally successful but disparate demonstrations, further
exacerbating the challenge of discerning demonstration quality. To address
these challenges, this paper introduces Learning to Discern (L2D), an offline
imitation learning framework for learning from demonstrations with diverse
quality and style. Given a small batch of demonstrations with sparse quality
labels, we learn a latent representation for temporally embedded trajectory
segments. Preference learning in this latent space trains a quality evaluator
that generalizes to new demonstrators exhibiting different styles. Empirically,
we show that L2D can effectively assess and learn from varying demonstrations,
thereby leading to improved policy performance across a range of tasks in both
simulations and on a physical robot.
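The recipe the abstract describes, embedding temporally extended trajectory segments and then training a quality evaluator on those embeddings via preference learning, can be illustrated with a minimal Bradley-Terry-style sketch in PyTorch. Everything below is an assumption rather than the authors' implementation: the GRU encoder, the latent size, and the names SegmentEncoder, QualityEvaluator, and preference_loss are hypothetical stand-ins.

    # Minimal sketch of preference learning over trajectory-segment embeddings.
    # Architecture choices here are assumptions, not the paper's implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SegmentEncoder(nn.Module):
        """Embeds a temporally extended trajectory segment into a latent vector."""
        def __init__(self, obs_dim, latent_dim=64):
            super().__init__()
            self.rnn = nn.GRU(obs_dim, latent_dim, batch_first=True)

        def forward(self, segments):
            # segments: (batch, T, obs_dim); the final hidden state summarizes the segment
            _, h = self.rnn(segments)
            return h[-1]                      # (batch, latent_dim)

    class QualityEvaluator(nn.Module):
        """Maps a segment embedding to a scalar quality score."""
        def __init__(self, latent_dim=64):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, z):
            return self.head(z).squeeze(-1)   # (batch,)

    def preference_loss(evaluator, z_better, z_worse):
        """Bradley-Terry objective: the higher-quality segment should score higher."""
        logits = evaluator(z_better) - evaluator(z_worse)
        return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

Under this reading, the sparse quality labels on the small labeled batch supply the (better, worse) segment pairs, and the trained evaluator then scores segments from unseen demonstrators so that only high-quality demonstrations feed the downstream imitation policy.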
Related papers
- Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning from Mixed-Quality Demonstrations [8.71931996488953]
We study the problem of cross-embodiment inverse reinforcement learning, where we wish to learn a reward function from video demonstrations in one or more embodiments.
We analyze several techniques that leverage human feedback for representation learning and alignment to enable effective cross-embodiment learning.
arXiv Detail & Related papers (2024-08-10T18:24:14Z)
- AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent [75.91274222142079]
In this study, we aim to scale up demonstrations in a data-efficient way to facilitate the learning of generalist robotic agents.
AdaDemo is a framework designed to improve multi-task policy learning by actively and continually expanding the demonstration dataset.
arXiv Detail & Related papers (2024-04-11T01:59:29Z)
- Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- Out-of-Dynamics Imitation Learning from Multimodal Demonstrations [68.46458026983409]
We study out-of-dynamics imitation learning (OOD-IL), which relaxes the usual same-dynamics assumption and requires only that the demonstrator and the imitator share the same state space.
OOD-IL enables imitation learning to utilize demonstrations from a wide range of demonstrators but introduces a new challenge.
We develop an improved transferability measure to tackle this new challenge.
arXiv Detail & Related papers (2022-11-13T07:45:06Z)
- Eliciting Compatible Demonstrations for Multi-Human Imitation Learning [16.11830547863391]
Imitation learning from human-provided demonstrations is a strong approach for learning policies for robot manipulation.
Natural human behavior is highly heterogeneous, and there are often several equally valid ways to demonstrate a task.
This heterogeneity presents a problem for interactive imitation learning, where successive users iteratively improve a policy by collecting new, possibly conflicting demonstrations.
We show that we can both identify incompatible demonstrations via post-hoc filtering and use our compatibility measure to actively elicit compatible demonstrations from new users; a generic sketch of such post-hoc filtering appears after this list.
arXiv Detail & Related papers (2022-10-14T19:37:55Z)
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
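For the post-hoc filtering idea referenced in the "Eliciting Compatible Demonstrations for Multi-Human Imitation Learning" entry above, a generic sketch (in Python, to match the block earlier) scores each demonstration by the mean log-likelihood of its actions under the current policy and drops low scorers. This is a stand-in for, not a reproduction of, that paper's compatibility measure; policy.action_prob is a hypothetical interface.

    # Generic post-hoc compatibility filtering (a stand-in, not the paper's measure).
    import numpy as np

    def compatibility_score(policy, demo, eps=1e-8):
        """Mean log-likelihood of demonstrated actions under the current policy.
        policy.action_prob(obs, act) is a hypothetical interface returning the
        probability the policy assigns to action `act` in state `obs`."""
        logps = [np.log(policy.action_prob(obs, act) + eps) for obs, act in demo]
        return float(np.mean(logps))

    def filter_incompatible(policy, demos, threshold):
        """Keep demonstrations whose behavior the policy already finds likely;
        low-scoring demos are flagged as incompatible and dropped."""
        return [d for d in demos if compatibility_score(policy, d) >= threshold]

A natural choice of threshold is a low percentile of the scores of already-accepted demonstrations, so the filter adapts as the policy improves.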