Learning to Generalize Across Long-Horizon Tasks from Human
Demonstrations
- URL: http://arxiv.org/abs/2003.06085v2
- Date: Wed, 23 Jun 2021 05:17:45 GMT
- Title: Learning to Generalize Across Long-Horizon Tasks from Human
Demonstrations
- Authors: Ajay Mandlekar, Danfei Xu, Roberto Martín-Martín, Silvio Savarese, Li Fei-Fei
- Abstract summary: Generalization Through Imitation (GTI) is a two-stage offline imitation learning algorithm.
GTI exploits a structure where demonstrated trajectories for different tasks intersect at common regions of the state space.
In the first stage of GTI, we train a policy that leverages intersections to have the capacity to compose behaviors from different demonstration trajectories together.
In the second stage of GTI, we train a goal-directed agent to generalize to novel start and goal configurations.
- Score: 52.696205074092006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning is an effective and safe technique to train robot policies
in the real world because it does not depend on an expensive random exploration
process. However, due to the lack of exploration, learning policies that
generalize beyond the demonstrated behaviors is still an open challenge. We
present a novel imitation learning framework to enable robots to 1) learn
complex real world manipulation tasks efficiently from a small number of human
demonstrations, and 2) synthesize new behaviors not contained in the collected
demonstrations. Our key insight is that multi-task domains often present a
latent structure, where demonstrated trajectories for different tasks intersect
at common regions of the state space. We present Generalization Through
Imitation (GTI), a two-stage offline imitation learning algorithm that exploits
this intersecting structure to train goal-directed policies that generalize to
unseen start and goal state combinations. In the first stage of GTI, we train a
stochastic policy that leverages trajectory intersections to have the capacity
to compose behaviors from different demonstration trajectories together. In the
second stage of GTI, we collect a small set of rollouts from the unconditioned
stochastic policy of the first stage, and train a goal-directed agent to
generalize to novel start and goal configurations. We validate GTI in both
simulated domains and a challenging long-horizon robotic manipulation domain in
the real world. Additional results and videos are available at
https://sites.google.com/view/gti2020/ .
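The two-stage structure described in the abstract can be illustrated with a minimal sketch. All names, the tabular policy representation, and the toy dynamics below are hypothetical simplifications: the paper's actual policies are deep networks over robot observations, whereas here states are symbols and an action simply names the next state, which keeps the core idea visible — a stochastic stage-1 policy can branch at states where demonstrations intersect, and stage-2 rollouts from it yield novel start-goal pairs for a goal-conditioned policy.

```python
import random

def train_stage1(demos):
    """Stage 1: pool all demonstrations into one stochastic policy.
    At states where trajectories intersect, multiple actions are stored,
    so sampling can switch between demonstrated behaviors."""
    policy = {}
    for traj in demos:
        for state, action in traj:
            policy.setdefault(state, []).append(action)
    return policy

def rollout(policy, start, steps=10):
    """Sample one trajectory from the unconditioned stochastic policy."""
    state, traj = start, []
    for _ in range(steps):
        if state not in policy:
            break
        action = random.choice(policy[state])  # stochastic branch at intersections
        traj.append((state, action))
        state = action  # toy dynamics: the action names the next state
    return traj

def train_stage2(policy, starts, n_rollouts=50):
    """Stage 2: collect rollouts from the stage-1 policy and fit a
    goal-conditioned lookup mapping (state, goal) -> action."""
    goal_policy = {}
    for _ in range(n_rollouts):
        traj = rollout(policy, random.choice(starts))
        if not traj:
            continue
        goal = traj[-1][1]  # treat the final state reached as the goal
        for state, action in traj:
            goal_policy[(state, goal)] = action
    return goal_policy
```

With two demonstrations A→X→B and C→X→D that intersect at X, stage 2 can produce policies for the never-demonstrated combinations A→D and C→B, which is the composition effect the abstract describes.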
Related papers
- Learning the Generalizable Manipulation Skills on Soft-body Tasks via Guided Self-attention Behavior Cloning Policy [9.345203561496552]
The GP2E behavior cloning policy guides the agent to learn generalizable manipulation skills on soft-body tasks.
Our findings highlight the potential of our method to improve the generalization abilities of Embodied AI models.
arXiv Detail & Related papers (2024-10-08T07:31:10Z)
- Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks [48.54757719504994]
This paper focuses on improving task success rates while reducing the amount of training data needed.
Our approach introduces a novel method that segments long-horizon demonstrations into discrete steps defined by waypoints and subgoals.
We validate our approach through both simulation and real-world experiments, demonstrating effective transfer from simulation to physical robotic platforms.
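The segmentation idea summarized above can be sketched in a few lines. The function name and the symbolic state representation are hypothetical; the actual method operates on continuous robot trajectories, but the structure — splitting a long-horizon demonstration into discrete steps that end at waypoint subgoals, with each new step starting from the previous subgoal — is the same.

```python
def segment(demo, waypoints):
    """Split a long-horizon demonstration into discrete steps, each ending
    at a waypoint subgoal; the next step starts from that subgoal."""
    segments, current = [], []
    for state in demo:
        current.append(state)
        if state in waypoints:
            segments.append(current)
            current = [state]  # next segment begins at the subgoal just reached
    if len(current) > 1:
        segments.append(current)  # keep any trailing partial segment
    return segments
```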
arXiv Detail & Related papers (2024-10-01T19:49:56Z)
- Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve challenging simulated tasks such as humanoid locomotion and stand-up with unprecedented sample efficiency.
arXiv Detail & Related papers (2022-11-09T10:28:40Z)
- Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation [118.27432851053335]
This paper presents an overview and comparative analysis of our systems designed for the following two tracks in SAPIEN ManiSkill Challenge 2021: No Interaction Track.
The No Interaction Track targets learning policies from pre-collected demonstration trajectories.
In this track, we design a Heuristic Rule-based Method (HRM) to trigger high-quality object manipulation by decomposing the task into a series of sub-tasks.
For each sub-task, simple rule-based control strategies are adopted to predict actions for the robotic arms.
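The decompose-then-control pattern described above can be sketched as follows. The sub-task names, the rule table, and the observation keys are all hypothetical illustrations; the actual HRM rules are task-specific heuristics, but the structure — a task split into an ordered list of sub-tasks, each handled by a simple rule mapping observations to an arm action — matches the summary.

```python
def decompose(task):
    """Split a manipulation task into an ordered list of sub-tasks
    (hypothetical stage table for illustration)."""
    stages = {"pick_and_place": ["reach", "grasp", "move", "release"]}
    return stages.get(task, [task])

# Each sub-task maps the current observation to a robot-arm action
# via a simple hand-written rule.
RULES = {
    "reach":   lambda obs: ("move_to", obs["object_pos"]),
    "grasp":   lambda obs: ("close_gripper", None),
    "move":    lambda obs: ("move_to", obs["target_pos"]),
    "release": lambda obs: ("open_gripper", None),
}

def hrm_actions(task, obs):
    """Chain the sub-task rules to produce the action sequence for a task."""
    return [RULES[sub](obs) for sub in decompose(task)]
```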
arXiv Detail & Related papers (2022-06-13T16:20:42Z)
- Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery.
We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations.
Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
arXiv Detail & Related papers (2021-09-28T16:18:54Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- SQUIRL: Robust and Efficient Learning from Video Demonstration of Long-Horizon Robotic Manipulation Tasks [8.756012472587601]
Deep reinforcement learning (RL) can be used to learn complex manipulation tasks.
However, RL requires the robot to collect a large amount of real-world experience.
SQUIRL performs a new but related long-horizon task robustly given only a single video demonstration.
arXiv Detail & Related papers (2020-03-10T20:26:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents (including all information) and is not responsible for any consequences of its use.