Wish you were here: Hindsight Goal Selection for long-horizon dexterous
manipulation
- URL: http://arxiv.org/abs/2112.00597v2
- Date: Thu, 2 Dec 2021 08:50:36 GMT
- Title: Wish you were here: Hindsight Goal Selection for long-horizon dexterous
manipulation
- Authors: Todor Davchev, Oleg Sushkov, Jean-Baptiste Regli, Stefan Schaal, Yusuf
Aytar, Markus Wulfmeier, Jon Scholz
- Abstract summary: Solving tasks with a sparse reward in a sample-efficient manner poses a challenge to modern reinforcement learning.
Existing strategies explore based on task-agnostic goal distributions, which can render the solution of long-horizon tasks impractical.
We extend hindsight relabelling mechanisms to guide exploration along task-specific distributions implied by a small set of successful demonstrations.
- Score: 14.901636098553848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex sequential tasks in continuous-control settings often require agents
to successfully traverse a set of "narrow passages" in their state space.
Solving such tasks with a sparse reward in a sample-efficient manner poses a
challenge to modern reinforcement learning (RL) due to the associated
long-horizon nature of the problem and the lack of sufficient positive signal
during learning. Various tools have been applied to address this challenge.
When available, large sets of demonstrations can guide agent exploration.
Hindsight relabelling, on the other hand, does not require additional sources of
information. However, existing strategies explore based on task-agnostic goal
distributions, which can render the solution of long-horizon tasks impractical.
In this work, we extend hindsight relabelling mechanisms to guide exploration
along task-specific distributions implied by a small set of successful
demonstrations. We evaluate the approach on four complex single- and dual-arm
robotic manipulation tasks against strong, suitable baselines. The method
requires far fewer demonstrations to solve all tasks and achieves a
significantly higher overall performance as task complexity increases. Finally,
we investigate the robustness of the proposed solution with respect to the
quality of input representations and the number of demonstrations.
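To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It compares task-agnostic hindsight relabelling (goals drawn from the agent's own later states) with relabelling against goals drawn from demonstration states. The names `sparse_reward`, `relabel_future`, and `relabel_with_demo_goals`, the tolerance `tol`, and the tuple-based state format are all assumptions made for illustration; the listing above does not specify how the paper selects goals or computes rewards.

```python
import random
from typing import List, Optional, Sequence, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, int, State]                # (state, action, next_state)
Relabelled = Tuple[State, int, State, State, float]  # (..., goal, reward)


def sparse_reward(achieved: State, goal: State, tol: float = 0.05) -> float:
    """Hypothetical sparse reward: 1.0 only within `tol` of the goal."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 1.0 if dist <= tol else 0.0


def relabel_future(episode: Sequence[Transition],
                   rng: Optional[random.Random] = None) -> List[Relabelled]:
    """Task-agnostic relabelling: goals are states the agent reached later
    in the same episode, whether or not they matter for the task."""
    rng = rng or random.Random(0)
    out: List[Relabelled] = []
    for t, (state, action, next_state) in enumerate(episode):
        goal = rng.choice([tr[2] for tr in episode[t:]])
        out.append((state, action, next_state, goal,
                    sparse_reward(next_state, goal)))
    return out


def relabel_with_demo_goals(episode: Sequence[Transition],
                            demo_states: Sequence[State],
                            k: int = 1,
                            rng: Optional[random.Random] = None) -> List[Relabelled]:
    """Assumed task-specific variant: goals are sampled from states seen in
    successful demonstrations, so relabelled data points toward the task."""
    rng = rng or random.Random(0)
    out: List[Relabelled] = []
    for state, action, next_state in episode:
        for _ in range(k):
            goal = rng.choice(list(demo_states))
            out.append((state, action, next_state, goal,
                        sparse_reward(next_state, goal)))
    return out
```

In this sketch the only difference between the two functions is where the goal comes from. A real system would likely compare states in a learned representation rather than by raw Euclidean distance, which is why the abstract studies robustness to input representations.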
Related papers
- Screw Geometry Meets Bandits: Incremental Acquisition of Demonstrations to Generate Manipulation Plans [9.600625243282618]
We study the problem of methodically obtaining a sufficient set of kinesthetic demonstrations, one at a time.
We present a novel approach to address these open problems using (i) a screw geometric representation to generate manipulation plans from demonstrations.
We present experimental results on two example manipulation tasks, namely, pouring and scooping, to illustrate our approach.
arXiv Detail & Related papers (2024-10-23T20:57:56Z)
- Learning Complex Teamwork Tasks Using a Given Sub-task Decomposition [11.998708550268978]
We propose an approach which uses an expert-provided decomposition of a task into simpler multi-agent sub-tasks.
In each sub-task, a subset of the entire team is trained to acquire sub-task-specific policies.
The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task.
arXiv Detail & Related papers (2023-02-09T21:24:56Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- Hierarchical Few-Shot Imitation with Skill Transition Models [66.81252581083199]
Few-shot Imitation with Skill Transition Models (FIST) is an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks.
We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments.
arXiv Detail & Related papers (2021-07-19T15:56:01Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- Weakly-Supervised Reinforcement Learning for Controllable Behavior [126.04932929741538]
Reinforcement learning (RL) is a powerful framework for learning to take actions to solve tasks.
In many settings, an agent must winnow down the inconceivably large space of all possible tasks to the single task that it is currently being asked to solve.
We introduce a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical "chaff" tasks.
arXiv Detail & Related papers (2020-04-06T17:50:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.