Asymmetric self-play for automatic goal discovery in robotic
manipulation
- URL: http://arxiv.org/abs/2101.04882v1
- Date: Wed, 13 Jan 2021 05:20:20 GMT
- Title: Asymmetric self-play for automatic goal discovery in robotic
manipulation
- Authors: OpenAI, Matthias Plappert, Raul Sampedro, Tao Xu, Ilge Akkaya,
Vineet Kosaraju, Peter Welinder, Ruben D'Sa, Arthur Petron, Henrique P. d.O.
Pinto, Alex Paino, Hyeonwoo Noh, Lilian Weng, Qiming Yuan, Casey Chu,
Wojciech Zaremba
- Abstract summary: We rely on asymmetric self-play for goal discovery, where two agents, Alice and Bob, play a game.
We show that this method can discover highly diverse and complex goals without any human priors.
Our method scales, resulting in a single policy that can generalize to many unseen tasks.
- Score: 12.573331269520077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We train a single, goal-conditioned policy that can solve many robotic
manipulation tasks, including tasks with previously unseen goals and objects.
We rely on asymmetric self-play for goal discovery, where two agents, Alice and
Bob, play a game. Alice is asked to propose challenging goals and Bob aims to
solve them. We show that this method can discover highly diverse and complex
goals without any human priors. Bob can be trained with only sparse rewards,
because the interaction between Alice and Bob results in a natural curriculum
and Bob can learn from Alice's trajectory when relabeled as a goal-conditioned
demonstration. Finally, our method scales, resulting in a single policy that
can generalize to many unseen tasks such as setting a table, stacking blocks,
and solving simple puzzles. Videos of a learned policy are available at
https://robotics-self-play.github.io.
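The abstract describes the training game concretely enough to outline in code. Below is a minimal sketch of one asymmetric self-play episode, assuming hypothetical environment and policy interfaces (SelfPlayTrainer, env.reset_to, goal_reached, and add_demo are illustrative names, not the paper's actual API):

```python
# Minimal sketch of one asymmetric self-play episode as described in the
# abstract. All interfaces here (env, alice, bob) are hypothetical
# placeholders, not the authors' actual implementation.

class SelfPlayTrainer:
    def __init__(self, env, alice, bob, horizon=50):
        self.env = env        # manipulation environment (assumed interface)
        self.alice = alice    # goal-setting policy
        self.bob = bob        # goal-conditioned solver policy
        self.horizon = horizon

    def play_episode(self):
        # Alice's turn: act in the environment; her final state becomes the goal.
        init_state = self.env.reset()
        state, alice_traj = init_state, []
        for _ in range(self.horizon):
            action = self.alice.act(state)
            alice_traj.append((state, action))
            state = self.env.step(action)
        goal = state

        # Bob's turn: try to reach Alice's goal from the same initial state.
        state = self.env.reset_to(init_state)
        solved = False
        for _ in range(self.horizon):
            action = self.bob.act(state, goal)
            state = self.env.step(action)
            if self.env.goal_reached(state, goal):
                solved = True
                break

        # Sparse, competitive rewards: Alice scores when Bob fails, which
        # drives her toward goals just beyond Bob's current ability and
        # yields the natural curriculum the abstract mentions.
        self.alice.store_reward(0.0 if solved else 1.0)
        self.bob.store_reward(1.0 if solved else 0.0)

        # When Bob fails, relabel Alice's trajectory as a goal-conditioned
        # demonstration of reaching `goal`, so Bob can learn from it by
        # behavioral cloning despite only ever receiving sparse rewards.
        if not solved:
            for s, a in alice_traj:
                self.bob.add_demo(state=s, goal=goal, action=a)
```

The competitive reward is what keeps goal difficulty calibrated: goals Bob already solves earn Alice nothing, so she is pushed to the frontier of Bob's competence without any hand-designed curriculum.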
Related papers
- An empirical study of task and feature correlations in the reuse of pre-trained models [1.0128808054306186]
Pre-trained neural networks are commonly used and reused in the machine learning community.
This paper introduces an experimental setup through which factors contributing to Bob's empirical success could be studied in silico.
We show in controlled real-world scenarios that Bob can effectively reuse Alice's pre-trained network if there are semantic correlations between his and Alice's task.
arXiv Detail & Related papers (2025-05-15T22:51:27Z)
- Unsupervised Skill Discovery for Robotic Manipulation through Automatic Task Generation [17.222197596599685]
We propose a Skill Learning approach that discovers composable behaviors by solving a large number of autonomously generated tasks.
Our method learns skills allowing the robot to consistently and robustly interact with objects in its environment.
The learned skills can be used to solve a set of unseen manipulation tasks, in simulation as well as on a real robotic platform.
arXiv Detail & Related papers (2024-10-07T09:19:13Z)
- Additive-Effect Assisted Learning [17.408937094829007]
We develop a two-stage assisted learning architecture for an agent, Alice, to seek assistance from another agent, Bob.
In the first stage, we propose a privacy-aware hypothesis testing-based screening method for Alice to decide on the usefulness of the data from Bob.
We show that Alice can achieve the oracle performance as if the training were from centralized data, both theoretically and numerically.
arXiv Detail & Related papers (2024-05-13T23:24:25Z)
- Quantum advantage in a unified scenario and secure detection of resources [55.2480439325792]
We consider a single task to study different approaches to achieving quantum advantage.
We show that the optimal success probability of the overall process for qubit communication might be higher than that for classical-bit (cbit) communication.
arXiv Detail & Related papers (2023-09-22T23:06:20Z)
- Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information [110.42866062614912]
We study human-guided human-machine interaction involving private information.
We focus on offline reinforcement learning (RL) in this game.
We develop a novel identification result and use it to propose a new off-policy evaluation method.
arXiv Detail & Related papers (2022-12-23T06:26:44Z)
- Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve challenging simulated tasks such as humanoid locomotion and stand-up with unprecedented sample efficiency.
arXiv Detail & Related papers (2022-11-09T10:28:40Z)
- Human-to-Robot Imitation in the Wild [50.49660984318492]
We propose an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective.
We show one-shot generalization and success in real-world settings, including 20 different manipulation tasks in the wild.
arXiv Detail & Related papers (2022-07-19T17:59:59Z)
- Deep Hierarchical Planning from Pixels [86.14687388689204]
Director is a method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model.
Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
arXiv Detail & Related papers (2022-06-08T18:20:15Z)
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-10-27T17:57:29Z)
- Follow the Object: Curriculum Learning for Manipulation Tasks with Imagined Goals [8.98526174345299]
This paper introduces a notion of imaginary object goals.
For a given manipulation task, the object of interest is first trained to reach a desired target position on its own.
The object policy is then leveraged to build a predictive model of plausible object trajectories.
The proposed algorithm, Follow the Object, has been evaluated on 7 MuJoCo environments.
arXiv Detail & Related papers (2020-08-05T12:19:14Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve (see the sketch after this list).
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
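The value-disagreement curriculum referenced in the last entry above reduces to a simple goal-selection rule: train on goals where an ensemble of goal-conditioned value functions disagrees most, since those goals are neither reliably solved nor hopeless. A minimal sketch follows, assuming the ensemble is a list of callables; all names are illustrative, not the paper's implementation:

```python
import numpy as np

def pick_curriculum_goals(candidate_goals, value_ensemble, state, k=16):
    """Select the k candidate goals with the highest value disagreement.

    High disagreement across the ensemble marks goals of intermediate
    difficulty, which is where the training signal is richest.
    """
    # values[i, j] = ensemble member i's value estimate for goal j
    values = np.array([[v(state, g) for g in candidate_goals]
                       for v in value_ensemble])
    disagreement = values.std(axis=0)    # per-goal spread across the ensemble
    top = np.argsort(disagreement)[-k:]  # indices of the k most contested goals
    return [candidate_goals[i] for i in top]
```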
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.