Planning to Explore via Self-Supervised World Models
- URL: http://arxiv.org/abs/2005.05960v2
- Date: Tue, 30 Jun 2020 23:05:50 GMT
- Title: Planning to Explore via Self-Supervised World Models
- Authors: Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar
Hafner, Deepak Pathak
- Abstract summary: Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
- Score: 120.31359262226758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning allows solving complex tasks; however, the learning
tends to be task-specific and sample efficiency remains a challenge. We
present Plan2Explore, a self-supervised reinforcement learning agent that
tackles both these challenges through a new approach to self-supervised
exploration and fast adaptation to new tasks, which need not be known during
exploration. During exploration, unlike prior methods which retrospectively
compute the novelty of observations after the agent has already reached them,
our agent acts efficiently by leveraging planning to seek out expected future
novelty. After exploration, the agent quickly adapts to multiple downstream
tasks in a zero-shot or few-shot manner. We evaluate on challenging control tasks
from high-dimensional image inputs. Without any training supervision or
task-specific interaction, Plan2Explore outperforms prior self-supervised
exploration methods and, in fact, almost matches the performance of an oracle which
has access to rewards. Videos and code at
https://ramanans1.github.io/plan2explore/
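To make the exploration objective concrete, here is a minimal Python sketch of "planning for expected future novelty." It is an illustration under stated assumptions, not the authors' implementation: Plan2Explore's intrinsic reward is the disagreement across an ensemble of learned one-step latent predictors inside a Dreamer-style world model, whereas this sketch substitutes toy linear models, a random-shooting planner, and made-up shapes and hyperparameters.

```python
import numpy as np

# Hedged sketch of "planning for expected future novelty" (not the authors' code).
# An ensemble of one-step latent models disagrees most about unfamiliar states;
# a planner maximizes that disagreement *before* the states are visited.

rng = np.random.default_rng(0)

LATENT_DIM, ACTION_DIM = 8, 2               # toy latent-state and action sizes
ENSEMBLE, HORIZON, CANDIDATES = 5, 12, 256  # ensemble size, plan length, shooting samples

# Toy "ensemble" of one-step dynamics models: z' = W_k [z; a]; stand-ins for learned networks.
ensemble_W = [rng.normal(scale=0.3, size=(LATENT_DIM, LATENT_DIM + ACTION_DIM))
              for _ in range(ENSEMBLE)]

def ensemble_predict(z, a):
    """Stack the next-latent predictions of every ensemble member."""
    za = np.concatenate([z, a])
    return np.stack([W @ za for W in ensemble_W])       # (ENSEMBLE, LATENT_DIM)

def disagreement(z, a):
    """Intrinsic reward: variance across ensemble predictions (expected novelty)."""
    return ensemble_predict(z, a).var(axis=0).mean()

def plan_action(z):
    """Random-shooting planner: score imagined action sequences by the
    disagreement they are expected to collect, return the first action
    of the best sequence (MPC-style)."""
    best_score, best_first_action = -np.inf, None
    for _ in range(CANDIDATES):
        actions = rng.uniform(-1.0, 1.0, size=(HORIZON, ACTION_DIM))
        z_t, score = z.copy(), 0.0
        for a_t in actions:
            score += disagreement(z_t, a_t)
            z_t = ensemble_predict(z_t, a_t).mean(axis=0)  # roll the mean model forward
        if score > best_score:
            best_score, best_first_action = score, actions[0]
    return best_first_action

# Usage: pick the next exploratory action from the current (toy) latent state.
z0 = rng.normal(size=LATENT_DIM)
print("exploratory action:", plan_action(z0))
```

In this sketch, disagreement() stands in for expected future novelty and plan_action() is the loop that seeks it out in imagination, rather than rewarding novelty only after the agent has already reached a state.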
Related papers
- Training on more Reachable Tasks for Generalisation in Reinforcement Learning [5.855552389030083]
In multi-task reinforcement learning, agents train on a fixed set of tasks and have to generalise to new ones.
Recent work has shown that increased exploration improves this generalisation, but it remains unclear why exactly that is.
We introduce the concept of reachability in multi-task reinforcement learning and show that an initial exploration phase increases the number of reachable tasks the agent is trained on.
arXiv Detail & Related papers (2024-10-04T16:15:31Z)
- Generalizing to New Tasks via One-Shot Compositional Subgoals [23.15624959305799]
The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research.
We introduce CASE, which attempts to address this challenge by training an Imitation Learning agent using adaptive "near future" subgoals.
Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.
arXiv Detail & Related papers (2022-05-16T14:30:11Z)
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning [7.51557557629519]
We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple auxiliary tasks in addition to a main task.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
arXiv Detail & Related papers (2021-12-16T14:58:08Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Latent Skill Planning for Exploration and Transfer [49.25525932162891]
In this paper, we investigate how these two approaches can be integrated into a single reinforcement learning agent.
We leverage the idea of partial amortization for fast adaptation at test time.
We demonstrate the benefits of our design decisions across a suite of challenging locomotion tasks.
arXiv Detail & Related papers (2020-11-27T18:40:03Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- ACDER: Augmented Curiosity-Driven Experience Replay [16.755555854030412]
We propose a novel method called Augmented Curiosity-Driven Experience Replay (ACDER)
ACDER uses a new goal-oriented curiosity-driven exploration to encourage the agent to pursue novel and task-relevant states more purposefully.
Experiments are conducted on four challenging robotic manipulation tasks with binary rewards: Reach, Push, Pick&Place, and Multi-step Push.
arXiv Detail & Related papers (2020-11-16T15:27:15Z)
- Continual Learning of Control Primitives: Skill Discovery via Reset-Games [128.36174682118488]
We show how a single method can allow an agent to acquire skills with minimal supervision.
We do this by exploiting the insight that the need to "reset" an agent to a broad set of initial states for a learning task provides a natural setting to learn a diverse set of "reset-skills".
arXiv Detail & Related papers (2020-11-10T18:07:44Z)