Interesting Object, Curious Agent: Learning Task-Agnostic Exploration
- URL: http://arxiv.org/abs/2111.13119v1
- Date: Thu, 25 Nov 2021 15:17:32 GMT
- Title: Interesting Object, Curious Agent: Learning Task-Agnostic Exploration
- Authors: Simone Parisi, Victoria Dean, Deepak Pathak, Abhinav Gupta
- Abstract summary: In this paper, we propose a paradigm change in the formulation and evaluation of task-agnostic exploration.
We show that our formulation is effective and provides the most consistent exploration across several training-testing environment pairs.
- Score: 44.18450799034677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Common approaches for task-agnostic exploration learn tabula rasa: the agent
assumes isolated environments and no prior knowledge or experience. However, in
the real world, agents learn in many environments and always come with prior
experiences as they explore new ones. Exploration is a lifelong process. In
this paper, we propose a paradigm change in the formulation and evaluation of
task-agnostic exploration. In this setup, the agent first learns to explore
across many environments without any extrinsic goal in a task-agnostic manner.
Later on, the agent effectively transfers the learned exploration policy to
better explore new environments when solving tasks. In this context, we
evaluate several baseline exploration strategies and present a simple yet
effective approach to learning task-agnostic exploration policies. Our key idea
is that there are two components of exploration: (1) an agent-centric component
encouraging exploration of unseen parts of the environment based on an agent's
belief; (2) an environment-centric component encouraging exploration of
inherently interesting objects. We show that our formulation is effective and
provides the most consistent exploration across several training-testing
environment pairs. We also introduce benchmarks and metrics for evaluating
task-agnostic exploration strategies. The source code is available at
https://github.com/sparisi/cbet/.
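The abstract names the two components of the intrinsic reward (an agent-centric novelty term and an environment-centric "interesting object" term) but not their exact form. The snippet below is a minimal Python sketch of one way such a combined bonus could look; the class name, the count-based bonuses, and the fixed weights are illustrative assumptions rather than the paper's CBET implementation (see the linked repository for the actual code).

```python
# Minimal sketch (not the paper's exact formulation): combine an agent-centric
# novelty bonus with an environment-centric "interesting object" bonus.
# State/change hashing and the reward weighting are illustrative assumptions.
from collections import defaultdict

class TaskAgnosticExplorationBonus:
    def __init__(self, agent_weight=0.5, env_weight=0.5):
        self.state_counts = defaultdict(int)    # agent-centric: visits per (hashed) state
        self.change_counts = defaultdict(int)   # environment-centric: observed object/state changes
        self.agent_weight = agent_weight
        self.env_weight = env_weight

    def intrinsic_reward(self, state_key, change_key):
        """Return an intrinsic reward for one transition.

        state_key:  hashable summary of the agent's observation (e.g., a discretized state)
        change_key: hashable summary of what changed in the environment (e.g., an object
                    whose attributes differ between consecutive observations), or None
        """
        self.state_counts[state_key] += 1
        agent_bonus = 1.0 / self.state_counts[state_key] ** 0.5   # rarer states -> larger bonus

        env_bonus = 0.0
        if change_key is not None:
            self.change_counts[change_key] += 1
            env_bonus = 1.0 / self.change_counts[change_key] ** 0.5  # rarer changes -> larger bonus

        return self.agent_weight * agent_bonus + self.env_weight * env_bonus
```

In the setup described above, a bonus of this kind would be used to pre-train an exploration policy across many environments without extrinsic rewards, before transferring it to new environments.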
Related papers
- On the Importance of Exploration for Generalization in Reinforcement Learning [89.63074327328765]
We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
arXiv Detail & Related papers (2023-06-08T18:07:02Z)
- Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning [5.40729975786985]
This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy.
We provide a simple way to learn options (sequences of actions) instead of having to handcraft them, and demonstrate the performance advantage in three navigation tasks.
arXiv Detail & Related papers (2022-03-02T05:14:11Z)
- Agent Spaces [0.0]
We define exploration as the act of modifying an agent so that it is itself explorative.
We show that many important structures in Reinforcement Learning are well behaved under the topology induced by convergence in the agent space.
arXiv Detail & Related papers (2021-11-11T01:12:17Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Exploration in Deep Reinforcement Learning: A Comprehensive Survey [24.252352133705735]
Deep Reinforcement Learning (DRL) and Deep Multi-agent Reinforcement Learning (MARL) have achieved significant success across a wide range of domains, such as game AI, autonomous vehicles, robotics and finance.
DRL and deep MARL agents are widely known to be sample-inefficient and millions of interactions are usually needed even for relatively simple game settings.
This paper provides a comprehensive survey on existing exploration methods in DRL and deep MARL.
arXiv Detail & Related papers (2021-09-14T13:16:33Z)
- Deep Reinforcement Learning for Adaptive Exploration of Unknown Environments [6.90777229452271]
We develop an adaptive exploration approach for UAVs that trades off exploration and exploitation in a single step.
The proposed approach uses a map segmentation technique to decompose the environment map into smaller, tractable maps.
The results demonstrate that our proposed approach is capable of navigating through randomly generated environments and covering more AoI in fewer time steps compared to the baselines.
arXiv Detail & Related papers (2021-05-04T16:29:44Z)
- Latent Skill Planning for Exploration and Transfer [49.25525932162891]
In this paper, we investigate how these two approaches can be integrated into a single reinforcement learning agent.
We leverage the idea of partial amortization for fast adaptation at test time.
We demonstrate the benefits of our design decisions across a suite of challenging locomotion tasks.
arXiv Detail & Related papers (2020-11-27T18:40:03Z)
- Semantic Curiosity for Active Visual Learning [45.75355448193764]
We study the task of embodied interactive learning for object detection.
Our goal is to learn an object detector by having an agent select what data to obtain labels for.
arXiv Detail & Related papers (2020-06-16T17:59:24Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
We present Plan2Explore, a self-supervised reinforcement learning agent, as a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
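The "Never Give Up" summary above describes an episodic intrinsic reward computed with k-nearest-neighbour lookups over embeddings of the agent's recent experience. The snippet below is a rough sketch of that idea under stated assumptions: the embedding network (trained with a self-supervised inverse dynamics model in the paper) is assumed to exist elsewhere, and the kernel and constants are illustrative rather than the paper's exact choices.

```python
# Simplified sketch of an episodic k-NN novelty bonus: the current embedding is
# compared against embeddings stored earlier in the episode, and the bonus grows
# as the nearest neighbours become less similar. Constants are illustrative.
import numpy as np

def episodic_novelty_bonus(embedding, memory, k=10, eps=1e-3):
    """Return a larger bonus when `embedding` is far from its k nearest neighbours.

    embedding: 1-D np.ndarray for the current observation
    memory:    list of 1-D np.ndarrays of embeddings seen earlier in the episode
    """
    if not memory:
        return 1.0
    dists = np.array([np.sum((embedding - m) ** 2) for m in memory])  # squared distances
    knn = np.sort(dists)[:k]                     # k nearest neighbours
    knn = knn / (knn.mean() + eps)               # normalize by the running scale of distances
    similarities = eps / (knn + eps)             # inverse kernel: close neighbours -> high similarity
    return 1.0 / np.sqrt(similarities.sum() + 1e-8)
```

In use, the current embedding would be appended to the memory after the bonus is computed, and the memory would be cleared at episode boundaries.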
This list is automatically generated from the titles and abstracts of the papers on this site.