Exploration via Elliptical Episodic Bonuses
- URL: http://arxiv.org/abs/2210.05805v1
- Date: Tue, 11 Oct 2022 22:10:23 GMT
- Title: Exploration via Elliptical Episodic Bonuses
- Authors: Mikael Henaff, Roberta Raileanu, Minqi Jiang, Tim Rocktäschel
- Abstract summary: We introduce Exploration via Elliptical Episodic Bonuses (E3B), a new method which extends count-based episodic bonuses to continuous state spaces.
Our method sets a new state-of-the-art across 16 challenging tasks from the MiniHack suite, without requiring task-specific inductive biases.
E3B also matches existing methods on sparse reward, pixel-based VizDoom environments, and outperforms existing methods in reward-free exploration on Habitat.
- Score: 22.404871878551354
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, a number of reinforcement learning (RL) methods have been
proposed to explore complex environments which differ across episodes. In this
work, we show that the effectiveness of these methods critically relies on a
count-based episodic term in their exploration bonus. As a result, despite
their success in relatively simple, noise-free settings, these methods fall
short in more realistic scenarios where the state space is vast and prone to
noise. To address this limitation, we introduce Exploration via Elliptical
Episodic Bonuses (E3B), a new method which extends count-based episodic bonuses
to continuous state spaces and encourages an agent to explore states that are
diverse under a learned embedding within each episode. The embedding is learned
using an inverse dynamics model in order to capture controllable aspects of the
environment. Our method sets a new state-of-the-art across 16 challenging tasks
from the MiniHack suite, without requiring task-specific inductive biases. E3B
also matches existing methods on sparse reward, pixel-based VizDoom
environments, and outperforms existing methods in reward-free exploration on
Habitat, demonstrating that it can scale to high-dimensional pixel-based
observations and realistic environments.
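As a rough, self-contained sketch of the elliptical episodic bonus described in the abstract (not the authors' implementation): at each step the agent receives a bonus of the form b(s_t) = phi(s_t)^T C^{-1} phi(s_t), where C is a ridge-regularized sum of outer products of the embeddings observed so far in the current episode and is reset at every episode boundary. The embedding dimension, ridge coefficient, random-projection stand-in for the encoder, and rank-one inverse update below are illustrative assumptions; in the paper the embedding is trained with an inverse dynamics model.
```python
# Illustrative sketch of an elliptical episodic bonus, assuming a fixed embedding
# function; in E3B the embedding is learned online with an inverse dynamics model.
import numpy as np


class EllipticalEpisodicBonus:
    def __init__(self, embed_dim: int, ridge: float = 0.1):
        self.embed_dim = embed_dim
        self.ridge = ridge  # regularization so the episode matrix starts invertible
        self.reset()

    def reset(self):
        # Called at the start of every episode: C_0 = ridge * I, so C_0^{-1} = I / ridge.
        self.cov_inverse = np.eye(self.embed_dim) / self.ridge

    def bonus(self, phi: np.ndarray) -> float:
        # Elliptical bonus b = phi^T C^{-1} phi under the embeddings seen so far this episode.
        u = self.cov_inverse @ phi
        b = float(phi @ u)
        # Rank-one (Sherman-Morrison) update of C^{-1} after adding phi phi^T to C.
        self.cov_inverse -= np.outer(u, u) / (1.0 + b)
        return b


# Usage with a placeholder random-projection "encoder" (hypothetical, for illustration only).
rng = np.random.default_rng(0)
proj = rng.normal(size=(16, 64))          # fake encoder: 64-dim observation -> 16-dim embedding
bonus_fn = EllipticalEpisodicBonus(embed_dim=16)
for _ in range(5):
    obs = rng.normal(size=64)             # stand-in for an environment observation
    phi = proj @ obs
    intrinsic_reward = bonus_fn.bonus(phi)
```
The rank-one update avoids re-inverting the covariance matrix at every step, and because the matrix is reset per episode, the bonus stays large along embedding directions that have not yet been visited within the current episode.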
Related papers
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments [66.80667987347151]
Methods based on intrinsic rewards often fall short in procedurally-generated environments.
We introduce RAPID, a simple yet effective episode-level exploration method for procedurally-generated environments.
We demonstrate our method on several procedurally-generated MiniGrid environments, a first-person-view 3D Maze navigation task from MiniWorld, and several sparse MuJoCo tasks.
arXiv Detail & Related papers (2021-01-20T14:22:01Z)
- BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
arXiv Detail & Related papers (2020-12-15T21:26:54Z)
- Latent World Models For Intrinsically Motivated Exploration [140.21871701134626]
We present a self-supervised representation learning method for image-based observations.
We consider episodic and life-long uncertainties to guide the exploration of partially observable environments.
arXiv Detail & Related papers (2020-10-05T19:47:04Z)
- Novelty Search in Representational Space for Sample Efficient Exploration [38.2027946450689]
We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives.
Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty.
We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards.
arXiv Detail & Related papers (2020-09-28T18:51:52Z)
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
- Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.