Exploration in Deep Reinforcement Learning: A Survey
- URL: http://arxiv.org/abs/2205.00824v1
- Date: Mon, 2 May 2022 12:03:44 GMT
- Title: Exploration in Deep Reinforcement Learning: A Survey
- Authors: Pawel Ladosz, Lilian Weng, Minwoo Kim, Hyondong Oh
- Abstract summary: Exploration techniques are of primary importance when solving sparse reward problems.
In sparse reward problems, the reward is rare, which means that the agent will not find the reward often by acting randomly.
This review provides a comprehensive overview of existing exploration approaches.
- Score: 4.066140143829243
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper reviews exploration techniques in deep reinforcement learning.
Exploration techniques are of primary importance when solving sparse reward
problems. In sparse reward problems, the reward is rare, which means that the
agent will not find the reward often by acting randomly. In such a scenario, it
is challenging for reinforcement learning to learn the association between
actions and rewards. Thus, more sophisticated exploration methods need to be devised.
This review provides a comprehensive overview of existing exploration
approaches, which are categorized based on their key contributions as follows:
reward novel states, reward diverse behaviours, goal-based methods,
probabilistic methods, imitation-based methods, safe exploration, and
random-based methods. Then, the unsolved challenges are discussed to provide
valuable future research directions. Finally, the approaches of different
categories are compared in terms of complexity, computational effort and
overall performance.
Related papers
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based (two popular approaches for effective exploration in deep RL) exploration strategies.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
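The episodic visitation discrepancy above can be sketched numerically. This is a simplified stand-in, not REVD's actual estimator: it scores each state of the current episode by its distance to the k-th nearest state of the previous episode, whereas REVD builds a Rényi-divergence estimate from such k-NN distances. All names and the log-distance form are illustrative assumptions.

```python
import numpy as np

def episodic_discrepancy_bonus(episode, prev_episode, k=1):
    """Score each state of the current episode by its distance to the
    k-th nearest state of the previous episode. States the previous
    episode never came close to earn larger intrinsic rewards."""
    episode = np.asarray(episode, dtype=float)
    prev_episode = np.asarray(prev_episode, dtype=float)
    # Pairwise distances: current-episode states (rows) vs previous-episode states (cols).
    dists = np.linalg.norm(episode[:, None, :] - prev_episode[None, :, :], axis=-1)
    knn = np.sort(dists, axis=1)[:, k - 1]  # distance to k-th nearest previous state
    return np.log(knn + 1.0)  # +1 keeps the bonus non-negative

prev_ep = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]  # previous episode hugged the x-axis
cur_ep = [[0.0, 0.1], [0.0, 3.0]]               # current episode wanders off it
bonus = episodic_discrepancy_bonus(cur_ep, prev_ep)
print(bonus.argmax())  # index 1: the far-off state gets the larger bonus
```

The key design point is that the bonus is computed between episodes, not within one, so revisiting states that earlier episodes already covered yields little reward.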
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- GAN-based Intrinsic Exploration For Sample Efficient Reinforcement Learning [0.0]
We propose a Generative Adversarial Network-based Intrinsic Reward Module that learns the distribution of the observed states and emits an intrinsic reward that is high for out-of-distribution states.
We evaluate our approach in Super Mario Bros for a no reward setting and in Montezuma's Revenge for a sparse reward setting and show that our approach is indeed capable of exploring efficiently.
arXiv Detail & Related papers (2022-06-28T19:16:52Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that exploration bonus from uncertainty in learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Rényi State Entropy for Exploration Acceleration in Reinforcement Learning [6.72733760405596]
In this work, a novel intrinsic reward module based on the Rényi entropy is proposed to provide high-quality intrinsic rewards.
In particular, a $k$-nearest-neighbor estimator is introduced for entropy estimation, while a $k$-value search method is designed to guarantee the estimation accuracy.
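A particle-based entropy bonus of this kind can be sketched in a few lines: each state's intrinsic reward grows with the distance to its k-th nearest neighbour in the batch, so states in sparsely visited regions score highest. This is a minimal illustration of the k-NN estimation idea, not the paper's module; the function name and the +1 regularizer are assumptions.

```python
import numpy as np

def knn_entropy_bonus(states, k=3):
    """Particle-based intrinsic reward: the log of each state's distance
    to its k-th nearest neighbour within the batch. Isolated (novel)
    states receive larger bonuses than densely clustered ones."""
    states = np.asarray(states, dtype=float)
    # Pairwise Euclidean distances within the batch.
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude self-distance
    knn_dist = np.sort(dists, axis=1)[:, k - 1]  # distance to k-th neighbour
    return np.log(knn_dist + 1.0)  # +1 keeps the bonus non-negative

# A tightly clustered batch with one outlier: the outlier earns the largest bonus.
batch = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
bonus = knn_entropy_bonus(batch, k=2)
print(bonus.argmax())  # index 4, the outlier
```

The $k$-value search mentioned above tunes $k$ so that this estimate tracks the true entropy; here $k$ is simply fixed.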
arXiv Detail & Related papers (2022-03-08T07:38:35Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
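The criterion can be sketched on tabular counts: the bonus is the difference of inverse visitation counts across a transition, clipped at zero, so only steps into rarer states are rewarded. This is a toy sketch under the assumption of exact state counts; class and method names are illustrative, and BeBold itself works with pseudo-counts over learned representations.

```python
from collections import defaultdict

class BeBoldBonus:
    """Sketch of a BeBold-style intrinsic reward: the clipped difference of
    inverse visitation counts along a transition. The bonus is positive only
    when the successor state is rarer than the current one, pushing the agent
    beyond the boundary of already-explored regions."""

    def __init__(self):
        self.counts = defaultdict(int)  # lifelong state-visitation counts

    def visit(self, state):
        self.counts[state] += 1

    def step(self, state, next_state):
        self.visit(next_state)
        diff = 1.0 / self.counts[next_state] - 1.0 / self.counts[state]
        return max(diff, 0.0)  # clip: stepping back into familiar states earns nothing

b = BeBoldBonus()
b.visit("A")                        # episode starts in A
b.step("A", "A"); b.step("A", "A")  # dithering in A earns no bonus
frontier = b.step("A", "B")         # crossing into rarely visited B: 1/1 - 1/3
back = b.step("B", "A")             # retreating into A: clipped to 0
print(round(frontier, 3), back)     # 0.667 0.0
```

The clipping is what mitigates short-sightedness and detachment: dithering and backtracking yield zero bonus, so reward accumulates only along the frontier of exploration.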
arXiv Detail & Related papers (2020-12-15T21:26:54Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, that aims at encouraging exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
- Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.