Exploration in Deep Reinforcement Learning: A Survey
- URL: http://arxiv.org/abs/2205.00824v1
- Date: Mon, 2 May 2022 12:03:44 GMT
- Title: Exploration in Deep Reinforcement Learning: A Survey
- Authors: Pawel Ladosz, Lilian Weng, Minwoo Kim, Hyondong Oh
- Abstract summary: Exploration techniques are of primary importance when solving sparse reward problems.
In sparse reward problems, the reward is rare, which means that the agent will not find the reward often by acting randomly.
This review provides a comprehensive overview of existing exploration approaches.
- Score: 4.066140143829243
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper reviews exploration techniques in deep reinforcement learning.
Exploration techniques are of primary importance when solving sparse reward
problems. In sparse reward problems, the reward is rare, which means that the
agent will not find the reward often by acting randomly. In such a scenario, it
is challenging for reinforcement learning to learn the association between
actions and rewards. Thus, more sophisticated exploration methods need to be devised.
This review provides a comprehensive overview of existing exploration
approaches, which are categorized based on their key contributions as follows:
reward novel states, reward diverse behaviours, goal-based methods,
probabilistic methods, imitation-based methods, safe exploration, and
random-based methods. Then, the unsolved challenges are discussed to provide
valuable future research directions. Finally, the approaches of different
categories are compared in terms of complexity, computational effort and
overall performance.
Related papers
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based (two popular approaches for effective exploration in deep RL) exploration strategies.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
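The episodic visitation discrepancy above can be sketched numerically. This is a simplified stand-in, not REVD's actual estimator: it scores each state of the current episode by its distance to the k-th nearest state of the previous episode, whereas REVD builds a Rényi-divergence estimate from such k-NN distances. All names and the log-distance form are illustrative assumptions.

```python
import numpy as np

def episodic_discrepancy_bonus(episode, prev_episode, k=1):
    """Score each state of the current episode by its distance to the
    k-th nearest state of the previous episode. States the previous
    episode never came close to earn larger intrinsic rewards."""
    episode = np.asarray(episode, dtype=float)
    prev_episode = np.asarray(prev_episode, dtype=float)
    # Pairwise distances: current-episode states (rows) vs previous-episode states (cols).
    dists = np.linalg.norm(episode[:, None, :] - prev_episode[None, :, :], axis=-1)
    knn = np.sort(dists, axis=1)[:, k - 1]  # distance to k-th nearest previous state
    return np.log(knn + 1.0)  # +1 keeps the bonus non-negative

prev_ep = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]  # previous episode hugged the x-axis
cur_ep = [[0.0, 0.1], [0.0, 3.0]]               # current episode wanders off it
bonus = episodic_discrepancy_bonus(cur_ep, prev_ep)
print(bonus.argmax())  # index 1: the far-off state gets the larger bonus
```

The key design point is that the bonus is computed between episodes, not within one, so revisiting states that earlier episodes already covered yields little reward.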
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- GAN-based Intrinsic Exploration For Sample Efficient Reinforcement Learning [0.0]
We propose a Generative Adversarial Network-based Intrinsic Reward Module that learns the distribution of the observed states and emits an intrinsic reward that is high for out-of-distribution states.
We evaluate our approach in Super Mario Bros for a no reward setting and in Montezuma's Revenge for a sparse reward setting and show that our approach is indeed capable of exploring efficiently.
arXiv Detail & Related papers (2022-06-28T19:16:52Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that exploration bonus from uncertainty in learned reward improves both feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Rényi State Entropy for Exploration Acceleration in Reinforcement Learning [6.72733760405596]
In this work, a novel intrinsic reward module based on the Rényi entropy is proposed to provide high-quality intrinsic rewards.
In particular, a $k$-nearest-neighbor estimator is introduced for entropy estimation, while a $k$-value search method is designed to guarantee the estimation accuracy.
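A particle-based entropy bonus of this kind can be sketched in a few lines: each state's intrinsic reward grows with the distance to its k-th nearest neighbour in the batch, so states in sparsely visited regions score highest. This is a minimal illustration of the k-NN estimation idea, not the paper's module; the function name and the +1 regularizer are assumptions.

```python
import numpy as np

def knn_entropy_bonus(states, k=3):
    """Particle-based intrinsic reward: the log of each state's distance
    to its k-th nearest neighbour within the batch. Isolated (novel)
    states receive larger bonuses than densely clustered ones."""
    states = np.asarray(states, dtype=float)
    # Pairwise Euclidean distances within the batch.
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude self-distance
    knn_dist = np.sort(dists, axis=1)[:, k - 1]  # distance to k-th neighbour
    return np.log(knn_dist + 1.0)  # +1 keeps the bonus non-negative

# A tightly clustered batch with one outlier: the outlier earns the largest bonus.
batch = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
bonus = knn_entropy_bonus(batch, k=2)
print(bonus.argmax())  # index 4, the outlier
```

The $k$-value search mentioned above tunes $k$ so that this estimate tracks the true entropy; here $k$ is simply fixed.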
arXiv Detail & Related papers (2022-03-08T07:38:35Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
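The criterion can be sketched on tabular counts: the bonus is the difference of inverse visitation counts across a transition, clipped at zero, so only steps into rarer states are rewarded. This is a toy sketch under the assumption of exact state counts; class and method names are illustrative, and BeBold itself works with pseudo-counts over learned representations.

```python
from collections import defaultdict

class BeBoldBonus:
    """Sketch of a BeBold-style intrinsic reward: the clipped difference of
    inverse visitation counts along a transition. The bonus is positive only
    when the successor state is rarer than the current one, pushing the agent
    beyond the boundary of already-explored regions."""

    def __init__(self):
        self.counts = defaultdict(int)  # lifelong state-visitation counts

    def visit(self, state):
        self.counts[state] += 1

    def step(self, state, next_state):
        self.visit(next_state)
        diff = 1.0 / self.counts[next_state] - 1.0 / self.counts[state]
        return max(diff, 0.0)  # clip: stepping back into familiar states earns nothing

b = BeBoldBonus()
b.visit("A")                        # episode starts in A
b.step("A", "A"); b.step("A", "A")  # dithering in A earns no bonus
frontier = b.step("A", "B")         # crossing into rarely visited B: 1/1 - 1/3
back = b.step("B", "A")             # retreating into A: clipped to 0
print(round(frontier, 3), back)     # 0.667 0.0
```

The clipping is what mitigates short-sightedness and detachment: dithering and backtracking yield zero bonus, so reward accumulates only along the frontier of exploration.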
arXiv Detail & Related papers (2020-12-15T21:26:54Z)
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing, that aims at encouraging exploration only when it is needed.
We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
- Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.