Rewarding Episodic Visitation Discrepancy for Exploration in
Reinforcement Learning
- URL: http://arxiv.org/abs/2209.08842v1
- Date: Mon, 19 Sep 2022 08:42:46 GMT
- Title: Rewarding Episodic Visitation Discrepancy for Exploration in
Reinforcement Learning
- Authors: Mingqi Yuan, Bo Li, Xin Jin, Wenjun Zeng
- Abstract summary: We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
- Score: 64.8463574294237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exploration is critical for deep reinforcement learning in complex
environments with high-dimensional observations and sparse rewards. To address
this problem, recent approaches proposed to leverage intrinsic rewards to
improve exploration, such as novelty-based exploration and prediction-based
exploration. However, many intrinsic reward modules require sophisticated
structures and representation learning, resulting in prohibitive computational
complexity and unstable performance. In this paper, we propose Rewarding
Episodic Visitation Discrepancy (REVD), a computation-efficient and quantified
exploration method. More specifically, REVD provides intrinsic rewards by
evaluating the Rényi divergence-based visitation discrepancy between
episodes. To make efficient divergence estimation, a k-nearest neighbor
estimator is utilized with a randomly initialized state encoder. Finally,
REVD is tested on PyBullet Robotics Environments and Atari games. Extensive
experiments demonstrate that REVD significantly improves the sample
efficiency of reinforcement learning algorithms and outperforms the
benchmarking methods.
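The sketch below illustrates how a REVD-style bonus could be computed, assuming a fixed, randomly initialized encoder and a simple k-nearest-neighbor distance ratio standing in for the paper's exact Rényi-divergence estimator; the architecture, hyperparameters, and function names are illustrative, not the authors' implementation.
```python
# Minimal sketch of a REVD-style intrinsic reward. Assumptions: the exact
# Rényi-divergence estimator of the paper is replaced by a simple k-NN
# distance ratio; encoder architecture and hyperparameters are illustrative.
import torch
import torch.nn as nn

class RandomEncoder(nn.Module):
    """Fixed, randomly initialized state encoder (never trained)."""
    def __init__(self, obs_dim: int, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim)
        )
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def revd_style_bonus(curr_obs, prev_obs, encoder, k: int = 5,
                     alpha: float = 0.5, eps: float = 1e-8):
    """Per-step bonus for the current episode, contrasting its visitations
    with the previous episode's via k-nearest-neighbor distances."""
    with torch.no_grad():
        y_curr = encoder(curr_obs)   # (T, d) current-episode embeddings
        y_prev = encoder(prev_obs)   # (T', d) previous-episode embeddings
        # Distance to the k-th nearest neighbor in the *previous* episode.
        d_prev = torch.cdist(y_curr, y_prev).topk(k, dim=1, largest=False).values[:, -1]
        # Distance to the k-th nearest neighbor within the *current* episode
        # (k + 1 smallest distances, since the closest one is the point itself).
        d_curr = torch.cdist(y_curr, y_curr).topk(k + 1, dim=1, largest=False).values[:, -1]
        # Large ratio => this state lies far from last episode's visitations.
        return ((d_prev + eps) / (d_curr + eps)) ** (1.0 - alpha)

# Usage: bonus = revd_style_bonus(curr_obs, prev_obs, RandomEncoder(obs_dim))
# and add it (scaled) to the extrinsic reward of each transition.
```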
Related papers
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards [2.09711130126031]
Exploration is a fundamental aspect of reinforcement learning (RL), and its effectiveness is a deciding factor in the performance of RL algorithms.
Recent studies have shown the effectiveness of encouraging exploration with intrinsic rewards estimated from novelties in observations.
We propose DEIR, a novel method in which we theoretically derive an intrinsic reward with a conditional mutual information term.
arXiv Detail & Related papers (2023-04-21T06:39:38Z)
- SVDE: Scalable Value-Decomposition Exploration for Cooperative Multi-Agent Reinforcement Learning [22.389803019100423]
We propose a scalable value-decomposition exploration (SVDE) method, which includes a scalable training mechanism, intrinsic reward design, and explorative experience replay.
Our method achieves the best performance on almost all maps compared to other popular algorithms in a set of StarCraft II micromanagement games.
arXiv Detail & Related papers (2023-03-16T03:17:20Z)
- STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal transition model.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
arXiv Detail & Related papers (2023-01-28T00:49:28Z)
- k-Means Maximum Entropy Exploration [55.81894038654918]
Exploration in continuous spaces with sparse rewards is an open problem in reinforcement learning.
We introduce an artificial curiosity algorithm based on lower bounding an approximation to the entropy of the state visitation distribution.
We show that our approach is both computationally efficient and competitive on benchmarks for exploration in high-dimensional, continuous spaces.
arXiv Detail & Related papers (2022-05-31T09:05:58Z)
- Rényi State Entropy for Exploration Acceleration in Reinforcement Learning [6.72733760405596]
In this work, a novel intrinsic reward module based on the Rényi entropy is proposed to provide high-quality intrinsic rewards.
In particular, a $k$-nearest neighbor estimator is introduced for entropy estimation, while a $k$-value search method is designed to guarantee the estimation accuracy.
arXiv Detail & Related papers (2022-03-08T07:38:35Z)
- Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning [8.810296389358134]
Intrinsic reward shaping (IRS) modules rely on attendant models or additional memory to record and analyze learning procedures.
We introduce a novel metric entitled Jain's fairness index (JFI) to replace the entropy regularizer.
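For reference, Jain's fairness index has the standard closed form sketched below; this is the textbook definition only, and the cited paper's reward-shaping pipeline around it is not reproduced here.
```python
# Standard definition of Jain's fairness index (JFI), included for reference.
import torch

def jains_fairness_index(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """JFI(x) = (sum x_i)^2 / (n * sum x_i^2); equals 1 when all x_i are
    equal and approaches 1/n when a single element dominates."""
    n = x.numel()
    return x.sum() ** 2 / (n * (x ** 2).sum() + eps)
```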
arXiv Detail & Related papers (2021-07-19T14:04:32Z)
- Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits [36.37578212532926]
We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance.
Existing approaches to the related problem of inverse reinforcement learning assume the execution of an optimal policy.
We develop simple and efficient reward estimation procedures for demonstrations within a class of upper-confidence-based algorithms.
arXiv Detail & Related papers (2021-06-28T17:37:49Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
- State Entropy Maximization with Random Encoders for Efficient Exploration [162.39202927681484]
Recent exploration methods have proven to be a recipe for improving sample efficiency in deep reinforcement learning (RL).
This paper presents Random Encoders for Efficient Exploration (RE3), an exploration method that utilizes state entropy as an intrinsic reward.
In particular, we find that the state entropy can be estimated in a stable and compute-efficient manner by utilizing a randomly initialized encoder (a minimal sketch follows below).
arXiv Detail & Related papers (2021-02-18T15:45:17Z)
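The random-encoder, k-nearest-neighbor state-entropy bonuses described in the RE3 and Rényi state-entropy entries above can be sketched roughly as follows; the random projection, the log-distance proxy, and all hyperparameters are illustrative assumptions rather than either paper's exact estimator.
```python
# Minimal sketch of a k-NN state-entropy bonus in the spirit of RE3 /
# Rényi state entropy. Assumptions: a fixed random linear projection stands in
# for the random encoder; the log k-NN distance serves as the entropy proxy.
import torch

def knn_entropy_bonus(obs: torch.Tensor, k: int = 3, latent_dim: int = 64,
                      seed: int = 0) -> torch.Tensor:
    """Per-state bonus: larger when a state's embedding is far from its
    k-th nearest neighbor among the states visited in the same episode/batch."""
    gen = torch.Generator().manual_seed(seed)
    proj = torch.randn(obs.shape[1], latent_dim, generator=gen)  # fixed random encoder
    y = obs @ proj                                               # (T, latent_dim)
    dists = torch.cdist(y, y)                                    # pairwise distances
    # k-th nearest neighbor distance, excluding the point itself (k + 1 smallest).
    d_k = dists.topk(k + 1, dim=1, largest=False).values[:, -1]
    return torch.log(d_k + 1.0)                                  # entropy-style bonus

# Usage: add beta * knn_entropy_bonus(episode_obs) to the extrinsic rewards.
```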