Rényi State Entropy for Exploration Acceleration in Reinforcement Learning
- URL: http://arxiv.org/abs/2203.04297v1
- Date: Tue, 8 Mar 2022 07:38:35 GMT
- Title: Rényi State Entropy for Exploration Acceleration in Reinforcement Learning
- Authors: Mingqi Yuan, Man-on Pun, Dong Wang
- Abstract summary: In this work, a novel intrinsic reward module based on the Rényi entropy is proposed to provide high-quality intrinsic rewards.
In particular, a $k$-nearest neighbor estimator is introduced for entropy estimation, while a $k$-value search method is designed to guarantee the estimation accuracy.
- Score: 6.72733760405596
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: One of the most critical challenges in deep reinforcement learning is to
maintain the long-term exploration capability of the agent. To tackle this
problem, it has been recently proposed to provide intrinsic rewards for the
agent to encourage exploration. However, most existing intrinsic reward-based
methods proposed in the literature fail to provide sustainable exploration
incentives, a problem known as vanishing rewards. In addition, these
conventional methods rely on complex models and additional memory in their
learning procedures, resulting in high computational complexity and low
robustness. In this work, a novel intrinsic reward module based on the Rényi
entropy is proposed to provide high-quality intrinsic rewards. It is shown that
the proposed method generalizes existing state entropy maximization methods. In
particular, a $k$-nearest neighbor estimator is introduced for entropy
estimation, while a $k$-value search method is designed to guarantee the
estimation accuracy. Extensive simulation results demonstrate that the proposed
Rényi entropy-based method achieves higher performance than existing schemes.
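A minimal sketch of the core mechanic described above: a per-state exploration bonus computed from $k$-nearest-neighbor distances over recently visited states, with a Rényi order parameter $\alpha$. The function name, the defaults for $k$ and $\alpha$, and the distance transform are illustrative assumptions; the paper's exact estimator, bias-correction constants, and $k$-value search procedure are not reproduced here.

```python
# Illustrative sketch only: a k-NN-based per-state exploration bonus with a
# Renyi order parameter alpha. The paper's exact estimator, bias-correction
# constants, and k-value search are not reproduced here.
import numpy as np

def renyi_knn_bonus(states, k=3, alpha=0.5, eps=1e-8):
    """Per-state bonus from k-NN distances over a batch of visited states.

    states: (N, d) array of (encoded) states; larger output values mark
    states lying in sparsely visited regions of the batch.
    """
    states = np.asarray(states, dtype=np.float64)
    d = states.shape[1]
    # Pairwise Euclidean distances; O(N^2 d) is acceptable for a sketch.
    diff = states[:, None, :] - states[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)              # exclude self-distances
    knn_dist = np.sort(dist, axis=1)[:, k - 1]  # distance to the k-th neighbour
    if abs(alpha - 1.0) < 1e-6:                 # Shannon-style special case
        return np.log(knn_dist + eps)
    # Renyi-flavoured transform: the exponent d * (1 - alpha) mirrors the
    # rho^{d(1-alpha)} term in k-NN Renyi entropy estimators.
    return (knn_dist + eps) ** (d * (1.0 - alpha))
```

A typical use of such a bonus is to scale it by a (usually decaying) coefficient and add it to the extrinsic reward of each transition in the batch.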
Related papers
- Self-supervised network distillation: an effective approach to exploration in sparse reward environments [0.0]
Reinforcement learning can train an agent to behave in an environment according to a predesigned reward function.
When that reward is sparse, a solution may be to equip the agent with an intrinsic motivation that provides informed exploration.
We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator.
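A minimal RND-style sketch of the distillation-error idea behind this family of methods: a predictor network is trained to match a target network's embedding, and the per-state prediction error serves as the novelty bonus. SND additionally trains the target with self-supervised objectives, which is not shown; all names and layer sizes here are illustrative.

```python
# RND-style sketch of a distillation-error novelty bonus; SND additionally
# trains the target with self-supervised objectives (not shown).
import torch
import torch.nn as nn

class DistillationBonus(nn.Module):
    def __init__(self, obs_dim, embed_dim=64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, embed_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, embed_dim))
        for p in self.target.parameters():      # target is frozen w.r.t. this loss
            p.requires_grad_(False)

    def forward(self, obs):
        with torch.no_grad():
            t = self.target(obs)
        p = self.predictor(obs)
        return (p - t).pow(2).mean(dim=-1)      # per-state distillation error
```

The returned error is used twice: detached as the intrinsic reward, and averaged as the loss that trains the predictor.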
arXiv Detail & Related papers (2023-02-22T18:58:09Z)
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
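A rough sketch of an episodic discrepancy bonus in this spirit, not REVD's exact formulation: it compares each state of the current episode against the previous episode using the ratio of within-episode and cross-episode $k$-NN distances, a quantity that appears in $k$-NN Rényi-divergence estimators. The defaults for $k$ and $\alpha$ are assumptions.

```python
# Rough sketch, not REVD's exact formulation: a per-state discrepancy bonus
# comparing the current episode's states against the previous episode's via
# a within-/cross-episode k-NN distance ratio.
import numpy as np

def episode_discrepancy_bonus(cur_states, prev_states, k=3, alpha=0.5, eps=1e-8):
    cur = np.asarray(cur_states, dtype=np.float64)    # (N, d) current episode
    prev = np.asarray(prev_states, dtype=np.float64)  # (M, d) previous episode

    def kth_dist(a, b, exclude_self):
        d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
        if exclude_self:
            np.fill_diagonal(d, np.inf)
        return np.sort(d, axis=1)[:, k - 1]

    rho = kth_dist(cur, cur, exclude_self=True)   # within-episode k-NN distance
    nu = kth_dist(cur, prev, exclude_self=False)  # cross-episode k-NN distance
    # States far from the previous episode but close to this one score highly.
    return ((nu + eps) / (rho + eps)) ** (1.0 - alpha)
```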
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- k-Means Maximum Entropy Exploration [55.81894038654918]
Exploration in continuous spaces with sparse rewards is an open problem in reinforcement learning.
We introduce an artificial curiosity algorithm based on lower bounding an approximation to the entropy of the state visitation distribution.
We show that our approach is both computationally efficient and competitive on benchmarks for exploration in high-dimensional, continuous spaces.
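One simple, hypothetical reading of a $k$-means-driven bonus, offered only as a sketch of the general idea rather than the paper's entropy lower bound: states lying far from every centroid fit to recently visited states receive a larger reward. The cluster count and the log transform are illustrative choices.

```python
# Hypothetical sketch of the general k-means exploration idea; the paper's
# entropy lower bound is not reproduced here.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_novelty_bonus(visited_states, query_states, n_clusters=16):
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(np.asarray(visited_states))
    # Distance from each query state to its nearest centroid.
    nearest = km.transform(np.asarray(query_states)).min(axis=1)
    return np.log1p(nearest)   # monotone in distance; compresses large values
```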
arXiv Detail & Related papers (2022-05-31T09:05:58Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
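A hedged sketch of an uncertainty-driven bonus of this kind: the disagreement (standard deviation) of an ensemble of learned reward models on a state-action pair serves as the intrinsic reward. Ensemble size, architecture, and the plain standard deviation are illustrative choices, not necessarily the paper's exact design.

```python
# Sketch: ensemble disagreement over learned reward models as an exploration
# bonus. Sizes and architecture are illustrative.
import torch
import torch.nn as nn

class RewardEnsembleBonus(nn.Module):
    def __init__(self, in_dim, n_members=3, hidden=128):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_members)
        ])

    def forward(self, sa):                        # sa: (B, in_dim) state-action batch
        preds = torch.stack([m(sa).squeeze(-1) for m in self.members], dim=0)
        return preds.std(dim=0)                   # high disagreement -> explore here
```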
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Exploration in Deep Reinforcement Learning: A Survey [4.066140143829243]
Exploration techniques are of primary importance when solving sparse reward problems.
In sparse reward problems, the reward signal is rare, so an agent acting randomly will seldom encounter it.
This review provides a comprehensive overview of existing exploration approaches.
arXiv Detail & Related papers (2022-05-02T12:03:44Z)
- Learning Long-Term Reward Redistribution via Randomized Return Decomposition [18.47810850195995]
We consider the problem formulation of episodic reinforcement learning with trajectory feedback.
This setting involves an extreme delay of reward signals: the agent obtains only a single reward signal at the end of each trajectory.
We propose a novel reward redistribution algorithm, randomized return decomposition (RRD), to learn a proxy reward function for episodic reinforcement learning.
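A minimal sketch of the randomized-return-decomposition idea as summarized above: a per-step proxy reward model is fit so that a scaled sum of its predictions over a randomly sampled subset of time steps matches the single episodic return. The subset size, the scaling, and the `reward_model` callable (mapping batched states and actions to per-step rewards) are assumptions of this sketch.

```python
# Sketch of a randomized-return-decomposition style loss; details are assumed.
import torch

def rrd_style_loss(reward_model, states, actions, episode_return, subset_size=32):
    T = states.shape[0]
    idx = torch.randperm(T)[: min(subset_size, T)]        # random time steps
    r_hat = reward_model(states[idx], actions[idx]).squeeze(-1)
    pred_return = (T / idx.numel()) * r_hat.sum()          # scaled subset sum
    return (episode_return - pred_return) ** 2             # least-squares fit
```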
arXiv Detail & Related papers (2021-11-26T13:23:36Z)
- Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus.
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
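For context, a sketch of the exact elliptical bonus that is being approximated: it requires maintaining and inverting the feature covariance, which is what becomes expensive at scale. The paper's anti-concentrated approximation itself is not reproduced here.

```python
# The exact elliptical bonus being approximated; maintaining and inverting the
# feature covariance, as done here, is the costly part at scale.
import numpy as np

def elliptical_bonus(phi_history, phi_query, lam=1.0, beta=1.0):
    phi_history = np.asarray(phi_history, dtype=np.float64)  # (N, d) past features
    d = phi_history.shape[1]
    cov = lam * np.eye(d) + phi_history.T @ phi_history      # regularized covariance
    cov_inv = np.linalg.inv(cov)
    q = np.asarray(phi_query, dtype=np.float64)               # (d,) new state features
    return beta * np.sqrt(q @ cov_inv @ q)
```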
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem in the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning [8.810296389358134]
Intrinsic reward shaping (IRS) modules rely on attendant models or additional memory to record and analyze learning procedures.
We introduce a novel metric entitled Jain's fairness index (JFI) to replace the entropy regularizer.
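For reference, Jain's fairness index itself is straightforward to compute; a sketch over state-visitation counts follows. How the paper plugs JFI into its reward shaping in place of the entropy regularizer is not reproduced.

```python
# Jain's fairness index over state-visitation counts: equals 1 when all states
# are visited equally often and approaches 1/n as visits concentrate on one state.
import numpy as np

def jain_fairness_index(visit_counts, eps=1e-8):
    x = np.asarray(visit_counts, dtype=np.float64)
    return (x.sum() ** 2) / (x.size * (x ** 2).sum() + eps)
```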
arXiv Detail & Related papers (2021-07-19T14:04:32Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
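An illustrative objective in the spirit of the "balance expected performance and risk" described in the entry above: a convex combination of the mean return and the conditional value at risk (CVaR) of returns evaluated under samples from a reward posterior. The weighting `lam` and risk level `alpha` are assumptions of this sketch, not PG-BROIL's exact objective.

```python
# Mean/CVaR trade-off over returns under sampled reward functions; an
# illustration of balancing expected performance and risk, not PG-BROIL itself.
import numpy as np

def risk_sensitive_objective(posterior_returns, lam=0.5, alpha=0.95):
    r = np.sort(np.asarray(posterior_returns, dtype=np.float64))
    n_tail = max(1, int(np.ceil((1.0 - alpha) * r.size)))  # worst (1 - alpha) share
    cvar = r[:n_tail].mean()                                # mean of the worst returns
    return (1.0 - lam) * r.mean() + lam * cvar
```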
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.