Exploration and Anti-Exploration with Distributional Random Network Distillation
- URL: http://arxiv.org/abs/2401.09750v4
- Date: Mon, 20 May 2024 02:12:21 GMT
- Title: Exploration and Anti-Exploration with Distributional Random Network Distillation
- Authors: Kai Yang, Jian Tao, Jiafei Lyu, Xiu Li
- Abstract summary: This paper highlights the "bonus inconsistency" issue within the Random Network Distillation (RND) algorithm.
To address this issue, we introduce Distributional RND (DRND), a derivative of RND.
Our method effectively mitigates the inconsistency issue without introducing significant computational overhead.
- Score: 28.68459770494451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploration remains a critical issue in deep reinforcement learning for an agent to attain high returns in unknown environments. Although the prevailing exploration Random Network Distillation (RND) algorithm has been demonstrated to be effective in numerous environments, it often lacks sufficient discriminative power in bonus allocation. This paper highlights the "bonus inconsistency" issue within RND, pinpointing its primary limitation. To address this issue, we introduce Distributional RND (DRND), a derivative of RND. DRND enhances the exploration process by distilling a distribution of random networks and implicitly incorporating pseudo counts to improve the precision of bonus allocation. This refinement encourages agents to engage in more extensive exploration. Our method effectively mitigates the inconsistency issue without introducing significant computational overhead. Both theoretical analysis and experimental results demonstrate the superiority of our approach over the original RND algorithm. Our method excels in challenging online exploration scenarios and effectively serves as an anti-exploration mechanism in D4RL offline tasks. Our code is publicly available at https://github.com/yk7333/DRND.
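To make the bonus computation concrete, the sketch below (PyTorch) contrasts the vanilla RND bonus with a simplified DRND-style variant that distills a set of frozen random targets. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the network sizes, the number of targets `n_targets`, and the mixing weight `alpha` are illustrative values, and the variance-normalized term only approximates the paper's pseudo-count bonus.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64):
    # Small illustrative network; real implementations use larger encoders.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class RND(nn.Module):
    """Vanilla RND: a predictor chases one fixed, randomly initialized target."""
    def __init__(self, obs_dim, feat_dim=32):
        super().__init__()
        self.target = mlp(obs_dim, feat_dim)
        self.target.requires_grad_(False)  # the target network stays frozen
        self.predictor = mlp(obs_dim, feat_dim)

    def bonus(self, obs):
        # Exploration bonus = prediction error against the frozen target.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

class DRNDSketch(nn.Module):
    """Simplified DRND-style module: distill a set of N frozen random targets."""
    def __init__(self, obs_dim, feat_dim=32, n_targets=10, alpha=0.9):
        super().__init__()
        self.targets = nn.ModuleList(
            [mlp(obs_dim, feat_dim) for _ in range(n_targets)])
        self.targets.requires_grad_(False)
        self.predictor = mlp(obs_dim, feat_dim)
        self.alpha = alpha

    def loss(self, obs):
        # Distillation: regress onto a target drawn uniformly from the set.
        idx = torch.randint(len(self.targets), (1,)).item()
        return (self.predictor(obs) - self.targets[idx](obs)).pow(2).mean()

    def bonus(self, obs):
        with torch.no_grad():
            outs = torch.stack([t(obs) for t in self.targets])  # (N, batch, feat)
            mu, second = outs.mean(0), outs.pow(2).mean(0)
            var = (second - mu.pow(2)).clamp_min(1e-8)
            pred = self.predictor(obs)
            b1 = (pred - mu).pow(2).mean(-1)  # error vs. the target mean
            # Variance-normalized term: a rough stand-in for the implicit
            # pseudo-count bonus, informative even after b1 has collapsed.
            b2 = ((pred.pow(2) - mu.pow(2)).abs() / var).sqrt().mean(-1)
            return self.alpha * b1 + (1 - self.alpha) * b2
```

Because the predictor is trained only on visited states, both bonuses decay where data accumulates; the variance-normalized term is what would let the same module double as an anti-exploration penalty in offline settings.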
Related papers
- Exploration by Random Distribution Distillation [28.675586715243437]
We propose a novel method called Random Distribution Distillation (RDD). RDD samples the output of a target network from a normal distribution. We demonstrate that RDD effectively unifies both count-based and prediction-error approaches (an illustrative sketch follows this entry).
arXiv Detail & Related papers (2025-05-16T09:38:21Z)
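Taking the summary above literally, one hypothetical reading (not necessarily the paper's exact construction) is to regress the predictor onto a noisy sample drawn from a normal distribution centered on the target network's output; `sigma` is an assumed noise scale.

```python
import torch

def rdd_style_loss(predictor, target, obs, sigma=0.1):
    # Hypothetical illustration of "sampling the target output from a normal
    # distribution" -- not the cited paper's exact loss. `sigma` is assumed.
    with torch.no_grad():
        mean = target(obs)
        y = mean + sigma * torch.randn_like(mean)  # sample from N(mean, sigma^2)
    return (predictor(obs) - y).pow(2).mean()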
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Neural Exploitation and Exploration of Contextual Bandits [51.25537742455235]
We study the use of neural networks for exploitation and exploration in contextual multi-armed bandits.
EE-Net is a novel neural-based exploitation and exploration strategy.
We show that EE-Net outperforms related linear and neural contextual bandit baselines on real-world datasets.
arXiv Detail & Related papers (2023-05-05T18:34:49Z)
- Anti-Exploration by Random Network Distillation [63.04360288089277]
We show that a naive choice of conditioning for Random Network Distillation (RND) is not discriminative enough to be used as an uncertainty estimator.
We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM).
We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin (a minimal FiLM sketch follows this entry).
arXiv Detail & Related papers (2023-01-31T13:18:33Z)
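For reference, here is a minimal sketch of FiLM conditioning for a state-action prior, in contrast to naive concatenation of state and action; the layer sizes are illustrative assumptions, not the cited paper's architecture.

```python
import torch
import torch.nn as nn

class FiLMPrior(nn.Module):
    """State features modulated by the action via FiLM (illustrative sizes)."""
    def __init__(self, obs_dim, act_dim, hidden=64, feat_dim=32):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # FiLM generator: maps the action to a per-feature scale and shift.
        self.film = nn.Linear(act_dim, 2 * hidden)
        self.head = nn.Linear(hidden, feat_dim)

    def forward(self, obs, act):
        h = self.trunk(obs)
        gamma, beta = self.film(act).chunk(2, dim=-1)
        # Feature-wise linear modulation: condition state features on the action.
        return self.head(torch.relu(gamma * h + beta))
```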
- Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off.
We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus (the exact bonus being approximated is sketched after this entry).
We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
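For context, the elliptical bonus that the paper approximates is the standard linear-bandit uncertainty term, computed exactly in the sketch below; maintaining and solving against the full matrix A is precisely the cost the approximation avoids. `lam` is an assumed ridge parameter.

```python
import numpy as np

def elliptical_bonus(history: np.ndarray, x: np.ndarray, lam: float = 1.0) -> float:
    # Exact bonus b(x) = sqrt(x^T A^{-1} x), A = lam*I + sum_i x_i x_i^T,
    # where `history` stacks visited feature vectors as rows of shape (t, d).
    d = x.shape[0]
    A = lam * np.eye(d) + history.T @ history
    return float(np.sqrt(x @ np.linalg.solve(A, x)))
```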
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem in the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- ADER: Adapting between Exploration and Robustness for Actor-Critic Methods [8.750251598581102]
We show that TD3's performance lags behind the vanilla actor-critic methods in some primitive environments.
We propose ADER, a novel algorithm that ADapts between Exploration and Robustness to address this problem.
Experiments in several challenging environments demonstrate the superiority of the proposed method in continuous control tasks.
arXiv Detail & Related papers (2021-09-08T05:48:39Z)
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)