Related papers: Value of Information-Enhanced Exploration in Bootstrapped DQN

URL: http://arxiv.org/abs/2511.02969v1
Date: Tue, 04 Nov 2025 20:22:58 GMT
Title: Value of Information-Enhanced Exploration in Bootstrapped DQN
Authors: Stergios Plataniotis, Charilaos Akasiadis, Georgios Chalkiadakis
Abstract summary: In this paper, we integrate the notion of (expected) value of information (EVOI) within the well-known Bootstrapped DQN algorithmic framework. Specifically, we develop two novel algorithms that incorporate the expected gain from learning the value of information into Bootstrapped DQN. Our experiments in complex, sparse-reward Atari games demonstrate increased performance, all the while making better use of uncertainty.
Score: 2.6173443955754903
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Efficient exploration in deep reinforcement learning remains a fundamental challenge, especially in environments characterized by high-dimensional states and sparse rewards. Traditional exploration strategies that rely on random local policy noise, such as $\epsilon$-greedy and Boltzmann exploration methods, often struggle to efficiently balance exploration and exploitation. In this paper, we integrate the notion of (expected) value of information (EVOI) within the well-known Bootstrapped DQN algorithmic framework, to enhance the algorithm's deep exploration ability. Specifically, we develop two novel algorithms that incorporate the expected gain from learning the value of information into Bootstrapped DQN. Our methods use value of information estimates to measure the discrepancies of opinions among distinct network heads, and drive exploration towards areas with the most potential. We evaluate our algorithms with respect to performance and their ability to exploit inherent uncertainty arising from random network initialization. Our experiments in complex, sparse-reward Atari games demonstrate increased performance, all the while making better use of uncertainty, and, importantly, without introducing extra hyperparameters.
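The idea of scoring actions by how much the network heads disagree can be illustrated with a minimal sketch. It is not the paper's two algorithms, which the abstract does not specify. It assumes the classic myopic value-of-perfect-information gain from Bayesian Q-learning, and it treats each head's Q-value as one sample of the agent's belief. The names `evoi_scores`, `select_action`, and `head_q` are hypothetical.

```python
from statistics import mean


def evoi_scores(head_q):
    """Score each action as mean Q-value plus an EVOI-style bonus.

    head_q[k][a] is head k's Q-value estimate for action a in the
    current state (hypothetical layout; needs at least two actions).
    """
    n_actions = len(head_q[0])
    # Mean Q-value per action across heads.
    q_mean = [mean(h[a] for h in head_q) for a in range(n_actions)]
    # Best and second-best actions according to the mean estimates.
    order = sorted(range(n_actions), key=lambda a: q_mean[a], reverse=True)
    best, second = order[0], order[1]

    scores = []
    for a in range(n_actions):
        gains = []
        for h in head_q:
            q = h[a]
            if a == best:
                # Learning that the best action is worse than the
                # runner-up would change the decision.
                gains.append(max(q_mean[second] - q, 0.0))
            else:
                # Learning that another action beats the current best
                # would change the decision.
                gains.append(max(q - q_mean[best], 0.0))
        # The bonus is the average gain over heads (assumption: heads
        # act as equally weighted samples).
        scores.append(q_mean[a] + mean(gains))
    return scores


def select_action(head_q):
    """Pick the action with the highest mean-plus-bonus score."""
    scores = evoi_scores(head_q)
    return max(range(len(scores)), key=lambda a: scores[a])
```

In this sketch the bonus vanishes when all heads agree, so the rule reduces to greedy selection on the mean. It introduces no tunable coefficient, which is in the spirit of the abstract's claim of no extra hyperparameters, though the paper's actual formulation may differ.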

Related papers

On Efficient Bayesian Exploration in Model-Based Reinforcement Learning [0.24578723416255752]
We address the challenge of data-efficient exploration in reinforcement learning by examining existing principled, information-theoretic approaches to intrinsic motivation. We prove that exploration bonuses naturally signal epistemic information gains and converge to zero once the agent becomes sufficiently certain about the environment's dynamics and rewards. We then outline a general framework - Predictive Trajectory Sampling with Bayesian Exploration (PTS-BE) - which integrates model-based planning with information-theoretic bonuses to achieve sample-efficient deep exploration.
arXiv Detail & Related papers (2025-07-03T14:03:47Z)
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards. We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration. We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
Uncertainty quantification for DeepONets with ensemble Kalman inversion [0.8158530638728501]
In this work, we propose a novel inference approach for efficient uncertainty quantification (UQ) for operator learning by harnessing the power of the Ensemble Kalman Inversion (EKI) approach. EKI is known for being derivative-free, noise-robust, and highly parallelizable, and has demonstrated its advantages for UQ for physics-informed neural networks. We deploy a mini-batch variant of EKI to accommodate larger datasets, mitigating the computational demand that large datasets impose during the training stage.
arXiv Detail & Related papers (2024-03-06T04:02:30Z)
Successor-Predecessor Intrinsic Exploration [18.440869985362998]
We focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards. We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information. We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods.
arXiv Detail & Related papers (2023-05-24T16:02:51Z)
Exploration via Epistemic Value Estimation [22.54793586116019]
We propose a recipe that is compatible with sequential decision making and with neural network function approximators. It equips agents with a tractable posterior over all their parameters from which epistemic value uncertainty can be computed efficiently. Experiments confirm that the EVE recipe facilitates efficient exploration in hard exploration tasks.
arXiv Detail & Related papers (2023-03-07T16:25:52Z)
STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal. Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
arXiv Detail & Related papers (2023-01-28T00:49:28Z)
Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method. REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes. It is tested on PyBullet Robotics Environments and Atari games.
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
k-Means Maximum Entropy Exploration [55.81894038654918]
Exploration in continuous spaces with sparse rewards is an open problem in reinforcement learning. We introduce an artificial curiosity algorithm based on lower bounding an approximation to the entropy of the state visitation distribution. We show that our approach is both computationally efficient and competitive on benchmarks for exploration in high-dimensional, continuous spaces.
arXiv Detail & Related papers (2022-05-31T09:05:58Z)
Anti-Concentrated Confidence Bonuses for Scalable Exploration [57.91943847134011]
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off. We introduce anti-concentrated confidence bounds for efficiently approximating the elliptical bonus. We develop a practical variant for deep reinforcement learning that is competitive with contemporary intrinsic rewards on Atari benchmarks.
arXiv Detail & Related papers (2021-10-21T15:25:15Z)
On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function. We tackle this problem under the context of function approximation, leveraging powerful function approximators. We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
Langevin DQN [15.807243762876901]
We develop an incremental reinforcement learning algorithm that tracks a single point estimate. We demonstrate through a computational study that the presented algorithm achieves deep exploration. We also present a modification of the Langevin DQN algorithm to improve the computational efficiency.
arXiv Detail & Related papers (2020-02-17T22:29:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.