Exploration via Epistemic Value Estimation
- URL: http://arxiv.org/abs/2303.04012v1
- Date: Tue, 7 Mar 2023 16:25:52 GMT
- Title: Exploration via Epistemic Value Estimation
- Authors: Simon Schmitt, John Shawe-Taylor, Hado van Hasselt
- Abstract summary: We propose a recipe that is compatible with sequential decision making and with neural network function approximators.
It equips agents with a tractable posterior over all their parameters from which epistemic value uncertainty can be computed efficiently.
Experiments confirm that the EVE recipe facilitates efficient exploration in hard exploration tasks.
- Score: 22.54793586116019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to efficiently explore in reinforcement learning is an open problem. Many
exploration algorithms employ the epistemic uncertainty of their own value
predictions -- for instance to compute an exploration bonus or upper confidence
bound. Unfortunately, the required uncertainty is difficult to estimate in
general with function approximation.
We propose epistemic value estimation (EVE): a recipe that is compatible with
sequential decision making and with neural network function approximators. It
equips agents with a tractable posterior over all their parameters from which
epistemic value uncertainty can be computed efficiently.
We use the recipe to derive an epistemic Q-Learning agent and observe
competitive performance on a series of benchmarks. Experiments confirm that the
EVE recipe facilitates efficient exploration in hard exploration tasks.
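As a concrete illustration of the pattern the abstract describes, the sketch below maintains a Gaussian posterior over value-function parameters, propagates it to an epistemic standard deviation per state-action pair, and acts optimistically. The linear Q-function, feature map, and diagonal posterior are illustrative assumptions, not the authors' neural-network recipe.

```python
# A hedged sketch, not the authors' EVE recipe: a diagonal Gaussian posterior
# over the parameters of a linear Q-function, propagated to an epistemic
# standard deviation per state-action pair and used as a UCB-style bonus.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 8, 4

# Assumed posterior over parameters; EVE would construct this tractably for
# all parameters of a neural network.
theta_mean = rng.normal(size=n_features)
theta_var = np.full(n_features, 0.1)

def features(state, action):
    """Hypothetical features: 4 state features plus a one-hot action."""
    phi = np.zeros(n_features)
    phi[:4] = state
    phi[4 + action] = 1.0
    return phi

def q_mean_and_std(state, action):
    """Posterior mean and epistemic std of Q(s, a) for a linear-in-features Q."""
    phi = features(state, action)
    mean = phi @ theta_mean
    std = np.sqrt(phi @ (theta_var * phi))  # Var[Q] = phi^T diag(var) phi
    return mean, std

def act_ucb(state, beta=1.0):
    """Optimistic action selection: argmax of mean + beta * epistemic std."""
    scores = [m + beta * s
              for m, s in (q_mean_and_std(state, a) for a in range(n_actions))]
    return int(np.argmax(scores))

print("chosen action:", act_ucb(rng.normal(size=4)))
```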
Related papers
- Automating reward function configuration for drug design [0.0]
We propose a novel approach for automated reward configuration that relies solely on experimental data.
We show that our algorithm yields reward functions that outperform the predictive accuracy of human-defined functions.
arXiv Detail & Related papers (2023-12-15T15:09:16Z)
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
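For the function-estimation ingredient mentioned above, a minimal kernel ridge regression sketch follows; KBASS's spike-and-slab equation selection and EP-EM inference are not reproduced, and the kernel, lengthscale, and ridge strength are assumptions.

```python
# Kernel ridge regression only -- the function-estimation step; KBASS's
# spike-and-slab selection and EP-EM inference are not reproduced here.
# Kernel, lengthscale, and ridge strength are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, lengthscale=0.5):
    """Squared-exponential kernel matrix between 1-D inputs a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

x = np.linspace(0, 2 * np.pi, 30)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)    # sparse, noisy data

alpha = np.linalg.solve(rbf(x, x) + 0.1 * np.eye(x.size), y)  # ridge for noise

x_test = np.linspace(0, 2 * np.pi, 100)
f_est = rbf(x_test, x) @ alpha                        # smooth function estimate
print("max |error| vs sin:", np.abs(f_est - np.sin(x_test)).max())
```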
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- DUDES: Deep Uncertainty Distillation using Ensembles for Semantic Segmentation [11.099838952805325]
Quantifying the predictive uncertainty is a promising way to open up the use of deep neural networks for safety-critical applications.
We present a novel approach for efficient and reliable uncertainty estimation which we call Deep Uncertainty Distillation using Ensembles (DUDES).
DUDES applies student-teacher distillation with a Deep Ensemble to accurately approximate predictive uncertainties with a single forward pass.
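A minimal sketch of the distillation idea, under assumed toy models: the teacher ensemble's per-input predictive spread becomes a regression target for a student uncertainty head, so only one forward pass is needed at test time.

```python
# A toy sketch of ensemble-uncertainty distillation, not the DUDES code:
# the teacher ensemble's predictive spread is the target the student's
# uncertainty head would regress, so test time needs one forward pass.
import numpy as np

rng = np.random.default_rng(0)

def ensemble_predict(x, members):
    """Ensemble mean and spread; the spread is the distillation target."""
    preds = np.stack([m(x) for m in members])   # (n_members, n_points)
    return preds.mean(axis=0), preds.std(axis=0)

# Hypothetical "trained" members: noisy variants of one underlying function.
members = [lambda x, b=rng.normal(scale=0.1): np.sin(x) + b * x
           for _ in range(5)]

x = np.linspace(0.0, 3.0, 200)
teacher_mean, teacher_std = ensemble_predict(x, members)

# A student would minimise, e.g.,
#   mse(student_pred(x), teacher_mean) + mse(student_unc(x), teacher_std)
print("max teacher std:", teacher_std.max())
```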
arXiv Detail & Related papers (2023-03-17T08:56:27Z)
- Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
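For intuition, here is a tabular toy version of such a recursion; it is illustrative, not the paper's exact equation. Value uncertainty satisfies a Bellman-style fixed point and can be found by iteration.

```python
# Tabular toy recursion (illustrative, not the paper's exact equation):
# value uncertainty u satisfies u = local_u + gamma^2 * P_pi @ u, a
# contraction since gamma^2 < 1, so fixed-point iteration converges.
import numpy as np

gamma = 0.9
P_pi = np.array([[0.8, 0.2, 0.0],   # state-to-state transitions under pi
                 [0.1, 0.7, 0.2],
                 [0.0, 0.3, 0.7]])
local_u = np.array([0.5, 0.1, 0.05])  # assumed per-state local variance

u = np.zeros(3)
for _ in range(200):
    u = local_u + gamma**2 * P_pi @ u
print("value uncertainty per state:", u)
```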
arXiv Detail & Related papers (2023-02-24T09:18:27Z)
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
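A hedged sketch of an episodic visitation-discrepancy bonus in this spirit, using a k-nearest-neighbour distance as a simple stand-in for the paper's Rényi-divergence estimator; the state embeddings and the log1p squashing are assumptions.

```python
# A stand-in sketch: k-nearest-neighbour distances between episodes instead
# of the paper's Renyi-divergence estimator; embeddings and the log1p
# squashing are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def knn_bonus(current_states, previous_states, k=3):
    """Per-state bonus: distance to the k-th nearest previous-episode state."""
    d = np.linalg.norm(current_states[:, None, :] - previous_states[None, :, :],
                       axis=-1)
    kth = np.sort(d, axis=1)[:, k - 1]
    return np.log1p(kth)               # squash large distances for stability

prev_ep = rng.normal(size=(64, 2))             # last episode's state embeddings
curr_ep = rng.normal(loc=1.0, size=(32, 2))    # this episode's state embeddings
print("mean intrinsic reward:", knn_bonus(curr_ep, prev_ep).mean())
```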
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem in the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Explainable Deep Reinforcement Learning Using Introspection in a Non-episodic Task [1.2735892003153293]
An introspection-based method transforms Q-values into probabilities of success, which serve as the basis for explaining the agent's decision-making process.
We adapt the introspection method to non-episodic tasks and evaluate it in a continuous Atari game scenario solved with the Rainbow algorithm.
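A heavily hedged sketch of the Q-value-to-probability idea; the min-max scaling below is a simple illustrative stand-in, not necessarily the transformation used in the paper.

```python
# Illustrative only: min-max scaling as a stand-in for the paper's
# Q-value-to-probability transformation.
import numpy as np

def success_probabilities(q_values):
    """Map each action's Q-value to a [0, 1] 'probability of success' score."""
    q = np.asarray(q_values, dtype=float)
    span = q.max() - q.min()
    if span == 0.0:                  # all actions look equally good
        return np.full_like(q, 0.5)
    return (q - q.min()) / span

print(success_probabilities([1.2, 0.4, 0.9, 1.1]))
```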
arXiv Detail & Related papers (2021-08-18T02:49:49Z)
- Temporal Difference Uncertainties as a Signal for Exploration [76.6341354269013]
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy.
In this paper, we highlight that value estimates are easily biased and temporally inconsistent.
We propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors.
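For intuition, a toy version of using the spread of TD errors as an exploration signal; the plain value ensemble here is a stand-in for the paper's induced distribution over TD errors.

```python
# Toy version: a plain value ensemble stands in for the paper's induced
# distribution over TD errors; the spread of TD errors is the signal.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_members, gamma = 5, 8, 0.95

# Hypothetical ensemble of value tables, e.g. from bootstrapped training.
V = rng.normal(size=(n_members, n_states))

def td_uncertainty(s, r, s_next):
    """Std of the TD error across members for transition (s, r, s')."""
    td_errors = r + gamma * V[:, s_next] - V[:, s]
    return td_errors.std()

# Transitions whose TD errors disagree strongly are worth exploring.
print("exploration signal:", td_uncertainty(0, 1.0, 3))
```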
arXiv Detail & Related papers (2020-10-05T18:11:22Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, one can obtain strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- Scalable Uncertainty for Computer Vision with Functional Variational Inference [18.492485304537134]
We leverage the formulation of variational inference in function space.
We obtain predictive uncertainty estimates at the cost of a single forward pass through any chosen CNN architecture.
We propose numerically efficient algorithms which enable fast training in the context of high-dimensional tasks.
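A toy illustration of single-forward-pass uncertainty, with assumed weights: a variance head is emitted alongside the prediction head. The paper derives this from variational inference in function space rather than the heuristic construction below.

```python
# Toy single-pass uncertainty with assumed weights: a variance head next to
# the prediction head; the paper derives this via variational inference in
# function space rather than this heuristic construction.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)   # hypothetical trained weights
w_mu, w_logvar = rng.normal(size=16), rng.normal(size=16)

def predict(x):
    """One forward pass returns both a prediction and its uncertainty."""
    h = np.tanh(x @ W1 + b1)
    mean = h @ w_mu
    std = np.exp(0.5 * (h @ w_logvar))            # log-variance head -> std
    return mean, std

mean, std = predict(np.array([0.3, -1.2]))
print(f"prediction {mean:.3f} +/- {std:.3f}")
```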
arXiv Detail & Related papers (2020-03-06T19:09:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.