Exploration via Epistemic Value Estimation
- URL: http://arxiv.org/abs/2303.04012v1
- Date: Tue, 7 Mar 2023 16:25:52 GMT
- Title: Exploration via Epistemic Value Estimation
- Authors: Simon Schmitt, John Shawe-Taylor, Hado van Hasselt
- Abstract summary: We propose a recipe that is compatible with sequential decision making and with neural network function approximators.
It equips agents with a tractable posterior over all their parameters from which epistemic value uncertainty can be computed efficiently.
Experiments confirm that the EVE recipe facilitates efficient exploration in hard exploration tasks.
- Score: 22.54793586116019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to efficiently explore in reinforcement learning is an open problem. Many
exploration algorithms employ the epistemic uncertainty of their own value
predictions -- for instance to compute an exploration bonus or upper confidence
bound. Unfortunately, the required uncertainty is difficult to estimate in
general with function approximation.
We propose epistemic value estimation (EVE): a recipe that is compatible with
sequential decision making and with neural network function approximators. It
equips agents with a tractable posterior over all their parameters from which
epistemic value uncertainty can be computed efficiently.
We use the recipe to derive an epistemic Q-Learning agent and observe
competitive performance on a series of benchmarks. Experiments confirm that the
EVE recipe facilitates efficient exploration in hard exploration tasks.
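As a concrete illustration of the pattern the abstract describes, the sketch below maintains a Gaussian posterior over value-function parameters, propagates it to an epistemic standard deviation per state-action pair, and acts optimistically. The linear Q-function, feature map, and diagonal posterior are illustrative assumptions, not the authors' neural-network recipe.

```python
# A hedged sketch, not the authors' EVE recipe: a diagonal Gaussian posterior
# over the parameters of a linear Q-function, propagated to an epistemic
# standard deviation per state-action pair and used as a UCB-style bonus.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 8, 4

# Assumed posterior over parameters; EVE would construct this tractably for
# all parameters of a neural network.
theta_mean = rng.normal(size=n_features)
theta_var = np.full(n_features, 0.1)

def features(state, action):
    """Hypothetical features: 4 state features plus a one-hot action."""
    phi = np.zeros(n_features)
    phi[:4] = state
    phi[4 + action] = 1.0
    return phi

def q_mean_and_std(state, action):
    """Posterior mean and epistemic std of Q(s, a) for a linear-in-features Q."""
    phi = features(state, action)
    mean = phi @ theta_mean
    std = np.sqrt(phi @ (theta_var * phi))  # Var[Q] = phi^T diag(var) phi
    return mean, std

def act_ucb(state, beta=1.0):
    """Optimistic action selection: argmax of mean + beta * epistemic std."""
    scores = [m + beta * s
              for m, s in (q_mean_and_std(state, a) for a in range(n_actions))]
    return int(np.argmax(scores))

print("chosen action:", act_ucb(rng.normal(size=4)))
```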
Related papers
- Automating reward function configuration for drug design [0.0]
We propose a novel approach for automated reward configuration that relies solely on experimental data.
We show that our algorithm yields reward functions that outperform the predictive accuracy of human-defined functions.
arXiv Detail & Related papers (2023-12-15T15:09:16Z)
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
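For the function-estimation ingredient mentioned above, a minimal kernel ridge regression sketch follows; KBASS's spike-and-slab equation selection and EP-EM inference are not reproduced, and the kernel, lengthscale, and ridge strength are assumptions.

```python
# Kernel ridge regression only -- the function-estimation step; KBASS's
# spike-and-slab selection and EP-EM inference are not reproduced here.
# Kernel, lengthscale, and ridge strength are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, lengthscale=0.5):
    """Squared-exponential kernel matrix between 1-D inputs a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

x = np.linspace(0, 2 * np.pi, 30)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)    # sparse, noisy data

alpha = np.linalg.solve(rbf(x, x) + 0.1 * np.eye(x.size), y)  # ridge for noise

x_test = np.linspace(0, 2 * np.pi, 100)
f_est = rbf(x_test, x) @ alpha                        # smooth function estimate
print("max |error| vs sin:", np.abs(f_est - np.sin(x_test)).max())
```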
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- DUDES: Deep Uncertainty Distillation using Ensembles for Semantic Segmentation [11.099838952805325]
Quantifying the predictive uncertainty is a promising way to open up the use of deep neural networks for safety-critical applications.
We present a novel approach for efficient and reliable uncertainty estimation which we call Deep Uncertainty Distillation using Ensembles (DUDES).
DUDES applies student-teacher distillation with a Deep Ensemble to accurately approximate predictive uncertainties with a single forward pass.
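A minimal sketch of the distillation idea, under assumed toy models: the teacher ensemble's per-input predictive spread becomes a regression target for a student uncertainty head, so only one forward pass is needed at test time.

```python
# A toy sketch of ensemble-uncertainty distillation, not the DUDES code:
# the teacher ensemble's predictive spread is the target the student's
# uncertainty head would regress, so test time needs one forward pass.
import numpy as np

rng = np.random.default_rng(0)

def ensemble_predict(x, members):
    """Ensemble mean and spread; the spread is the distillation target."""
    preds = np.stack([m(x) for m in members])   # (n_members, n_points)
    return preds.mean(axis=0), preds.std(axis=0)

# Hypothetical "trained" members: noisy variants of one underlying function.
members = [lambda x, b=rng.normal(scale=0.1): np.sin(x) + b * x
           for _ in range(5)]

x = np.linspace(0.0, 3.0, 200)
teacher_mean, teacher_std = ensemble_predict(x, members)

# A student would minimise, e.g.,
#   mse(student_pred(x), teacher_mean) + mse(student_unc(x), teacher_std)
print("max teacher std:", teacher_std.max())
```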
arXiv Detail & Related papers (2023-03-17T08:56:27Z)
- Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
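For intuition, here is a tabular toy version of such a recursion; it is illustrative, not the paper's exact equation. Value uncertainty satisfies a Bellman-style fixed point and can be found by iteration.

```python
# Tabular toy recursion (illustrative, not the paper's exact equation):
# value uncertainty u satisfies u = local_u + gamma^2 * P_pi @ u, a
# contraction since gamma^2 < 1, so fixed-point iteration converges.
import numpy as np

gamma = 0.9
P_pi = np.array([[0.8, 0.2, 0.0],   # state-to-state transitions under pi
                 [0.1, 0.7, 0.2],
                 [0.0, 0.3, 0.7]])
local_u = np.array([0.5, 0.1, 0.05])  # assumed per-state local variance

u = np.zeros(3)
for _ in range(200):
    u = local_u + gamma**2 * P_pi @ u
print("value uncertainty per state:", u)
```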
arXiv Detail & Related papers (2023-02-24T09:18:27Z)
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
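A hedged sketch of an episodic visitation-discrepancy bonus in this spirit, using a k-nearest-neighbour distance as a simple stand-in for the paper's Rényi-divergence estimator; the state embeddings and the log1p squashing are assumptions.

```python
# A stand-in sketch: k-nearest-neighbour distances between episodes instead
# of the paper's Renyi-divergence estimator; embeddings and the log1p
# squashing are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def knn_bonus(current_states, previous_states, k=3):
    """Per-state bonus: distance to the k-th nearest previous-episode state."""
    d = np.linalg.norm(current_states[:, None, :] - previous_states[None, :, :],
                       axis=-1)
    kth = np.sort(d, axis=1)[:, k - 1]
    return np.log1p(kth)               # squash large distances for stability

prev_ep = rng.normal(size=(64, 2))             # last episode's state embeddings
curr_ep = rng.normal(loc=1.0, size=(32, 2))    # this episode's state embeddings
print("mean intrinsic reward:", knn_bonus(curr_ep, prev_ep).mean())
```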
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem in the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Explainable Deep Reinforcement Learning Using Introspection in a Non-episodic Task [1.2735892003153293]
An introspection-based method transforms Q-values into probabilities of success, which serve as the basis for explaining the agent's decision-making process.
We adapt the introspection method to non-episodic tasks and evaluate it in a continuous Atari game scenario solved with the Rainbow algorithm.
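A heavily hedged sketch of the Q-value-to-probability idea; the min-max scaling below is a simple illustrative stand-in, not necessarily the transformation used in the paper.

```python
# Illustrative only: min-max scaling as a stand-in for the paper's
# Q-value-to-probability transformation.
import numpy as np

def success_probabilities(q_values):
    """Map each action's Q-value to a [0, 1] 'probability of success' score."""
    q = np.asarray(q_values, dtype=float)
    span = q.max() - q.min()
    if span == 0.0:                  # all actions look equally good
        return np.full_like(q, 0.5)
    return (q - q.min()) / span

print(success_probabilities([1.2, 0.4, 0.9, 1.1]))
```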
arXiv Detail & Related papers (2021-08-18T02:49:49Z)
- Temporal Difference Uncertainties as a Signal for Exploration [76.6341354269013]
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy.
In this paper, we highlight that value estimates are easily biased and temporally inconsistent.
We propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors.
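For intuition, a toy version of using the spread of TD errors as an exploration signal; the plain value ensemble here is a stand-in for the paper's induced distribution over TD errors.

```python
# Toy version: a plain value ensemble stands in for the paper's induced
# distribution over TD errors; the spread of TD errors is the signal.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_members, gamma = 5, 8, 0.95

# Hypothetical ensemble of value tables, e.g. from bootstrapped training.
V = rng.normal(size=(n_members, n_states))

def td_uncertainty(s, r, s_next):
    """Std of the TD error across members for transition (s, r, s')."""
    td_errors = r + gamma * V[:, s_next] - V[:, s]
    return td_errors.std()

# Transitions whose TD errors disagree strongly are worth exploring.
print("exploration signal:", td_uncertainty(0, 1.0, 3))
```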
arXiv Detail & Related papers (2020-10-05T18:11:22Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, one can obtain strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- Scalable Uncertainty for Computer Vision with Functional Variational Inference [18.492485304537134]
We leverage the formulation of variational inference in function space.
We obtain predictive uncertainty estimates at the cost of a single forward pass through any chosen CNN architecture.
We propose numerically efficient algorithms which enable fast training in the context of high-dimensional tasks.
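A toy illustration of single-forward-pass uncertainty, with assumed weights: a variance head is emitted alongside the prediction head. The paper derives this from variational inference in function space rather than the heuristic construction below.

```python
# Toy single-pass uncertainty with assumed weights: a variance head next to
# the prediction head; the paper derives this via variational inference in
# function space rather than this heuristic construction.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)   # hypothetical trained weights
w_mu, w_logvar = rng.normal(size=16), rng.normal(size=16)

def predict(x):
    """One forward pass returns both a prediction and its uncertainty."""
    h = np.tanh(x @ W1 + b1)
    mean = h @ w_mu
    std = np.exp(0.5 * (h @ w_logvar))            # log-variance head -> std
    return mean, std

mean, std = predict(np.array([0.3, -1.2]))
print(f"prediction {mean:.3f} +/- {std:.3f}")
```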
arXiv Detail & Related papers (2020-03-06T19:09:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.