Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation
- URL: http://arxiv.org/abs/2201.01666v1
- Date: Wed, 5 Jan 2022 15:46:06 GMT
- Title: Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation
- Authors: Vincent Mai, Kaustubh Mani and Liam Paull
- Abstract summary: In model-free deep reinforcement learning (RL) algorithms, using noisy value estimates to supervise policy evaluation and optimization is detrimental to the sample efficiency.
We provide a systematic analysis of the sources of uncertainty in the noisy supervision that occurs in RL.
We propose a method whereby two complementary uncertainty estimation methods account for both the Q-value and the environment stochasticity to better mitigate the negative impacts of noisy supervision.
- Score: 12.415463205960156
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In model-free deep reinforcement learning (RL) algorithms, using noisy value
estimates to supervise policy evaluation and optimization is detrimental to the
sample efficiency. As this noise is heteroscedastic, its effects can be
mitigated using uncertainty-based weights in the optimization process. Previous
methods rely on sampled ensembles, which do not capture all aspects of
uncertainty. We provide a systematic analysis of the sources of uncertainty in
the noisy supervision that occurs in RL, and introduce inverse-variance RL, a
Bayesian framework which combines probabilistic ensembles and Batch Inverse
Variance weighting. We propose a method whereby two complementary uncertainty
estimation methods account for both the Q-value and the environment
stochasticity to better mitigate the negative impacts of noisy supervision. Our
results show significant improvement in terms of sample efficiency on discrete
and continuous control tasks.
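The abstract describes down-weighting noisy TD targets by their inverse variance, using probabilistic ensembles to estimate that variance. A minimal NumPy sketch of the idea follows, written under assumptions that are ours and not necessarily the authors' IV-RL implementation. Each of K probabilistic ensemble members predicts a mean and a variance for Q(s', a'). The members are pooled with the law of total variance, and the Batch Inverse-Variance weights are taken as 1/(σ² + ξ), normalized over the mini-batch. The function names (`mixture_target_stats`, `biv_weights`, `weighted_td_loss`) and the default `xi` are hypothetical.

```python
import numpy as np

def mixture_target_stats(rewards, next_means, next_vars, gamma=0.99):
    """Collapse K probabilistic ensemble members into one Gaussian per TD target.

    rewards:    (B,)   rewards for a mini-batch of B transitions
    next_means: (K, B) each member's predicted mean of Q(s', a')
    next_vars:  (K, B) each member's predicted variance of Q(s', a')
    """
    # Per-member TD targets y_k = r + gamma * mu_k(s', a').
    member_targets = rewards[None, :] + gamma * next_means
    target_mean = member_targets.mean(axis=0)
    # Law of total variance: average predicted variance (scaled by gamma^2,
    # since Var[gamma * Q] = gamma^2 Var[Q]) plus the spread of member targets.
    target_var = gamma ** 2 * next_vars.mean(axis=0) + member_targets.var(axis=0)
    return target_mean, target_var


def biv_weights(target_var, xi=1.0):
    """Inverse-variance weights 1 / (var + xi), normalized to sum to 1 over the batch.

    xi (hypothetical default) caps the weight of near-zero-variance targets.
    """
    raw = 1.0 / (target_var + xi)
    return raw / raw.sum()


def weighted_td_loss(q_pred, target_mean, weights):
    """Weighted squared TD error: noisier targets contribute less."""
    return float(np.sum(weights * (q_pred - target_mean) ** 2))


# Toy batch: two transitions, two ensemble members.
rewards = np.array([1.0, 0.0])
next_means = np.array([[1.0, 2.0],
                       [3.0, 2.0]])
next_vars = np.zeros((2, 2))
mean, var = mixture_target_stats(rewards, next_means, next_vars, gamma=1.0)
w = biv_weights(var, xi=1.0)
print(mean, var, w)
```

In the toy batch the two members disagree only on the first transition, so its target receives variance 1 and weight 1/3, while the agreed-upon second target receives weight 2/3. In this sketch the spread of member means stands in for uncertainty about the Q-value and the predicted variances stand in for environment stochasticity. That split is an illustrative assumption.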
Related papers
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
(2023-12-07T15:55:58Z)
- Towards Reliable Uncertainty Quantification via Deep Ensembles in Multi-output Regression Task [0.0]
This study aims to investigate the deep ensemble approach, an approximate Bayesian inference, in the multi-output regression task.
A trend towards underestimation of uncertainty as it increases is observed for the first time.
We propose the deep ensemble framework that applies the post-hoc calibration method to improve its uncertainty quantification performance.
(2023-03-28T05:10:57Z)
- Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
(2023-02-24T09:18:27Z)
- Uncertainty Quantification for Traffic Forecasting: A Unified Approach [21.556559649467328]
Uncertainty is an essential consideration for time series forecasting tasks.
In this work, we focus on quantifying the uncertainty of traffic forecasting.
We develop Deep Spatio-Temporal Uncertainty Quantification (STUQ), which can estimate both aleatoric and epistemic uncertainty.
(2022-08-11T15:21:53Z)
- Stochastic optimal well control in subsurface reservoirs using reinforcement learning [0.0]
We present a case study of a model-free reinforcement learning (RL) framework that solves optimal control for a predefined parameter uncertainty distribution.
In principle, RL algorithms are capable of learning optimal action policies to maximize a numerical reward signal.
We present numerical results using two state-of-the-art RL algorithms, proximal policy optimization (PPO) and advantage actor-critic (A2C) on two subsurface flow test cases.
(2022-07-07T17:34:23Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
(2022-02-28T15:39:36Z)
- Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations [7.776010676090131]
The state observations an agent receives may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even collapsing during training.
In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return.
(2021-09-17T22:37:39Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped stochastic gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the stochastic gradients.
(2021-08-25T21:30:27Z)
- Probabilistic robust linear quadratic regulators with Gaussian processes [73.0364959221845]
Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design.
We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin.
(2021-05-17T08:36:18Z)
- The Aleatoric Uncertainty Estimation Using a Separate Formulation with Virtual Residuals [51.71066839337174]
Existing methods can quantify the error in the target estimation, but they tend to underestimate it.
We propose a new separable formulation for the estimation of a signal and of its uncertainty, avoiding the effect of overfitting.
We demonstrate that the proposed method outperforms a state-of-the-art technique for signal and uncertainty estimation.
(2020-11-03T12:11:27Z)
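Two of the related papers build on an uncertainty Bellman equation (UBE), in which the uncertainty of a state's value equals a local uncertainty term plus γ² times the expected uncertainty at the next state. A minimal tabular sketch of that generic recursion follows, for a fixed policy with a known transition matrix. It is not the new UBE those papers propose. The local uncertainty vector is assumed to be given, and the function name is hypothetical.

```python
import numpy as np

def solve_uncertainty_bellman(P, local_unc, gamma=0.9):
    """Solve U = local_unc + gamma**2 * P @ U exactly for a fixed policy.

    P:         (S, S) state-transition matrix under the policy (rows sum to 1)
    local_unc: (S,) per-state local uncertainty term (assumed given)
    """
    n = P.shape[0]
    # (I - gamma^2 P) is invertible whenever gamma < 1 and P is row-stochastic,
    # so the fixed point is unique and a single linear solve recovers it.
    return np.linalg.solve(np.eye(n) - gamma ** 2 * P, local_unc)


# Two states: state 0 moves to state 1, and state 1 is absorbing.
P = np.array([[0.0, 1.0],
              [0.0, 1.0]])
U = solve_uncertainty_bellman(P, np.array([0.0, 1.0]), gamma=0.5)
print(U)
```

Here only the absorbing state carries local uncertainty. That uncertainty accumulates along its own self-loop and is discounted by γ² when it propagates back to state 0. A direct linear solve is practical only for small tabular problems. Larger state spaces would typically iterate the recursion or learn U with a function approximator.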
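The heavy-tailed streaming estimation entry relies on clipped stochastic gradient descent. A minimal sketch of the idea for streaming mean estimation follows. It minimizes ½‖θ − x‖² one sample at a time and rescales each gradient so that its norm is at most a threshold. The squared loss, the fixed clipping threshold, the 1/t step size, and the function name are all illustrative assumptions, not the algorithm or schedule analyzed in that paper.

```python
import numpy as np

def clipped_sgd_mean(samples, clip=1.0):
    """Streaming estimate of the mean of p-dimensional samples via clipped SGD.

    samples: (N, p) array consumed one row at a time
    clip:    maximum Euclidean norm allowed for each per-sample gradient
    """
    theta = np.zeros(samples.shape[1])
    for t, x in enumerate(samples, start=1):
        grad = theta - x                      # gradient of 0.5 * ||theta - x||^2
        norm = np.linalg.norm(grad)
        if norm > clip:
            grad = grad * (clip / norm)       # rescale so ||grad|| == clip
        theta = theta - (1.0 / t) * grad      # 1/t step size (assumed schedule)
    return theta


# One extreme sample barely moves the clipped estimate.
stream = np.array([[0.0, 0.0], [0.0, 0.0], [100.0, 0.0]])
print(clipped_sgd_mean(stream, clip=1.0))
print(stream.mean(axis=0))
```

With `clip=np.inf` and a 1/t step size, the update reproduces the running sample mean exactly. A finite threshold limits how far any single heavy-tailed sample can pull the estimate: in the example the outlier shifts θ by only 1/3, whereas it shifts the plain mean by about 33.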
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.