How to Enable Uncertainty Estimation in Proximal Policy Optimization
- URL: http://arxiv.org/abs/2210.03649v1
- Date: Fri, 7 Oct 2022 15:56:59 GMT
- Title: How to Enable Uncertainty Estimation in Proximal Policy Optimization
- Authors: Eugene Bykovets, Yannick Metz, Mennatallah El-Assady, Daniel A. Keim,
Joachim M. Buhmann
- Abstract summary: Existing uncertainty estimation techniques have not seen widespread adoption in on-policy deep RL.
We propose definitions of uncertainty and OOD for Actor-Critic RL algorithms.
We show experimentally that the recently proposed method of Masksembles strikes a favourable balance among the surveyed methods.
- Score: 20.468991996052953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep reinforcement learning (RL) agents have showcased strong results
across many domains, a major concern is their inherent opaqueness and the
safety of such systems in real-world use cases. To overcome these issues, we
need agents that can quantify their uncertainty and detect out-of-distribution
(OOD) states. Existing uncertainty estimation techniques, like Monte-Carlo
Dropout or Deep Ensembles, have not seen widespread adoption in on-policy deep
RL. We posit that this is due to two reasons: first, concepts like uncertainty and
OOD states are not as well defined as in supervised learning, especially for
on-policy RL methods; second, available implementations and comparative
studies of uncertainty estimation methods in RL have been limited. To overcome
the first gap, we propose definitions of uncertainty and OOD for Actor-Critic
RL algorithms, namely, proximal policy optimization (PPO), and present possible
applicable measures. In particular, we discuss the concepts of value and policy
uncertainty. The second point is addressed by implementing different
uncertainty estimation methods and comparing them across a number of
environments. The OOD detection performance is evaluated via a custom
evaluation benchmark of in-distribution (ID) and OOD states for various RL
environments. We identify a trade-off between reward and OOD detection
performance. To overcome this, we formulate a Pareto optimization problem in
which we simultaneously optimize for reward and OOD detection performance. We
show experimentally that the recently proposed method of Masksembles strikes a
favourable balance among the surveyed methods, enabling high-quality uncertainty
estimation and OOD detection while matching the performance of original RL
agents.
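The value- and policy-uncertainty notions above lend themselves to a short illustration. Below is a minimal, hypothetical PyTorch sketch (class and function names are ours, and MC-Dropout stands in for Masksembles or a deep ensemble): several stochastic forward passes through an actor-critic network give a spread of value estimates (value uncertainty) and of action distributions (policy uncertainty), which can then be thresholded against in-distribution statistics for OOD detection. It is a sketch of the general idea, not the authors' implementation.
```python
import torch
import torch.nn as nn


class StochasticActorCritic(nn.Module):
    """Small actor-critic whose hidden layer keeps dropout active at inference,
    so repeated forward passes act as a cheap ensemble (MC-Dropout style;
    the architecture here is illustrative only)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64, p_drop: float = 0.1):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Dropout(p_drop))
        self.policy_head = nn.Linear(hidden, n_actions)  # action logits
        self.value_head = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)


@torch.no_grad()
def uncertainty_scores(net: StochasticActorCritic, obs: torch.Tensor, n_samples: int = 10):
    """Run several stochastic passes and summarise their disagreement.

    Value uncertainty: std of the sampled value estimates.
    Policy uncertainty: mean KL between each sampled policy and the mean policy.
    (Plausible instantiations of 'value' and 'policy' uncertainty, not
    necessarily the exact measures used in the paper.)"""
    net.train()  # keep dropout masks active during the forward passes
    logits, values = zip(*(net(obs) for _ in range(n_samples)))
    probs = torch.stack([l.softmax(-1) for l in logits])   # (S, B, A)
    values = torch.stack(values)                            # (S, B)

    mean_probs = probs.mean(0, keepdim=True)
    policy_unc = (probs * (probs.clamp_min(1e-8).log()
                           - mean_probs.clamp_min(1e-8).log())).sum(-1).mean(0)
    value_unc = values.std(0)
    return policy_unc, value_unc


if __name__ == "__main__":
    net = StochasticActorCritic(obs_dim=8, n_actions=4)
    obs = torch.randn(5, 8)                  # batch of 5 (possibly OOD) states
    p_unc, v_unc = uncertainty_scores(net, obs)
    # A state could be flagged OOD when either score exceeds a threshold
    # calibrated on in-distribution rollouts.
    print(p_unc, v_unc)
```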
Related papers
- Guaranteeing Out-Of-Distribution Detection in Deep RL via Transition Estimation [2.0836728378106883]
Training environments may not reflect real-life environments.
Trained systems are often equipped with out-of-distribution detectors that alert when the system encounters a state it does not recognize or in which it exhibits uncertainty.
arXiv Detail & Related papers (2025-03-07T08:40:41Z)
- The Best of Both Worlds: On the Dilemma of Out-of-distribution Detection [75.65876949930258]
Out-of-distribution (OOD) detection is essential for model trustworthiness.
We show that the superior OOD detection performance of state-of-the-art methods is achieved by secretly sacrificing the OOD generalization ability.
arXiv Detail & Related papers (2024-10-12T07:02:04Z)
- Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks [17.520137576423593]
We aim to provide a consolidated view of the two largest sub-fields within the community: out-of-distribution (OOD) detection and open-set recognition (OSR).
We perform rigorous cross-evaluation between state-of-the-art methods in the OOD detection and OSR settings and identify a strong correlation between the performance of methods across the two settings.
We propose a new, large-scale benchmark setting which we suggest better disentangles the problem tackled by OOD detection and OSR.
arXiv Detail & Related papers (2024-08-29T17:55:07Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Adaptive Uncertainty Estimation via High-Dimensional Testing on Latent Representations [28.875819909902244]
Uncertainty estimation aims to evaluate the confidence of a trained deep neural network.
Existing uncertainty estimation approaches rely on low-dimensional distributional assumptions.
We propose a new framework using data-adaptive high-dimensional hypothesis testing for uncertainty estimation.
arXiv Detail & Related papers (2023-10-25T12:22:18Z)
- Beyond AUROC & co. for evaluating out-of-distribution detection performance [50.88341818412508]
Given their relevance for safe(r) AI, it is important to examine whether the basis for comparing OOD detection methods is consistent with practical needs.
We propose a new metric - Area Under the Threshold Curve (AUTC), which explicitly penalizes poor separation between ID and OOD samples.
arXiv Detail & Related papers (2023-06-26T12:51:32Z)
- Improving Out-of-Distribution Detection via Epistemic Uncertainty Adversarial Training [29.4569172720654]
We develop a simple adversarial training scheme that incorporates an attack of the uncertainty predicted by the dropout ensemble.
We demonstrate this method improves OOD detection performance on standard data (i.e., not adversarially crafted), and improves the standardized partial AUC from near-random guessing performance to $\geq 0.75$.
arXiv Detail & Related papers (2022-09-05T14:32:19Z)
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning [125.8224674893018]
Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment.
Applying off-policy algorithms to offline RL usually fails due to the extrapolation error caused by the out-of-distribution (OOD) actions.
We propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints.
arXiv Detail & Related papers (2022-02-23T15:27:16Z)
- Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation [12.415463205960156]
In model-free deep reinforcement learning (RL) algorithms, using noisy value estimates to supervise policy evaluation and optimization is detrimental to the sample efficiency.
We provide a systematic analysis of the sources of uncertainty in the noisy supervision that occurs in RL.
We propose a method whereby two complementary uncertainty estimation methods account for both the Q-value and the environment stochasticity to better mitigate the negative impacts of noisy supervision.
arXiv Detail & Related papers (2022-01-05T15:46:06Z)
- On the Practicality of Deterministic Epistemic Uncertainty [106.06571981780591]
While deterministic uncertainty methods (DUMs) achieve strong performance on detecting out-of-distribution data, it remains unclear whether DUMs are well calibrated and can seamlessly scale to real-world applications.
arXiv Detail & Related papers (2021-07-01T17:59:07Z)
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
- Uncertainty-Based Out-of-Distribution Classification in Deep Reinforcement Learning [17.10036674236381]
Wrong predictions for out-of-distribution data can cause safety critical situations in machine learning systems.
We propose a framework for uncertainty-based OOD classification: UBOOD.
We show that UBOOD produces reliable classification results when combined with ensemble-based estimators.
arXiv Detail & Related papers (2019-12-31T09:52:49Z)
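Several of the related papers above (e.g., UBOOD, UWAC, PBRL) lean on the same basic ingredient: disagreement across an ensemble of value estimates serves as an epistemic-uncertainty score that is thresholded to classify states as out-of-distribution. The NumPy sketch below shows that generic recipe under our own naming, with a simple quantile threshold calibrated on in-distribution data; it is not the exact estimator of any one of the cited papers.
```python
import numpy as np


def ensemble_ood_score(q_values: np.ndarray) -> np.ndarray:
    """Epistemic uncertainty from an ensemble of Q/value estimates.

    q_values has shape (n_members, n_states): each ensemble member's value
    estimate for every state. The score is the across-member standard
    deviation; higher disagreement suggests the state is OOD.
    (Generic recipe, not the estimator of any specific paper above.)"""
    return q_values.std(axis=0)


def classify_ood(scores: np.ndarray, id_scores: np.ndarray, quantile: float = 0.95) -> np.ndarray:
    """Flag states whose score exceeds a threshold calibrated on ID data."""
    threshold = np.quantile(id_scores, quantile)
    return scores > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    id_q = rng.normal(0.0, 0.05, size=(5, 200))   # members agree on ID states
    ood_q = rng.normal(0.0, 0.50, size=(5, 20))   # members disagree on OOD states
    id_scores = ensemble_ood_score(id_q)
    ood_scores = ensemble_ood_score(ood_q)
    print(classify_ood(ood_scores, id_scores).mean())  # fraction of OOD states flagged
```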