Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control
- URL: http://arxiv.org/abs/2207.13730v1
- Date: Wed, 27 Jul 2022 18:11:04 GMT
- Title: Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control
- Authors: Takuya Kanazawa, Haiyan Wang, Chetan Gupta
- Abstract summary: Uncertainty quantification is one of the central challenges for machine learning in real-world applications.
Disentangling and evaluating epistemic and aleatoric uncertainties simultaneously has the potential to improve the agent's final performance.
We propose an uncertainty-aware reinforcement learning algorithm for continuous control tasks.
- Score: 13.767812547998735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Uncertainty quantification is one of the central challenges for machine
learning in real-world applications. In reinforcement learning, an agent
confronts two kinds of uncertainty, called epistemic uncertainty and aleatoric
uncertainty. Disentangling and evaluating these uncertainties simultaneously
stands a chance of improving the agent's final performance, accelerating
training, and facilitating quality assurance after deployment. In this work, we
propose an uncertainty-aware reinforcement learning algorithm for continuous
control tasks that extends the Deep Deterministic Policy Gradient algorithm
(DDPG). It exploits epistemic uncertainty to accelerate exploration and
aleatoric uncertainty to learn a risk-sensitive policy. We conduct numerical
experiments showing that our variant of DDPG outperforms vanilla DDPG without
uncertainty estimation in benchmark tasks on robotic control and power-grid
optimization.
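As a rough illustration only (not the authors' implementation), the numpy sketch below shows one plausible reading of the mechanism described in the abstract: an ensemble of quantile critics whose across-member disagreement serves as an epistemic exploration bonus, and whose pooled return quantiles feed a CVaR-style risk-sensitive objective. The toy critic, the coefficients beta and risk_alpha, and all shapes are assumptions.

```python
# Illustrative sketch only: ensemble of quantile (distributional) critics,
# with epistemic uncertainty taken as disagreement across ensemble members
# and aleatoric uncertainty as the spread of each member's return distribution.
# All names, shapes, and coefficients (beta, risk_alpha) are assumptions,
# not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)
n_critics, n_quantiles = 5, 32

def critic_quantiles(state, action, member):
    """Stand-in for the member-th quantile critic Z_i(s, a); returns
    n_quantiles samples of the predicted return distribution."""
    base = np.tanh(state.sum() + action.sum() + member)          # fake "network"
    return base + 0.1 * np.sort(rng.normal(size=n_quantiles))    # sorted quantiles

def uncertainties(state, action):
    Z = np.stack([critic_quantiles(state, action, i) for i in range(n_critics)])
    member_means = Z.mean(axis=1)                  # one scalar Q estimate per critic
    epistemic = member_means.std()                 # disagreement across the ensemble
    aleatoric = Z.std(axis=1).mean()               # average within-member spread
    return Z, epistemic, aleatoric

def actor_objective(state, action, beta=1.0, risk_alpha=0.25):
    """Risk-sensitive, exploration-bonused value used to update the actor:
    CVaR over the pooled quantiles (aleatoric risk aversion) plus an
    epistemic bonus that encourages visiting poorly understood regions."""
    Z, epistemic, _ = uncertainties(state, action)
    pooled = np.sort(Z.reshape(-1))
    k = max(1, int(risk_alpha * pooled.size))
    cvar = pooled[:k].mean()                       # mean of the worst alpha-fraction
    return cvar + beta * epistemic

print(actor_objective(np.ones(3), np.zeros(2)))
```

In this reading, raising beta makes the actor seek out states the ensemble disagrees on, while lowering risk_alpha weights the worst-case tail of the return distribution more heavily.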
Related papers
- Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework [54.40508478482667]
We present a comprehensive framework to disentangle, quantify, and mitigate uncertainty in perception and plan generation.
We propose methods tailored to the unique properties of perception and decision-making.
We show that our uncertainty disentanglement framework reduces variability by up to 40% and enhances task success rates by 5% compared to baselines.
arXiv Detail & Related papers (2024-11-03T17:32:00Z)
- Predicting Safety Misbehaviours in Autonomous Driving Systems using Uncertainty Quantification [8.213390074932132]
This paper evaluates different uncertainty quantification methods from the deep learning domain for the anticipatory testing of safety-critical misbehaviours.
We compute uncertainty scores as the vehicle executes, following the intuition that high uncertainty scores are indicative of unsupported runtime conditions.
In our study, we evaluate the effectiveness and computational overhead of two uncertainty quantification methods, MC-Dropout and Deep Ensembles, for misbehaviour avoidance.
arXiv Detail & Related papers (2024-04-29T10:28:28Z)
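A minimal, hedged sketch of the MC-Dropout scoring idea from the entry above (not the paper's code): the same input is passed repeatedly through a network with dropout left active, and the spread of the outputs is used as a runtime uncertainty score. The toy network, threshold, and shapes are assumptions.

```python
# Rough illustration of MC-Dropout runtime uncertainty scoring: the same input
# is passed through a network with dropout kept active at test time, and the
# spread of the T stochastic outputs is the uncertainty score.
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 1))  # toy "perception" net

def forward_with_dropout(x, p=0.2):
    h = np.maximum(x @ W1, 0.0)                      # ReLU hidden layer
    mask = rng.random(h.shape) > p                   # dropout stays on at test time
    h = h * mask / (1.0 - p)
    return float(h @ W2)

def mc_dropout_uncertainty(x, T=30):
    samples = np.array([forward_with_dropout(x) for _ in range(T)])
    return samples.mean(), samples.std()             # prediction and uncertainty score

x = rng.normal(size=8)                               # stand-in for sensor features
pred, score = mc_dropout_uncertainty(x)
THRESHOLD = 0.5                                      # assumed alert threshold
print(pred, score, "alert" if score > THRESHOLD else "ok")
```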
- SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z)
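A rough sketch of the reward-nullification idea behind a safety critic as described in the entry above; the critic stand-in and the violation threshold are illustrative assumptions, not the SCPO implementation.

```python
# Rough sketch of reward nullification by a safety critic: rewards collected
# on a transition that the critic judges to violate a constraint are zeroed.
import numpy as np

rng = np.random.default_rng(2)

def safety_critic(state, action):
    """Stand-in for a learned critic returning the probability that (s, a)
    violates a safety constraint."""
    return 1.0 / (1.0 + np.exp(-(state.sum() + action.sum())))

def shaped_reward(state, action, reward, violation_threshold=0.5):
    # Nullify the reward whenever the critic predicts a violation.
    return 0.0 if safety_critic(state, action) > violation_threshold else reward

s, a = rng.normal(size=4), rng.normal(size=2)
print(shaped_reward(s, a, reward=1.0))
```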
- Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control [41.7453231409493]
Wasserstein Actor-Critic (WAC) is an actor-critic architecture inspired by Wasserstein Q-Learning (WQL).
WAC enforces exploration in a principled way by guiding the policy learning process with the optimization of an upper bound of the Q-value estimates.
arXiv Detail & Related papers (2023-03-04T10:52:20Z)
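A minimal sketch of optimism over Q-value estimates in the spirit of the entry above (not WAC itself): a particle-based Q estimate is summarized as its mean plus a multiple of its standard deviation, and that upper bound guides action selection. The particle critic and beta are assumptions.

```python
# Illustrative optimism via an upper bound on the Q-value estimate: Q is
# represented by a set of particles, and the policy is guided by
# mean + beta * std rather than the mean alone.
import numpy as np

rng = np.random.default_rng(3)

def q_particles(state, action, n_particles=20):
    """Stand-in for an uncertainty-aware Q estimate represented by particles."""
    return np.tanh(state.sum() + action.sum()) + 0.2 * rng.normal(size=n_particles)

def optimistic_q(state, action, beta=1.5):
    z = q_particles(state, action)
    return z.mean() + beta * z.std()        # upper bound used to drive exploration

state = rng.normal(size=3)
candidates = [rng.normal(size=2) for _ in range(8)]
best = max(candidates, key=lambda a: optimistic_q(state, a))
print(best)
```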
- Bayesian autoencoders with uncertainty quantification: Towards trustworthy anomaly detection [78.24964622317634]
In this work, the formulation of Bayesian autoencoders (BAEs) is adopted to quantify the total anomaly uncertainty.
To evaluate the quality of uncertainty, we consider the task of classifying anomalies with the additional option of rejecting predictions of high uncertainty.
Our experiments demonstrate the effectiveness of the BAE and total anomaly uncertainty on a set of benchmark datasets and two real datasets for manufacturing.
arXiv Detail & Related papers (2022-02-25T12:20:04Z)
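A small illustration of classifying anomalies with a reject option, as evaluated in the entry above (not the BAE code): predictions whose uncertainty exceeds a threshold are abstained on, and accuracy is measured only on the retained ones. The scores, labels, and threshold below are synthetic.

```python
# Rough illustration of anomaly classification with a reject option: samples
# whose uncertainty exceeds a threshold are rejected, and quality is judged
# only on the retained predictions.
import numpy as np

rng = np.random.default_rng(4)
scores = rng.random(200)            # stand-in anomaly scores in [0, 1]
uncertainty = rng.random(200)       # stand-in total uncertainty per sample
labels = (scores + 0.1 * rng.normal(size=200)) > 0.5

REJECT_AT = 0.7                     # assumed uncertainty threshold
keep = uncertainty < REJECT_AT
preds = scores[keep] > 0.5
accuracy_on_kept = (preds == labels[keep]).mean()
print(f"kept {keep.mean():.0%} of samples, accuracy on kept: {accuracy_on_kept:.2f}")
```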
- Risk Sensitive Model-Based Reinforcement Learning using Uncertainty Guided Planning [0.0]
In this paper, risk sensitivity is promoted in a model-based reinforcement learning algorithm.
We propose uncertainty-guided cross-entropy method planning, which penalises action sequences that result in high-variance state predictions.
Experiments show that the agent can identify uncertain regions of the state space during planning and take actions that keep it within high-confidence areas.
arXiv Detail & Related papers (2021-11-09T07:28:00Z)
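A minimal sketch of uncertainty-guided cross-entropy method planning in the spirit of the entry above (not the paper's implementation): candidate action sequences are scored by predicted reward minus a penalty on the variance of next-state predictions across an ensemble of dynamics models. The toy dynamics, reward, and constants are assumptions.

```python
# Minimal sketch of uncertainty-guided CEM planning: action sequences are
# scored by predicted reward minus a penalty on the disagreement (variance)
# of next-state predictions across an ensemble of dynamics models.
import numpy as np

rng = np.random.default_rng(5)
HORIZON, POP, ELITES, N_MODELS, PENALTY = 5, 64, 8, 4, 2.0

def dynamics(state, action, member):
    """Stand-in for the member-th learned dynamics model."""
    return state + 0.1 * action + 0.05 * member * np.sin(state)

def score(state, actions):
    total_reward, total_var = 0.0, 0.0
    for a in actions:
        preds = np.stack([dynamics(state, a, m) for m in range(N_MODELS)])
        total_var += preds.var(axis=0).sum()      # ensemble disagreement penalty
        state = preds.mean(axis=0)
        total_reward += -np.abs(state).sum()      # toy reward: stay near the origin
    return total_reward - PENALTY * total_var

def cem_plan(state, iters=5):
    mu, sigma = np.zeros((HORIZON, 2)), np.ones((HORIZON, 2))
    for _ in range(iters):
        pop = mu + sigma * rng.normal(size=(POP, HORIZON, 2))
        elite = pop[np.argsort([score(state, seq) for seq in pop])[-ELITES:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mu[0]                                  # first action of the elite mean

print(cem_plan(rng.normal(size=2)))
```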
- Learning to Be Cautious [71.9871661858886]
A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations.
We present a sequence of tasks where cautious behavior becomes increasingly non-obvious, as well as an algorithm to demonstrate that it is possible for a system to learn to be cautious.
arXiv Detail & Related papers (2021-10-29T16:52:45Z)
- Learning Uncertainty For Safety-Oriented Semantic Segmentation In Autonomous Driving [77.39239190539871]
We show how uncertainty estimation can be leveraged to enable safety critical image segmentation in autonomous driving.
We introduce a new uncertainty measure based on disagreeing predictions as measured by a dissimilarity function.
We show experimentally that our proposed approach is much less computationally intensive at inference time than competing methods.
arXiv Detail & Related papers (2021-05-28T09:23:05Z)
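A rough illustration of a disagreement-based uncertainty measure for segmentation, as in the entry above (the paper's exact dissimilarity function may differ): per-pixel uncertainty is taken as the total-variation distance between the class-probability maps of two prediction heads. The toy predictions and sizes are assumptions.

```python
# Rough illustration of disagreement-based uncertainty for segmentation:
# per-pixel uncertainty is the dissimilarity between the class-probability
# maps produced by two prediction heads.
import numpy as np

rng = np.random.default_rng(6)
H, W, C = 4, 4, 3                                  # toy image size and class count

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

p1 = softmax(rng.normal(size=(H, W, C)))           # head 1 class probabilities
p2 = softmax(rng.normal(size=(H, W, C)))           # head 2 class probabilities

# Dissimilarity: total-variation distance between the two predictions per pixel.
uncertainty_map = 0.5 * np.abs(p1 - p2).sum(axis=-1)
print(uncertainty_map.round(2))
```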
- Safe Learning of Uncertain Environments for Nonlinear Control-Affine Systems [10.918870296899245]
We consider the problem of safe learning in nonlinear control-affine systems subject to unknown additive uncertainty.
We model uncertainty as a Gaussian signal and use state measurements to learn its mean and covariance bounds.
We show that, with arbitrarily high probability, the state is guaranteed to remain in the safe set while learning and control are carried out simultaneously.
arXiv Detail & Related papers (2021-03-02T01:58:02Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or more logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Temporal Difference Uncertainties as a Signal for Exploration [76.6341354269013]
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy.
In this paper, we highlight that value estimates are easily biased and temporally inconsistent.
We propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors.
arXiv Detail & Related papers (2020-10-05T18:11:22Z)
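A minimal sketch of using a distribution over temporal-difference errors as an exploration signal, in the spirit of the entry above (not the paper's method): an ensemble of value heads yields a set of TD errors per transition, and their spread is added to the reward as a bonus. The toy value functions and constants are assumptions.

```python
# Illustrative use of a distribution over TD errors as an exploration signal:
# an ensemble of value heads produces one TD error each, and the spread of
# those errors becomes an exploration bonus.
import numpy as np

rng = np.random.default_rng(7)
N_HEADS, GAMMA, BONUS_SCALE = 10, 0.99, 0.1

def value(state, head):
    """Stand-in for the head-th value-function estimate V_k(s)."""
    return np.tanh(state.sum() + 0.3 * head)

def td_error_distribution(s, r, s_next):
    return np.array([r + GAMMA * value(s_next, k) - value(s, k)
                     for k in range(N_HEADS)])

def exploration_bonus(s, r, s_next):
    deltas = td_error_distribution(s, r, s_next)
    return BONUS_SCALE * deltas.std()     # wide TD-error distribution -> explore more

s, s_next = rng.normal(size=3), rng.normal(size=3)
print(exploration_bonus(s, r=0.5, s_next=s_next))
```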
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.