Identify, Estimate and Bound the Uncertainty of Reinforcement Learning for Autonomous Driving
- URL: http://arxiv.org/abs/2305.07487v1
- Date: Fri, 12 May 2023 13:58:31 GMT
- Title: Identify, Estimate and Bound the Uncertainty of Reinforcement Learning for Autonomous Driving
- Authors: Weitao Zhou, Zhong Cao, Nanshan Deng, Kun Jiang, Diange Yang
- Abstract summary: Deep reinforcement learning (DRL) has emerged as a promising approach for developing more intelligent autonomous vehicles (AVs). This work proposes a method to identify and protect against unreliable decisions of a DRL driving policy.
- Score: 4.932817328815897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL) has emerged as a promising approach for
developing more intelligent autonomous vehicles (AVs). A typical DRL
application on AVs is to train a neural network-based driving policy. However,
the black-box nature of neural networks can result in unpredictable decision
failures, making such AVs unreliable. To this end, this work proposes a method
to identify and protect against unreliable decisions of a DRL driving policy. The basic
idea is to estimate and constrain the policy's performance uncertainty, which
quantifies potential performance drop due to insufficient training data or
network fitting errors. By constraining the uncertainty, the DRL model's
performance is always greater than that of a baseline policy. The uncertainty
caused by insufficient data is estimated with a bootstrapped method. Then, the
uncertainty caused by the network fitting error is estimated using an ensemble
network. Finally, a baseline policy is added as the performance lower bound to
avoid potential decision failures. The overall framework is called
uncertainty-bound reinforcement learning (UBRL). The proposed UBRL is evaluated
on DRL policies with different amounts of training data, taking an unprotected
left-turn driving case as an example. The results show that the UBRL method can
identify potentially unreliable decisions of a DRL policy. UBRL is guaranteed to
outperform the baseline policy even when the DRL policy is not well trained and
has high uncertainty. Meanwhile, the performance of UBRL improves with more
training data. Such a method is valuable for applying DRL to real-road driving
and provides a metric for evaluating a DRL policy.
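The abstract only names the ingredients, so below is a minimal sketch of how a bootstrapped value ensemble and a baseline fallback could be wired together. Everything here (the class names, the linear toy heads, the lower-bound rule with `kappa`) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


class EnsembleQ:
    """K toy Q-heads standing in for networks trained on bootstrapped
    resamples of the driving data. The spread across heads is used as a
    rough proxy for the data- and fitting-uncertainty terms in the abstract."""

    def __init__(self, n_heads: int = 5, n_actions: int = 3, state_dim: int = 4):
        # Random linear heads; a real implementation would train neural networks.
        self.weights = rng.normal(size=(n_heads, n_actions, state_dim))

    def q_values(self, state: np.ndarray) -> np.ndarray:
        """Per-head Q estimates, shape (n_heads, n_actions)."""
        return self.weights @ state


def ubrl_style_action(state, ensemble, baseline_action, baseline_value, kappa=1.0):
    """Use the DRL decision only if its uncertainty-discounted value still
    beats the baseline policy; otherwise fall back to the baseline."""
    q = ensemble.q_values(state)                 # (K, A)
    lower_bound = q.mean(axis=0) - kappa * q.std(axis=0)
    drl_action = int(np.argmax(lower_bound))
    if lower_bound[drl_action] >= baseline_value:
        return drl_action                        # decision judged reliable
    return baseline_action                       # protect with the baseline policy


state = rng.normal(size=4)
print(ubrl_style_action(state, EnsembleQ(), baseline_action=0, baseline_value=0.5))
```

The point mirrored from the abstract is that the fallback is triggered by a pessimistic lower bound rather than the mean estimate, so states with little training data default to the baseline policy.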
Related papers
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z)
- Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows [30.926243761581624]
Causal Normalizing Flow (CNF) is developed to learn the transition and reward functions for data generation and augmentation in offline policy evaluation and training.
CNF gains predictive and counterfactual reasoning capabilities for sequential decision-making tasks, revealing a high potential for OOD adaptation.
Our CNF-based offline RL approach is validated through empirical evaluations, outperforming model-free and model-based methods by a significant margin.
arXiv Detail & Related papers (2024-05-06T22:44:32Z)
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning [125.8224674893018]
Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment.
Applying off-policy algorithms to offline RL usually fails due to the extrapolation error caused by the out-of-distribution (OOD) actions.
We propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints.
arXiv Detail & Related papers (2022-02-23T15:27:16Z)
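The PBRL entry above hinges on an uncertainty penalty derived from a bootstrapped Q-ensemble; a hedged sketch of such a lower-confidence-bound Bellman target (the penalty form and `beta` are assumptions for illustration, not the paper's exact estimator) could look like this:

```python
import numpy as np


def pessimistic_target(q_ensemble_next, reward, beta=2.0, gamma=0.99):
    """Bellman target penalized by ensemble disagreement (LCB-style).

    q_ensemble_next: shape (K,) estimates of Q(s', a') from K bootstrapped
    heads. Their standard deviation acts as the uncertainty quantifier;
    `beta` controls how pessimistic the target is.
    """
    q_next = np.asarray(q_ensemble_next, dtype=float)
    return reward + gamma * (q_next.mean() - beta * q_next.std())


# High disagreement between heads shrinks the target toward pessimism.
print(pessimistic_target([1.0, 1.1, 0.9], reward=0.5))  # heads agree
print(pessimistic_target([2.0, 0.2, 1.4], reward=0.5))  # heads disagree
```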
- Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
One main barrier is the over-fitting issue that leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
arXiv Detail & Related papers (2021-11-29T06:29:49Z)
- The Least Restriction for Offline Reinforcement Learning [0.0]
We propose a creative offline reinforcement learning framework, the Least Restriction (LR).
The LR regards selecting an action as taking a sample from the probability distribution.
It is able to learn robustly from different offline datasets, including random and suboptimal demonstrations.
arXiv Detail & Related papers (2021-07-05T01:50:40Z)
- Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy constraint that limits divergence from the behavior policy and a value constraint that discourages overly optimistic estimates.
arXiv Detail & Related papers (2021-02-18T08:54:14Z)
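The two penalties in the Continuous Doubly Constrained entry suggest a compact actor loss; the KL policy penalty and the min-over-critics value term below are plausible stand-ins, not the paper's exact formulation.

```python
import torch
from torch.distributions import Normal, kl_divergence


def doubly_constrained_actor_loss(q_heads, policy_dist, behavior_dist, alpha=1.0):
    """Actor loss combining a conservative value term (min over critic heads)
    with a policy constraint (KL to an estimated behavior policy).

    q_heads:       (K, B) critic estimates for the policy's sampled actions
    policy_dist:   Normal over actions from the learned policy, batch B
    behavior_dist: Normal over actions from the estimated behavior policy
    """
    conservative_q = q_heads.min(dim=0).values                       # (B,)
    divergence = kl_divergence(policy_dist, behavior_dist).sum(-1)   # (B,)
    return (-conservative_q + alpha * divergence).mean()


# Toy batch: 4 states, 2-d actions, 3 critic heads.
q = torch.randn(3, 4)
pi = Normal(torch.zeros(4, 2), torch.ones(4, 2))
behavior = Normal(0.1 * torch.ones(4, 2), torch.ones(4, 2))
print(doubly_constrained_actor_loss(q, pi, behavior))
```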
- Near Real-World Benchmarks for Offline Reinforcement Learning [26.642722521820467]
We present a suite of near real-world benchmarks, NewRL.
NewRL contains datasets from various domains with controlled sizes and extra test datasets for the purpose of policy validation.
We argue that the performance of a policy should also be compared with the deterministic version of the behavior policy, instead of the dataset reward.
arXiv Detail & Related papers (2021-02-01T09:19:10Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by training them on rewards that are artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
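MOPO's central idea in the entry above is training on rewards penalized by the dynamics model's uncertainty; a one-function sketch (using ensemble disagreement as the uncertainty measure, which is an assumption here rather than the paper's exact penalty) is:

```python
import numpy as np


def penalized_reward(model_reward, next_state_preds, lam=1.0):
    """MOPO-style reward shaping: subtract a dynamics-uncertainty penalty.

    next_state_preds: (K, state_dim) next-state predictions from an ensemble
    of learned dynamics models; their disagreement serves as a heuristic
    uncertainty estimate u(s, a), weighted by `lam`.
    """
    preds = np.asarray(next_state_preds, dtype=float)
    uncertainty = float(np.linalg.norm(preds.std(axis=0)))
    return model_reward - lam * uncertainty


# Agreeing models leave the reward nearly untouched; disagreement penalizes it.
print(penalized_reward(1.0, [[0.00, 1.00], [0.01, 0.99], [0.00, 1.01]]))
print(penalized_reward(1.0, [[0.00, 1.00], [0.80, 0.20], [-0.50, 1.50]]))
```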
- Tactical Decision-Making in Autonomous Driving by Reinforcement Learning with Uncertainty Estimation [0.9883261192383611]
Reinforcement learning can be used to create a tactical decision-making agent for autonomous driving.
This paper investigates how a Bayesian RL technique can be used to estimate the uncertainty of decisions in autonomous driving.
arXiv Detail & Related papers (2020-04-22T08:22:28Z)
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations [88.94162416324505]
A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises.
Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions.
We show that naively applying existing techniques for improving robustness in classification tasks, such as adversarial training, is ineffective for many RL tasks.
arXiv Detail & Related papers (2020-03-19T17:59:59Z)
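For the last entry, a minimal FGSM-style probe on the observation illustrates the threat model of adversarially perturbed state observations; the toy Q-network, `epsilon`, and the single-step attack are assumptions for illustration, not the paper's training method.

```python
import torch
import torch.nn as nn

# Toy Q-network over a 4-d observation with 3 discrete actions.
q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 3))


def perturb_observation(obs: torch.Tensor, epsilon: float = 0.05) -> torch.Tensor:
    """One-step perturbation that lowers the Q-value of the greedy action,
    staying inside an L-infinity ball of radius epsilon around the observation."""
    obs = obs.clone().requires_grad_(True)
    q = q_net(obs)
    q[q.argmax()].backward()                    # gradient of the chosen action's value
    return (obs - epsilon * obs.grad.sign()).detach()


obs = torch.randn(4)
adv_obs = perturb_observation(obs)
print("clean action:", q_net(obs).argmax().item(),
      "perturbed action:", q_net(adv_obs).argmax().item())
```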