Uncertainty-Aware Decision Transformer for Stochastic Driving
Environments
- URL: http://arxiv.org/abs/2309.16397v2
- Date: Fri, 17 Nov 2023 05:41:45 GMT
- Title: Uncertainty-Aware Decision Transformer for Stochastic Driving
Environments
- Authors: Zenan Li, Fan Nie, Qiao Sun, Fang Da, Hang Zhao
- Abstract summary: We introduce an UNcertainty-awaRE deciSion Transformer (UNREST) for planning in driving environments.
UNREST estimates uncertainties by the conditional mutual information between transitions and returns.
We show UNREST's superior performance in various driving scenarios and the power of our uncertainty estimation strategy.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline Reinforcement Learning (RL) has emerged as a promising framework for
learning policies without active interactions, making it especially appealing
for autonomous driving tasks. Recent successes of Transformers inspire casting
offline RL as sequence modeling, which performs well in long-horizon tasks.
However, they are overly optimistic in stochastic environments with incorrect
assumptions that the same goal can be consistently achieved by identical
actions. In this paper, we introduce an UNcertainty-awaRE deciSion Transformer
(UNREST) for planning in stochastic driving environments without introducing
additional transition or complex generative models. Specifically, UNREST
estimates state uncertainties by the conditional mutual information between
transitions and returns, and segments sequences accordingly. Discovering the
'uncertainty accumulation' and 'temporal locality' properties of driving
environments, UNREST replaces the global returns in decision transformers with
less uncertain truncated returns, to learn from true outcomes of agent actions
rather than environment transitions. We also dynamically evaluate environmental
uncertainty during inference for cautious planning. Extensive experimental
results demonstrate UNREST's superior performance in various driving scenarios
and the power of our uncertainty estimation strategy.
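The abstract's key mechanism — segmenting trajectories at uncertain states and conditioning on truncated rather than global returns — can be sketched as follows. This is a minimal reading of the abstract, not the paper's actual implementation: the `uncertainties` array stands in for UNREST's conditional-mutual-information estimate, and `threshold` is a hypothetical segmentation knob.

```python
def truncated_returns_to_go(rewards, uncertainties, threshold):
    """Sketch of UNREST-style truncated returns-to-go.

    Scanning right-to-left, we accumulate an ordinary return-to-go but
    reset the accumulator at every step whose estimated uncertainty
    exceeds the threshold, so each step's return only covers the segment
    of the trajectory up to the next highly uncertain transition.
    """
    T = len(rewards)
    returns = [0.0] * T
    running = 0.0
    for t in range(T - 1, -1, -1):
        if uncertainties[t] > threshold:
            # Uncertain transition: truncate here and start a new segment,
            # so earlier steps do not learn from outcomes beyond this point.
            running = 0.0
        running += rewards[t]
        returns[t] = running
    return returns
```

With rewards `[1, 1, 1, 1]` and an uncertain step at index 2, this yields `[3.0, 2.0, 1.0, 1.0]` instead of the global returns-to-go `[4, 3, 2, 1]` — steps before the uncertain transition no longer condition on rewards that depend on how the environment resolves it.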
Related papers
- Sample-efficient Imitative Multi-token Decision Transformer for Generalizable Real World Driving [18.34685506480288]
We propose Sample-efficient Imitative Multi-token Decision Transformer (SimDT).
SimDT introduces multi-token prediction, imitative online learning and prioritized experience replay to Decision Transformer.
Results exceed popular imitation and reinforcement learning algorithms on Waymax benchmark.
arXiv Detail & Related papers (2024-06-18T14:27:14Z)
- Latent Plan Transformer: Planning as Latent Variable Inference [53.419249906014194]
We study generative modeling for planning with datasets repurposed from offline reinforcement learning.
We introduce the Latent Plan Transformer, a novel model that leverages a latent space to connect a Transformer-based trajectory generator and the final return.
At test time, the latent variable is inferred from an expected return before policy execution, realizing the idea of planning as inference.
arXiv Detail & Related papers (2024-02-07T08:18:09Z)
- Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting [11.106812447960186]
We introduce a novel trajectory generator named Controllable Diffusion Trajectory (CDT).
CDT integrates information and social interactions into a Transformer-based conditional denoising diffusion model to guide the prediction of future trajectories.
To ensure multimodality, we incorporate behavioral tokens to direct the trajectory's modes, such as going straight, turning right or left.
arXiv Detail & Related papers (2024-02-06T13:16:54Z)
- SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a novel diffusion-based controllable closed-loop safety-critical simulation framework.
We develop a novel approach to simulate safety-critical scenarios through an adversarial term in the denoising process.
We validate our framework empirically using the NuScenes dataset, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z)
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
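The inference procedure described above — a learned transition policy generating trajectories step by step, with a learned energy scoring complete samples — can be sketched as below. `policy` and `energy` are hypothetical stand-in callables, not the paper's actual API; lower energy is assumed to mean higher sample quality.

```python
def sample_and_rank(policy, energy, init_state, horizon, n_samples):
    """Hypothetical sketch of contrastive-imitation-style inference.

    The local transition policy rolls each trajectory forward one step
    at a time (iterative sampling); the trajectory-level energy then
    ranks the completed samples so the best can be selected.
    """
    trajectories = []
    for _ in range(n_samples):
        traj, state = [], init_state
        for _ in range(horizon):
            state = policy(state)  # one forward-looking transition step
            traj.append(state)
        trajectories.append(traj)
    # Rank whole trajectories by the learned energy (lowest first).
    return sorted(trajectories, key=energy)
```

The design point this illustrates: the policy only needs to be locally accurate per step, while the energy provides a global, trajectory-level check that mitigates compounding error.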
arXiv Detail & Related papers (2023-11-02T16:45:25Z)
- Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning [0.0]
Incomplete knowledge of the environment leads an agent to make decisions under uncertainty.
This is one of the major dilemmas in Reinforcement Learning (RL): an autonomous agent has to balance two contrasting needs in making its decisions.
We show that adaptive methods better approximate the trade-off between exploration and exploitation.
arXiv Detail & Related papers (2023-10-12T13:45:33Z)
- Environment Transformer and Policy Optimization for Model-Based Offline Reinforcement Learning [25.684201757101267]
We propose an uncertainty-aware sequence modeling architecture called Environment Transformer.
Benefiting from the accurate modeling of the transition dynamics and reward function, Environment Transformer can be combined with arbitrary planning, dynamic programming, or policy optimization algorithms for offline RL.
arXiv Detail & Related papers (2023-03-07T11:26:09Z)
- Augmenting Reinforcement Learning with Transformer-based Scene Representation Learning for Decision-making of Autonomous Driving [27.84595432822612]
We propose Scene-Rep Transformer to improve the reinforcement learning decision-making capabilities.
A multi-stage Transformer (MST) encoder is constructed to model the interaction awareness between the ego vehicle and its neighbors.
A sequential latent Transformer (SLT) with self-supervised learning objectives is employed to distill the future predictive information into the latent scene representation.
arXiv Detail & Related papers (2022-08-24T08:05:18Z)
- Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts? [104.04999499189402]
Out-of-training-distribution (OOD) scenarios are a common challenge of learning agents at deployment.
We propose an uncertainty-aware planning method called robust imitative planning (RIP).
Our method can detect and recover from some distribution shifts, reducing the overconfident and catastrophic extrapolations in OOD scenes.
We introduce an autonomous car novel-scene benchmark, CARNOVEL, to evaluate the robustness of driving agents to a suite of tasks with distribution shifts.
arXiv Detail & Related papers (2020-06-26T11:07:32Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.