Augmenting Reinforcement Learning with Transformer-based Scene
Representation Learning for Decision-making of Autonomous Driving
- URL: http://arxiv.org/abs/2208.12263v3
- Date: Fri, 25 Aug 2023 05:41:23 GMT
- Authors: Haochen Liu, Zhiyu Huang, Xiaoyu Mo, and Chen Lv
- Abstract summary: We propose the Scene-Rep Transformer to improve reinforcement learning decision-making capabilities.
A multi-stage Transformer (MST) encoder is constructed to model the interaction awareness between the ego vehicle and its neighbors.
A sequential latent Transformer (SLT) with self-supervised learning objectives is employed to distill the future predictive information into the latent scene representation.
- Score: 27.84595432822612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decision-making for urban autonomous driving is challenging due to the
stochastic nature of interactive traffic participants and the complexity of
road structures. Although reinforcement learning (RL)-based decision-making
schemes are promising for handling urban driving scenarios, they suffer from low
sample efficiency and poor adaptability. In this paper, we propose the Scene-Rep
Transformer to improve the RL decision-making capabilities with better scene
representation encoding and sequential predictive latent distillation.
Specifically, a multi-stage Transformer (MST) encoder is constructed to model
not only the interaction awareness between the ego vehicle and its neighbors
but also intention awareness between the agents and their candidate routes. A
sequential latent Transformer (SLT) with self-supervised learning objectives is
employed to distill the future predictive information into the latent scene
representation, in order to reduce the exploration space and speed up training.
The final decision-making module based on soft actor-critic (SAC) takes as
input the refined latent scene representation from the Scene-Rep Transformer
and outputs driving actions. The framework is validated in five challenging
simulated urban scenarios with dense traffic, where it delivers substantial
quantitative improvements in data efficiency, success rate, safety, and driving
efficiency. The qualitative
results reveal that our framework is able to extract the intentions of neighbor
agents to help make decisions and deliver more diversified driving behaviors.
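The pipeline described in the abstract (MST interaction/intention encoding, SLT predictive latent distillation, SAC action head) can be sketched in miniature. The NumPy code below is purely illustrative: the attention stages, the one-step latent predictor `W_pred`, the feature dimensions, and the tanh-Gaussian action head are hypothetical stand-ins for the paper's learned networks, not its implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(q, k, v):
    """Scaled dot-product attention (single head), the core Transformer op."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

d = 16  # hypothetical feature dimension

# Stage 1 (MST-style, illustrative): ego feature attends to neighbor features
# to capture interaction awareness.
ego = rng.normal(size=(1, d))
neighbors = rng.normal(size=(5, d))
interaction = attention(ego, neighbors, neighbors)        # (1, d)

# Stage 2 (MST-style, illustrative): fused feature attends to candidate-route
# encodings to capture intention awareness.
routes = rng.normal(size=(3, d))
scene_latent = attention(interaction, routes, routes)     # (1, d)

# SLT-style self-supervised objective (sketch): predict the next-step latent
# and penalize disagreement with the latent actually observed next.
W_pred = rng.normal(size=(d, d)) * 0.1    # hypothetical one-step predictor
predicted_next = scene_latent @ W_pred
observed_next = rng.normal(size=(1, d))   # stand-in for the next frame's latent
num = float((predicted_next * observed_next).sum())
cos = num / (np.linalg.norm(predicted_next) * np.linalg.norm(observed_next))
distill_loss = 1.0 - cos                  # minimized during training

# SAC-style stochastic action head: tanh-squashed Gaussian over 2 action dims.
W_mu, W_sig = rng.normal(size=(d, 2)), rng.normal(size=(d, 2))
mu, log_std = scene_latent @ W_mu, np.clip(scene_latent @ W_sig, -5, 2)
action = np.tanh(mu + np.exp(log_std) * rng.normal(size=mu.shape))  # in (-1, 1)

print(action.shape, round(distill_loss, 3))
```

The point of the SLT-style loss is visible even in this toy: minimizing it pushes the latent toward features that predict the future, which is what shrinks the RL exploration space.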
Related papers
- GITSR: Graph Interaction Transformer-based Scene Representation for Multi Vehicle Collaborative Decision-making [9.910230703889956]
This study focuses on efficient scene representation and the modeling of spatial interaction behaviors of traffic states.
In this study, we propose GITSR, an effective framework for Graph Interaction Transformer-based Scene Representation.
arXiv Detail & Related papers (2024-11-03T15:27:26Z)
- End-to-end Driving in High-Interaction Traffic Scenarios with Reinforcement Learning [24.578178308010912]
We propose an end-to-end model-based RL algorithm named Ramble to address these issues.
By learning a dynamics model of the environment, Ramble can foresee upcoming traffic events and make more informed, strategic decisions.
Ramble achieves state-of-the-art performance regarding route completion rate and driving score on the CARLA Leaderboard 2.0, showcasing its effectiveness in managing complex and dynamic traffic situations.
arXiv Detail & Related papers (2024-10-03T06:45:59Z)
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Autonomous Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.
Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.
Experiments conducted on nuScenes dataset demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
- Parameterized Decision-making with Multi-modal Perception for Autonomous Driving [12.21578713219778]
We propose a parameterized decision-making framework with multi-modal perception based on deep reinforcement learning, called AUTO.
A hybrid reward function takes into account aspects of safety, traffic efficiency, passenger comfort, and impact to guide the framework to generate optimal actions.
arXiv Detail & Related papers (2023-12-19T08:27:02Z)
- Decision Making for Autonomous Driving in Interactive Merge Scenarios via Learning-based Prediction [39.48631437946568]
This paper focuses on the complex task of merging into moving traffic where uncertainty emanates from the behavior of other drivers.
We frame the problem as a partially observable Markov decision process (POMDP) and solve it online with Monte Carlo tree search.
The solution to the POMDP is a policy that performs high-level driving maneuvers, such as giving way to an approaching car, keeping a safe distance from the vehicle in front or merging into traffic.
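The POMDP-plus-MCTS recipe this summary describes can be illustrated with a toy one-step search over high-level maneuvers. Everything below is a hypothetical stand-in, not the paper's model: the maneuver set, the gap-based reward, and the particle-style belief over traffic gaps are invented for illustration only.

```python
import math, random

random.seed(0)

MANEUVERS = ["give_way", "keep_distance", "merge"]

def simulate(maneuver, gap):
    """Toy rollout reward: merging pays off only when the traffic gap is
    large; giving way is safe but slow. Entirely illustrative."""
    noise = random.gauss(0, 0.1)
    if maneuver == "merge":
        return (1.0 if gap > 0.5 else -1.0) + noise
    if maneuver == "give_way":
        return 0.2 + noise
    return 0.4 + noise  # keep_distance

def mcts_select(belief_gaps, n_iter=300, c=1.4):
    """One-step UCT: sample a traffic gap from the belief (the POMDP's
    uncertainty over other drivers), roll out the chosen maneuver, and
    return the maneuver with the best mean return."""
    visits = {m: 0 for m in MANEUVERS}
    value = {m: 0.0 for m in MANEUVERS}
    for t in range(1, n_iter + 1):
        def ucb(m):  # UCB1 balances exploring maneuvers vs. exploiting the best
            if visits[m] == 0:
                return float("inf")
            return value[m] / visits[m] + c * math.sqrt(math.log(t) / visits[m])
        m = max(MANEUVERS, key=ucb)
        gap = random.choice(belief_gaps)   # sample one possible world
        value[m] += simulate(m, gap)
        visits[m] += 1
    return max(MANEUVERS, key=lambda m: value[m] / visits[m])

# Belief concentrated on large gaps: merging should look attractive.
print(mcts_select([0.7, 0.8, 0.9]))
# Belief concentrated on small gaps: a safer maneuver should win.
print(mcts_select([0.1, 0.2, 0.3]))
```

Sampling the gap per simulation is what makes this a POMDP-style search: the planner averages over its belief about other drivers rather than assuming one fixed state.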
arXiv Detail & Related papers (2023-03-29T16:12:45Z)
- Traj-MAE: Masked Autoencoders for Trajectory Prediction [69.7885837428344]
Trajectory prediction has been a crucial task in building a reliable autonomous driving system by anticipating possible dangers.
We propose an efficient masked autoencoder for trajectory prediction (Traj-MAE) that better represents the complicated behaviors of agents in the driving environment.
Our experimental results in both multi-agent and single-agent settings demonstrate that Traj-MAE achieves competitive results with state-of-the-art methods.
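The masked-autoencoding objective behind Traj-MAE can be shown on a toy trajectory: hide some waypoints, reconstruct them from the visible ones, and score the loss only on the hidden positions. In this sketch, linear interpolation is a deliberately trivial stand-in for Traj-MAE's learned encoder/decoder; the trajectory, mask indices, and loss are illustrative only.

```python
import numpy as np

# Toy trajectory: 10 timesteps of (x, y) positions for one agent.
traj = np.stack([np.linspace(0, 9, 10), np.linspace(0, 4.5, 10)], axis=1)

# Hide a fixed subset of waypoints, as a masked autoencoder would at random.
masked_t = np.array([2, 3, 6, 8])
visible_t = np.setdiff1d(np.arange(10), masked_t)

# "Decoder" stand-in: reconstruct each masked coordinate by interpolating
# the visible waypoints (a real MAE uses a learned network here).
recon = np.stack(
    [np.interp(masked_t, visible_t, traj[visible_t, k]) for k in (0, 1)],
    axis=1,
)

# MAE-style loss: reconstruction error on the masked positions only.
# It is ~0 here because the toy trajectory is linear, so interpolation
# recovers the hidden points exactly.
loss = float(np.mean((recon - traj[masked_t]) ** 2))
print(round(loss, 6))
```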
arXiv Detail & Related papers (2023-03-12T16:23:27Z)
- Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving [58.879758550901364]
Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context.
We introduce a framework that integrates three cameras to emulate the human field of view, coupled with top-down bird-eye-view semantic data to enhance contextual representation.
Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset.
arXiv Detail & Related papers (2022-10-13T05:56:20Z)
- Transferable and Adaptable Driving Behavior Prediction [34.606012573285554]
We propose HATN, a hierarchical framework to generate high-quality, transferable, and adaptable predictions for driving behaviors.
We demonstrate our algorithms in the task of trajectory prediction for real traffic data at intersections and roundabouts from the INTERACTION dataset.
arXiv Detail & Related papers (2022-02-10T16:46:24Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- Deep Structured Reactive Planning [94.92994828905984]
We propose a novel data-driven, reactive planning objective for self-driving vehicles.
We show that our model outperforms a non-reactive variant in successfully completing highly complex maneuvers.
arXiv Detail & Related papers (2021-01-18T01:43:36Z)
- Implicit Latent Variable Model for Scene-Consistent Motion Forecasting [78.74510891099395]
In this paper, we aim to learn scene-consistent motion forecasts of complex urban traffic directly from sensor data.
We model the scene as an interaction graph and employ powerful graph neural networks to learn a distributed latent representation of the scene.
arXiv Detail & Related papers (2020-07-23T14:31:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.