FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of
Autonomous Driving
- URL: http://arxiv.org/abs/2308.01006v4
- Date: Mon, 14 Aug 2023 08:28:32 GMT
- Title: FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of
Autonomous Driving
- Authors: Tengju Ye, Wei Jing, Chunyong Hu, Shikun Huang, Lingping Gao, Fangzhen
Li, Jingke Wang, Ke Guo, Wencong Xiao, Weibo Mao, Hang Zheng, Kun Li, Junbo
Chen, Kaicheng Yu
- Abstract summary: We present FusionAD, the first unified framework that fuses information from the two most critical sensors, camera and LiDAR, and goes beyond the perception task.
In contrast to the camera-based end-to-end method UniAD, we establish fusion-aided modality-aware prediction and status-aware planning modules, dubbed FMSPnP.
We conduct extensive experiments on the commonly used nuScenes benchmark dataset. FusionAD achieves state-of-the-art performance, surpassing baselines by 15% on average on perception tasks like detection and tracking and by 10% on occupancy prediction accuracy, reducing prediction error from 0.708 to 0.389 in ADE score, and reducing the collision rate from 0.31% to 0.12%.
- Score: 20.037562671813
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Building a multi-modality multi-task neural network toward accurate and
robust performance is a de-facto standard in the perception task of autonomous
driving. However, leveraging such data from multiple sensors to jointly
optimize the prediction and planning tasks remains largely unexplored. In this
paper, we present FusionAD, to the best of our knowledge, the first unified
framework that fuses the information from the two most critical sensors, camera
and LiDAR, and goes beyond the perception task. Concretely, we first build a
transformer-based multi-modality fusion network to effectively produce
fusion-based features. In contrast to the camera-based end-to-end method UniAD,
we then establish fusion-aided modality-aware prediction and status-aware
planning modules, dubbed FMSPnP, that take advantage of multi-modality features.
We conduct extensive experiments on the commonly used nuScenes benchmark
dataset. FusionAD achieves state-of-the-art performance, surpassing baselines by
15% on average on perception tasks like detection and tracking and by 10% on
occupancy prediction accuracy, reducing prediction error from 0.708 to 0.389 in
ADE score, and reducing the collision rate from 0.31% to only 0.12%.
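To put the reported numbers in context, the following is a minimal Python sketch of how an average displacement error (ADE) is commonly computed, together with the relative reductions implied by the figures in the abstract. The function names and the toy trajectories are hypothetical and chosen for illustration only; the paper's exact evaluation protocol (prediction horizon, number of modes, units) is not stated on this page and is not reproduced here.

```python
import math


def average_displacement_error(pred, gt):
    """Mean Euclidean distance between matching waypoints of two trajectories."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(before, after):
    """Fraction by which a metric dropped, relative to its starting value."""
    return (before - after) / before


# Toy trajectories (hypothetical values, not taken from the paper).
pred = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
toy_ade = average_displacement_error(pred, gt)  # (0 + 0.5 + 1.0) / 3 = 0.5

# Relative reductions implied by the figures quoted in the abstract.
ade_drop = relative_reduction(0.708, 0.389)       # roughly 45%
collision_drop = relative_reduction(0.31, 0.12)   # roughly 61%

print(f"toy ADE: {toy_ade:.3f}")
print(f"ADE reduction: {ade_drop:.1%}")
print(f"collision-rate reduction: {collision_drop:.1%}")
```

Read this way, the abstract's figures correspond to a prediction error reduced by a little under half and a collision rate reduced by a little under two thirds.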
Related papers
- Efficient Fusion and Task Guided Embedding for End-to-end Autonomous Driving [1.3149617027696827]
We introduce a compact yet potent solution named EfficientFuser to address the challenges of sensor fusion and safety risk prediction.
Evaluated on the CARLA simulation platform, EfficientFuser demonstrates remarkable efficiency, utilizing merely 37.6% of the parameters.
The safety score neared that of the leading safety-enhanced method, showcasing its efficacy and potential for practical deployment in autonomous driving systems.
arXiv Detail & Related papers (2024-07-03T07:45:58Z)
- Foundation Models for Structural Health Monitoring [17.37816294594306]
We propose for the first time the use of Transformer neural networks, with a Masked Auto-Encoder architecture, as Foundation Models for Structural Health Monitoring.
We demonstrate the ability of these models to learn generalizable representations from multiple large datasets through self-supervised pre-training.
We showcase the effectiveness of our foundation models using data from three operational viaducts.
arXiv Detail & Related papers (2024-04-03T13:32:44Z)
- An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution.
We conduct our study for a novel goal where we can merge vision, language, and cross-modal transformers of a modality-specific architecture.
We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
arXiv Detail & Related papers (2023-04-28T15:43:21Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Transforming Model Prediction for Tracking [109.08417327309937]
Transformers capture global relations with little inductive bias, allowing them to learn the prediction of more powerful target models.
We train the proposed tracker end-to-end and validate its performance by conducting comprehensive experiments on multiple tracking datasets.
Our tracker sets a new state of the art on three benchmarks, achieving an AUC of 68.5% on the challenging LaSOT dataset.
arXiv Detail & Related papers (2022-03-21T17:59:40Z)
- On Efficient Uncertainty Estimation for Resource-Constrained Mobile Applications [0.0]
Predictive uncertainty supplements model predictions and enables improved functionality of downstream tasks.
We tackle this problem by building upon Monte Carlo Dropout (MCDO) models using the Axolotl framework.
We conduct experiments on (1) a multi-class classification task using the CIFAR10 dataset, and (2) a more complex human body segmentation task.
arXiv Detail & Related papers (2021-11-11T22:24:15Z)
- Perception-aware Multi-sensor Fusion for 3D LiDAR Semantic Segmentation [59.42262859654698]
3D semantic segmentation is important in scene understanding for many applications, such as auto-driving and robotics.
Existing fusion-based methods may not achieve promising performance due to the vast difference between the two modalities.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from two modalities.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Efficient and Robust LiDAR-Based End-to-End Navigation [132.52661670308606]
We present an efficient and robust LiDAR-based end-to-end navigation framework.
We propose Fast-LiDARNet that is based on sparse convolution kernel optimization and hardware-aware model design.
We then propose Hybrid Evidential Fusion that directly estimates the uncertainty of the prediction from only a single forward pass.
arXiv Detail & Related papers (2021-05-20T17:52:37Z)
- Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.