FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of
Autonomous Driving
- URL: http://arxiv.org/abs/2308.01006v4
- Date: Mon, 14 Aug 2023 08:28:32 GMT
- Title: FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of
Autonomous Driving
- Authors: Tengju Ye, Wei Jing, Chunyong Hu, Shikun Huang, Lingping Gao, Fangzhen
Li, Jingke Wang, Ke Guo, Wencong Xiao, Weibo Mao, Hang Zheng, Kun Li, Junbo
Chen, Kaicheng Yu
- Abstract summary: We present FusionAD, the first unified framework that fuses the information from the two most critical sensors, camera and LiDAR, and goes beyond the perception task.
In contrast to the camera-based end-to-end method UniAD, we establish fusion-aided modality-aware prediction and status-aware planning modules, dubbed FMSPnP.
We conduct extensive experiments on the commonly used nuScenes benchmark; FusionAD achieves state-of-the-art performance, surpassing baselines by 15% on average on perception tasks such as detection and tracking, improving occupancy prediction accuracy by 10%, reducing prediction error from 0.708 to 0.389, and reducing the collision rate from 0.31% to 0.12%.
- Score: 20.037562671813
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Building a multi-modality multi-task neural network toward accurate and
robust performance is a de-facto standard in the perception task of autonomous
driving. However, leveraging such data from multiple sensors to jointly
optimize the prediction and planning tasks remains largely unexplored. In this
paper, we present FusionAD, to the best of our knowledge, the first unified
framework that fuses the information from the two most critical sensors, camera
and LiDAR, and goes beyond the perception task. Concretely, we first build a
transformer-based multi-modality fusion network to effectively produce
fusion-based features. In contrast to the camera-based end-to-end method UniAD,
we then establish fusion-aided modality-aware prediction and status-aware
planning modules, dubbed FMSPnP, that take advantage of the multi-modality
features. We conduct extensive experiments on the commonly used nuScenes
benchmark dataset; our FusionAD achieves state-of-the-art performance,
surpassing baselines by 15% on average on perception tasks such as detection
and tracking, improving occupancy prediction accuracy by 10%, reducing the
prediction error from 0.708 to 0.389 in ADE score, and reducing the collision
rate from 0.31% to only 0.12%.
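The abstract describes a transformer-based network that fuses camera and LiDAR features before the prediction and planning heads. The paper does not publish such a snippet; the block below is a minimal, hypothetical sketch of cross-attention fusion between BEV-aligned camera and LiDAR feature tokens, with the module name, shapes, and hyperparameters all assumed for illustration.

```python
# Minimal sketch (not the authors' code): fusing BEV-aligned camera and LiDAR
# feature tokens with cross-attention, one plausible reading of a
# "transformer-based multi-modality fusion network".
import torch
import torch.nn as nn

class BEVFusionBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # LiDAR tokens act as queries; camera tokens provide keys/values.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, lidar_tokens, camera_tokens):
        # lidar_tokens, camera_tokens: (batch, num_bev_cells, dim)
        fused, _ = self.cross_attn(lidar_tokens, camera_tokens, camera_tokens)
        x = self.norm1(lidar_tokens + fused)   # residual + norm
        x = self.norm2(x + self.ffn(x))        # feed-forward refinement
        return x                               # fused BEV features for downstream heads

if __name__ == "__main__":
    block = BEVFusionBlock()
    lidar = torch.randn(2, 200, 256)    # e.g. 200 BEV cells per sample
    camera = torch.randn(2, 200, 256)
    print(block(lidar, camera).shape)   # torch.Size([2, 200, 256])
```

The fused tokens would then feed prediction and planning heads; how FusionAD actually conditions those heads (FMSPnP) is specific to the paper and not reproduced here.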
Related papers
- Transforming In-Vehicle Network Intrusion Detection: VAE-based Knowledge Distillation Meets Explainable AI [0.0]
This paper introduces an advanced intrusion detection system (IDS) called KD-XVAE that uses a Variational Autoencoder (VAE)-based knowledge distillation approach.
Our model significantly reduces complexity, operating with just 1669 parameters and achieving an inference time of 0.3 ms per batch.
arXiv Detail & Related papers (2024-10-11T17:57:16Z)
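The KD-XVAE entry above relies on a Variational Autoencoder backbone small enough for in-vehicle deployment (about 1669 parameters). As a rough illustration only, here is a minimal VAE in that spirit; the input and layer sizes are assumptions and do not reproduce the paper's architecture or parameter count.

```python
# Illustrative only: a tiny Variational Autoencoder of the kind a lightweight
# in-vehicle IDS might use. All sizes are assumptions, not KD-XVAE's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, in_dim: int = 29, hidden: int = 16, latent: int = 4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(), nn.Linear(hidden, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # Reconstruction error plus KL divergence to the unit Gaussian prior.
    recon_err = F.mse_loss(recon, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_err + kl
```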
- Steering Prediction via a Multi-Sensor System for Autonomous Racing [45.70482345703285]
Traditionally, racing cars rely on 2D LiDAR as their primary visual system.
In this work, we explore the integration of an event camera with the existing system to provide enhanced temporal information.
Our goal is to fuse the 2D LiDAR data with event data in an end-to-end learning framework for steering prediction, which is crucial for autonomous racing.
arXiv Detail & Related papers (2024-09-28T13:58:24Z)
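The steering-prediction entry above fuses 2D LiDAR scans with event-camera data and regresses a steering command end to end. Below is a hedged sketch of one straightforward late-concatenation design for that task; the branch architectures and input shapes are invented for illustration and are not the paper's network.

```python
# Sketch (assumed architecture): two encoder branches whose features are
# concatenated and regressed to a single steering angle.
import torch
import torch.nn as nn

class SteeringFusionNet(nn.Module):
    def __init__(self, lidar_points: int = 1081, event_channels: int = 2):
        super().__init__()
        # MLP branch over a planar LiDAR range scan.
        self.lidar_branch = nn.Sequential(
            nn.Linear(lidar_points, 256), nn.ReLU(), nn.Linear(256, 64), nn.ReLU())
        # Small conv branch over an accumulated event frame (polarity channels).
        self.event_branch = nn.Sequential(
            nn.Conv2d(event_channels, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 64), nn.ReLU())
        self.head = nn.Linear(128, 1)  # fused features -> steering angle

    def forward(self, lidar_scan, event_frame):
        feats = torch.cat([self.lidar_branch(lidar_scan),
                           self.event_branch(event_frame)], dim=1)
        return self.head(feats)

if __name__ == "__main__":
    net = SteeringFusionNet()
    out = net(torch.randn(4, 1081), torch.randn(4, 2, 64, 64))
    print(out.shape)  # torch.Size([4, 1])
```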
- Foundation Models for Structural Health Monitoring [17.37816294594306]
We propose for the first time the use of Transformer neural networks, with a Masked Auto-Encoder architecture, as Foundation Models for Structural Health Monitoring.
We demonstrate the ability of these models to learn generalizable representations from multiple large datasets through self-supervised pre-training.
We showcase the effectiveness of our foundation models using data from three operational viaducts.
arXiv Detail & Related papers (2024-04-03T13:32:44Z)
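The foundation-model entry above pre-trains with a Masked Auto-Encoder objective. The snippet below is a simplified masked-reconstruction objective over sensor-window tokens, for illustration only; the tokenisation, masking ratio, and Transformer configuration are assumptions, and positional embeddings are omitted for brevity.

```python
# Simplified masked-reconstruction pre-training (MAE flavour), illustrative only.
import torch
import torch.nn as nn

class MaskedReconstructionModel(nn.Module):
    def __init__(self, patch_dim: int = 64, dim: int = 128, depth: int = 4):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.decode = nn.Linear(dim, patch_dim)

    def forward(self, patches, mask_ratio: float = 0.75):
        # patches: (batch, seq, patch_dim) windows of the raw sensor signal.
        tokens = self.embed(patches)
        mask = torch.rand(tokens.shape[:2], device=tokens.device) < mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        recon = self.decode(self.encoder(tokens))
        # Loss only on the masked positions, as in masked auto-encoding.
        return ((recon - patches) ** 2)[mask].mean()

if __name__ == "__main__":
    model = MaskedReconstructionModel()
    print(model(torch.randn(2, 50, 64)).item())
```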
- An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution.
We conduct our study for a novel goal where we can merge vision, language, and cross-modal transformers of a modality-specific architecture.
We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
arXiv Detail & Related papers (2023-04-28T15:43:21Z)
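The model-merging entry above combines weights of separately trained transformers and uses weight-distance metrics to anticipate how well a merge will work. A hedged sketch of naive weight-space interpolation plus an aggregate L2 weight distance follows; the merging schemes and metrics studied in the paper are more elaborate than this.

```python
# Sketch of naive weight-space merging and an L2 distance between checkpoints.
import torch

def merge_state_dicts(sd_a, sd_b, alpha: float = 0.5):
    """Linearly interpolate two state dicts with matching keys and shapes."""
    return {k: alpha * sd_a[k] + (1.0 - alpha) * sd_b[k] for k in sd_a}

def weight_l2_distance(sd_a, sd_b):
    """Aggregate L2 distance between corresponding parameters."""
    sq = sum(torch.sum((sd_a[k].float() - sd_b[k].float()) ** 2) for k in sd_a)
    return torch.sqrt(sq)

if __name__ == "__main__":
    net_a, net_b = torch.nn.Linear(8, 8), torch.nn.Linear(8, 8)
    merged = merge_state_dicts(net_a.state_dict(), net_b.state_dict())
    print(weight_l2_distance(net_a.state_dict(), net_b.state_dict()))
    net_a.load_state_dict(merged)  # merged weights drop back into the model
```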
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Transforming Model Prediction for Tracking [109.08417327309937]
Transformers capture global relations with little inductive bias, allowing them to learn the prediction of more powerful target models.
We train the proposed tracker end-to-end and validate its performance by conducting comprehensive experiments on multiple tracking datasets.
Our tracker sets a new state of the art on three benchmarks, achieving an AUC of 68.5% on the challenging LaSOT dataset.
arXiv Detail & Related papers (2022-03-21T17:59:40Z)
- On Efficient Uncertainty Estimation for Resource-Constrained Mobile Applications [0.0]
Predictive uncertainty supplements model predictions and enables improved functionality of downstream tasks.
We tackle this problem by building upon Monte Carlo Dropout (MCDO) models using the Axolotl framework.
We conduct experiments on (1) a multi-class classification task using the CIFAR10 dataset, and (2) a more complex human body segmentation task.
arXiv Detail & Related papers (2021-11-11T22:24:15Z)
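The entry above builds on Monte Carlo Dropout, which keeps dropout active at inference and treats repeated stochastic forward passes as samples from an approximate posterior. A minimal sketch follows; the example network and number of samples are placeholders, not the paper's setup.

```python
# Minimal Monte Carlo Dropout sketch: keep dropout layers in train mode at
# inference and average several stochastic passes for a mean and uncertainty.
import torch
import torch.nn as nn

def mc_dropout_predict(model: nn.Module, x: torch.Tensor, num_samples: int = 20):
    model.eval()
    # Re-enable dropout only (batch norm etc. stay in eval mode).
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(num_samples)])
    return samples.mean(dim=0), samples.var(dim=0)  # predictive mean, variance

if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 10))
    mean, var = mc_dropout_predict(net, torch.randn(4, 16))
    print(mean.shape, var.shape)  # torch.Size([4, 10]) twice
```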
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
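The PMF entry above extracts features from the two modalities with separate streams and fuses them through residual-based fusion modules. The block below sketches one such residual fusion step under assumed tensor shapes and a gating design of our own choosing; it is not the paper's module.

```python
# Sketch of a residual fusion module between a camera stream and a LiDAR
# (range-view) stream; channel sizes and the gating design are assumptions.
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, lidar_feat, cam_feat):
        # Both: (batch, channels, H, W), spatially aligned beforehand.
        mixed = self.reduce(torch.cat([lidar_feat, cam_feat], dim=1))
        # Residual update: the LiDAR stream keeps its identity path.
        return lidar_feat + self.gate(mixed) * mixed

if __name__ == "__main__":
    fuse = ResidualFusion()
    out = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
    print(out.shape)  # torch.Size([1, 64, 32, 32])
```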
- Efficient and Robust LiDAR-Based End-to-End Navigation [132.52661670308606]
We present an efficient and robust LiDAR-based end-to-end navigation framework.
We propose Fast-LiDARNet that is based on sparse convolution kernel optimization and hardware-aware model design.
We then propose Hybrid Evidential Fusion that directly estimates the uncertainty of the prediction from only a single forward pass.
arXiv Detail & Related papers (2021-05-20T17:52:37Z)
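The Fast-LiDARNet entry above estimates prediction uncertainty from a single forward pass via evidential fusion. As background, the sketch below shows a common deep evidential regression head that outputs the parameters of a Normal-Inverse-Gamma distribution, from which aleatoric and epistemic variance follow in closed form; this head is a generic assumption, not the paper's implementation.

```python
# Sketch of an evidential regression head (Normal-Inverse-Gamma outputs),
# giving uncertainty from one forward pass. Layer sizes are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialHead(nn.Module):
    def __init__(self, in_dim: int = 128):
        super().__init__()
        self.out = nn.Linear(in_dim, 4)  # gamma (mean), nu, alpha, beta

    def forward(self, feats):
        gamma, log_nu, log_alpha, log_beta = self.out(feats).unbind(dim=-1)
        nu = F.softplus(log_nu)
        alpha = F.softplus(log_alpha) + 1.0 + 1e-6        # alpha > 1 keeps variance finite
        beta = F.softplus(log_beta)
        aleatoric = beta / (alpha - 1.0)                  # expected data noise
        epistemic = beta / (nu * (alpha - 1.0) + 1e-6)    # model uncertainty
        return gamma, aleatoric, epistemic

if __name__ == "__main__":
    head = EvidentialHead()
    mean, alea, epis = head(torch.randn(8, 128))
    print(mean.shape, alea.shape, epis.shape)
```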
- Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
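The Bayesian-fusion entry above merges overlapping bounding-box detections from RGB and thermal detectors without learning. Below is a toy, hedged version of that idea: greedily match boxes by IoU and combine their scores as independent evidence; the IoU threshold and the noisy-OR score rule are assumptions, not the paper's exact derivation.

```python
# Toy late fusion of per-modality detections: greedily match boxes by IoU and
# combine scores as independent evidence. Threshold and score rule are assumed.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2

def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse_detections(rgb: List[Tuple[Box, float]], thermal: List[Tuple[Box, float]],
                    iou_thr: float = 0.5) -> List[Tuple[Box, float]]:
    fused, used = [], set()
    for box_r, p_r in rgb:
        match = max(((i, t) for i, t in enumerate(thermal)
                     if i not in used and iou(box_r, t[0]) >= iou_thr),
                    key=lambda it: iou(box_r, it[1][0]), default=None)
        if match is None:
            fused.append((box_r, p_r))            # RGB-only evidence
            continue
        i, (box_t, p_t) = match
        used.add(i)
        p = 1.0 - (1.0 - p_r) * (1.0 - p_t)       # noisy-OR combination of scores
        fused.append((box_r, p))
    fused += [t for i, t in enumerate(thermal) if i not in used]  # thermal-only boxes
    return fused

if __name__ == "__main__":
    rgb = [((0, 0, 10, 10), 0.6)]
    thermal = [((1, 1, 11, 11), 0.7), ((50, 50, 60, 60), 0.4)]
    print(fuse_detections(rgb, thermal))
```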