Related papers: FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving

FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving

URL: http://arxiv.org/abs/2308.01006v4
Date: Mon, 14 Aug 2023 08:28:32 GMT
Title: FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving
Authors: Tengju Ye, Wei Jing, Chunyong Hu, Shikun Huang, Lingping Gao, Fangzhen Li, Jingke Wang, Ke Guo, Wencong Xiao, Weibo Mao, Hang Zheng, Kun Li, Junbo Chen, Kaicheng Yu
Abstract summary: We present FusionAD, the first unified framework that fuse the information from most critical sensors, camera and LiDAR, goes beyond perception task. In constrast to camera-based end-to-end UniAD, we establish a method fusion aided modality-aware prediction status planning modules, dubbed FMS. We conduct extensive experiments on commonly used benchmark nu's dataset, our advantages state-of-the-art performance and surpassing baselines on average 15% on perception tasks like detection and tracking, 10% on occupancy prediction accuracy, reducing prediction error from 0.708 to 0.389, and reducing collision rate from 0.31%
Score: 20.037562671813
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Building a multi-modality multi-task neural network toward accurate and robust performance is a de-facto standard in perception task of autonomous driving. However, leveraging such data from multiple sensors to jointly optimize the prediction and planning tasks remains largely unexplored. In this paper, we present FusionAD, to the best of our knowledge, the first unified framework that fuse the information from two most critical sensors, camera and LiDAR, goes beyond perception task. Concretely, we first build a transformer based multi-modality fusion network to effectively produce fusion based features. In constrast to camera-based end-to-end method UniAD, we then establish a fusion aided modality-aware prediction and status-aware planning modules, dubbed FMSPnP that take advantages of multi-modality features. We conduct extensive experiments on commonly used benchmark nuScenes dataset, our FusionAD achieves state-of-the-art performance and surpassing baselines on average 15% on perception tasks like detection and tracking, 10% on occupancy prediction accuracy, reducing prediction error from 0.708 to 0.389 in ADE score and reduces the collision rate from 0.31% to only 0.12%.

Related papers

Data-Driven Approximation of Binary-State Network Reliability Function: Algorithm Selection and Reliability Thresholds for Large-Scale Systems [0.08158530638728499]
This study evaluates 20 machine learning methods across three reliability regimes full range (0.0-1.0), high reliability (0.9-1.0), and ultra high reliability (0.99-1.0) We demonstrate that large-scale networks with arc reliability larger than or equal to 0.9 exhibit near-unity system reliability, enabling computational simplifications.
arXiv Detail & Related papers (2025-03-16T13:51:59Z)
ZIA: A Theoretical Framework for Zero-Input AI [0.0]
Zero-Input AI (ZIA) introduces a novel framework for human-computer interaction by enabling proactive intent prediction without explicit user commands. It integrates gaze tracking, bio-signals (EEG, heart rate), and contextual data (time, location, usage history) into a multi-modal model for real-time inference. ZIA provides a scalable, privacy-preserving framework for accessibility, healthcare, and consumer applications, advancing AI toward anticipatory intelligence.
arXiv Detail & Related papers (2025-02-22T07:42:05Z)
End-to-End Predictive Planner for Autonomous Driving with Consistency Models [5.966385886363771]
Trajectory prediction and planning are fundamental components for autonomous vehicles to navigate safely and efficiently in dynamic environments. Traditionally, these components have often been treated as separate modules, limiting the ability to perform interactive planning. We present a novel unified and data-driven framework that integrates prediction and planning with a single consistency model.
arXiv Detail & Related papers (2025-02-12T00:26:01Z)
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness [65.01625761120924]
We argue that a valuable sample should be informative of the task, non-redundant, and represent the sample distribution (i.e., not an outlier) We propose a collaborative framework, DataTailor, which leverages three key principles--informativeness, uniqueness, and representativeness--for effective data selection. Experiments on various benchmarks demonstrate that DataTailor achieves 100.8% of the performance of full-data fine-tuning with only 15% of the data.
arXiv Detail & Related papers (2024-12-09T08:36:10Z)
Transforming In-Vehicle Network Intrusion Detection: VAE-based Knowledge Distillation Meets Explainable AI [0.0]
This paper introduces an advanced intrusion detection system (IDS) called KD-XVAE that uses a Variational Autoencoder (VAE)-based knowledge distillation approach. Our model significantly reduces complexity, operating with just 1669 parameters and achieving an inference time of 0.3 ms per batch.
arXiv Detail & Related papers (2024-10-11T17:57:16Z)
Steering Prediction via a Multi-Sensor System for Autonomous Racing [45.70482345703285]
Traditionally, racing cars rely on 2D LiDAR as their primary visual system. In this work, we explore the integration of an event camera with the existing system to provide enhanced temporal information. Our goal is to fuse the 2D LiDAR data with event data in an end-to-end learning framework for steering prediction, which is crucial for autonomous racing.
arXiv Detail & Related papers (2024-09-28T13:58:24Z)
Foundation Models for Structural Health Monitoring [17.37816294594306]
We propose for the first time the use of Transformer neural networks, with a Masked Auto-Encoder architecture, as Foundation Models for Structural Health Monitoring. We demonstrate the ability of these models to learn generalizable representations from multiple large datasets through self-supervised pre-training. We showcase the effectiveness of our foundation models using data from three operational viaducts.
arXiv Detail & Related papers (2024-04-03T13:32:44Z)
An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution. We conduct our study for a novel goal where we can merge vision, language, and cross-modal transformers of a modality-specific architecture. We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
arXiv Detail & Related papers (2023-04-28T15:43:21Z)
Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting [61.02295959343446]
This work first proposes a novel concept, collaborative uncertainty (CU), which models the uncertainty resulting from interaction modules. We build a general CU-aware regression framework with an original permutation-equivariant uncertainty estimator to do both tasks of regression and uncertainty estimation. We apply the proposed framework to current SOTA multi-agent trajectory forecasting systems as a plugin module.
arXiv Detail & Related papers (2022-07-11T21:17:41Z)
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR. fusing these two modalities can significantly boost the performance of 3D perception models. We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
Transforming Model Prediction for Tracking [109.08417327309937]
Transformers capture global relations with little inductive bias, allowing it to learn the prediction of more powerful target models. We train the proposed tracker end-to-end and validate its performance by conducting comprehensive experiments on multiple tracking datasets. Our tracker sets a new state of the art on three benchmarks, achieving an AUC of 68.5% on the challenging LaSOT dataset.
arXiv Detail & Related papers (2022-03-21T17:59:40Z)
On Efficient Uncertainty Estimation for Resource-Constrained Mobile Applications [0.0]
Predictive uncertainty supplements model predictions and enables improved functionality of downstream tasks. We tackle this problem by building upon Monte Carlo Dropout (MCDO) models using the Axolotl framework. We conduct experiments on (1) a multi-class classification task using the CIFAR10 dataset, and (2) a more complex human body segmentation task.
arXiv Detail & Related papers (2021-11-11T22:24:15Z)
EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics. In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
Efficient and Robust LiDAR-Based End-to-End Navigation [132.52661670308606]
We present an efficient and robust LiDAR-based end-to-end navigation framework. We propose Fast-LiDARNet that is based on sparse convolution kernel optimization and hardware-aware model design. We then propose Hybrid Evidential Fusion that directly estimates the uncertainty of the prediction from only a single forward pass.
arXiv Detail & Related papers (2021-05-20T17:52:37Z)
Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination. Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities. We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
arXiv Detail & Related papers (2021-04-07T04:03:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.