Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving
- URL: http://arxiv.org/abs/2411.03702v1
- Date: Wed, 06 Nov 2024 06:58:17 GMT
- Title: Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving
- Authors: Depanshu Sani, Saket Anand
- Abstract summary: We introduce a novel approach to multi-modal sensor fusion, focusing on developing a graph-based state representation.
We present a Sensor-Agnostic Graph-Aware Kalman Filter, the first online state estimation technique designed to fuse multi-modal graphs.
We validate the effectiveness of our proposed framework through extensive experiments conducted on both synthetic and real-world driving datasets.
- Score: 3.770103075126785
- License:
- Abstract: The growing demand for robust scene understanding in mobile robotics and autonomous driving has highlighted the importance of integrating multiple sensing modalities. By combining data from diverse sensors such as cameras and LiDARs, fusion techniques can overcome the limitations of individual sensors, enabling a more complete and accurate perception of the environment. We introduce a novel approach to multi-modal sensor fusion, focusing on developing a graph-based state representation that supports critical decision-making processes in autonomous driving. We present the Sensor-Agnostic Graph-Aware Kalman Filter (SAGA-KF), the first online state estimation technique designed to fuse multi-modal graphs derived from noisy multi-sensor data. The estimated graph-based state representations serve as a foundation for advanced applications like Multi-Object Tracking (MOT), offering a comprehensive framework for enhancing the situational awareness and safety of autonomous systems. We validate the effectiveness of our proposed framework through extensive experiments conducted on both synthetic and real-world driving datasets (nuScenes). Our results showcase an improvement in MOTA and a reduction in estimated position errors (MOTP) and identity switches (IDS) for tracked objects using SAGA-KF. Furthermore, we highlight the capability of such a framework to develop methods that can leverage heterogeneous information (like semantic objects and geometric structures) from various sensing modalities, enabling a more holistic approach to scene understanding and enhancing the safety and effectiveness of autonomous systems.
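The abstract does not spell out the SAGA-KF equations, so the following is only a minimal, illustrative sketch of the underlying idea: running a Kalman filter over the state of each node in a fused scene graph, with measurements contributed by different sensors. The GraphNodeKF class, the constant-velocity motion model, and all noise values are assumptions for illustration, not the paper's formulation; graph construction, cross-sensor association, and edge handling are omitted.

```python
import numpy as np

# Minimal sketch of per-node filtering over a graph-based scene state.
# NOT the paper's SAGA-KF: the motion model, noise values, and the absence
# of graph edges / data association are simplifying assumptions.

class GraphNodeKF:
    """Tracks one graph node (e.g., an object centroid) with a constant-velocity model."""

    def __init__(self, x0, P0, dt=0.1):
        self.x = np.asarray(x0, dtype=float)   # state: [px, py, vx, vy]
        self.P = np.asarray(P0, dtype=float)   # state covariance
        self.F = np.eye(4)                     # constant-velocity transition
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                  # we observe position only
        self.Q = 0.01 * np.eye(4)              # process noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, z, R):
        """Fuse one position measurement z (from any sensor graph) with covariance R."""
        y = np.asarray(z, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + R                 # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P


# Usage: fuse noisy detections of the same object coming from a camera graph
# and a LiDAR graph, each with its own (assumed) measurement noise.
node = GraphNodeKF(x0=[0, 0, 1, 0], P0=np.eye(4))
for z_cam, z_lidar in [([0.12, 0.05], [0.09, -0.02]), ([0.21, 0.04], [0.19, 0.01])]:
    node.predict()
    node.update(z_cam, R=0.10 * np.eye(2))    # camera: noisier position
    node.update(z_lidar, R=0.02 * np.eye(2))  # LiDAR: more precise position
print(node.x)  # fused position/velocity estimate for this graph node
```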
Related papers
- XAI-based Feature Ensemble for Enhanced Anomaly Detection in Autonomous Driving Systems [1.3022753212679383]
This paper proposes a novel feature ensemble framework that integrates multiple Explainable AI (XAI) methods.
By fusing top features identified by these XAI methods across six diverse AI models, the framework creates a robust and comprehensive set of features critical for detecting anomalies.
Our technique demonstrates improved accuracy, robustness, and transparency of AI models, contributing to safer and more trustworthy autonomous driving systems.
arXiv Detail & Related papers (2024-10-20T14:34:48Z)
- Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes [56.52618054240197]
We propose a novel, condition-aware multimodal fusion approach for robust semantic perception of driving scenes.
Our method, CAFuser, uses an RGB camera input to classify environmental conditions and generate a Condition Token that guides the fusion of multiple sensor modalities (a schematic sketch of this gating idea follows the entry).
We set the new state of the art with CAFuser on the MUSES dataset with 59.7 PQ for multimodal panoptic segmentation and 78.2 mIoU for semantic segmentation, ranking first on the public benchmarks.
arXiv Detail & Related papers (2024-10-14T17:56:20Z)
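The CAFuser summary only names the mechanism (an RGB-derived Condition Token that guides fusion), so the sketch below is a generic, hedged illustration of condition-gated fusion, not the paper's architecture; the function names, feature dimensions, and the toy "condition classifier" parameters are all assumptions.

```python
import numpy as np

# Illustrative sketch only: CAFuser's actual architecture (Condition Token,
# panoptic heads, training) is not reproduced here. This shows the generic
# pattern: classify the scene condition, then weight each modality accordingly.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def condition_weights(rgb_stats, W, b):
    """Toy 'condition classifier': maps simple RGB statistics to per-modality weights."""
    logits = W @ rgb_stats + b   # e.g., dark/rainy scenes push weight toward radar/LiDAR
    return softmax(logits)       # one weight per modality

def fuse(features, weights):
    """Weighted sum of per-modality feature vectors (assumed same dimensionality)."""
    return sum(w * f for w, f in zip(weights, features))

rng = np.random.default_rng(0)
feat_rgb, feat_lidar, feat_radar = (rng.normal(size=16) for _ in range(3))
rgb_stats = np.array([0.2, 0.1])                 # e.g., mean brightness, contrast of the frame
W, b = rng.normal(size=(3, 2)), np.zeros(3)      # untrained toy parameters

w = condition_weights(rgb_stats, W, b)
fused = fuse([feat_rgb, feat_lidar, feat_radar], w)
print(w, fused.shape)
```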
- Deep Learning-Based Robust Multi-Object Tracking via Fusion of mmWave Radar and Camera Sensors [6.166992288822812]
Multi-Object Tracking plays a critical role in ensuring safer and more efficient navigation through complex traffic scenarios.
This paper presents a novel deep learning-based method that integrates radar and camera data to enhance the accuracy and robustness of Multi-Object Tracking in autonomous driving systems.
arXiv Detail & Related papers (2024-07-10T21:09:09Z)
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design: a random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design achieves substantial energy-efficiency improvements and training-cost reductions compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for Place Recognition [11.206532393178385]
We present a novel neural network named LCPR for robust multimodal place recognition.
Our method can effectively utilize multi-view camera and LiDAR data to improve the place recognition performance.
arXiv Detail & Related papers (2023-11-06T15:39:48Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to end-to-end, open-set (any environment/scene) autonomous driving that can provide driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- AutoFed: Heterogeneity-Aware Federated Multimodal Learning for Robust Autonomous Driving [15.486799633600423]
AutoFed is a framework to fully exploit multimodal sensory data on autonomous vehicles.
We propose a novel model leveraging pseudo-labeling to avoid mistakenly treating unlabeled objects as the background.
We also propose an autoencoder-based data imputation method to fill in missing data modalities (sketched below).
arXiv Detail & Related papers (2023-02-17T01:31:53Z)
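AutoFed's autoencoder-based imputation is only named in the summary above; as a rough, assumption-laden illustration of the general pattern (train an autoencoder on complete multimodal samples, then reconstruct a zero-filled modality at test time), here is a toy linear autoencoder in NumPy. The dimensions, synthetic data, and training loop are illustrative, not AutoFed's model.

```python
import numpy as np

# Toy sketch of autoencoder-based imputation (not AutoFed's actual model):
# train a small linear autoencoder on concatenated [camera, LiDAR] features,
# then reconstruct a zero-filled LiDAR slot from the camera half at test time.
# A real system would also train with randomly masked modalities so the
# decoder explicitly learns to impute from partial inputs.

rng = np.random.default_rng(0)
n, d_cam, d_lid = 512, 8, 8
cam = rng.normal(size=(n, d_cam))
lid = cam @ rng.normal(size=(d_cam, d_lid)) / np.sqrt(d_cam) + 0.1 * rng.normal(size=(n, d_lid))
X = np.hstack([cam, lid])                  # complete training samples

d, k = d_cam + d_lid, 8                    # input dim, bottleneck dim
W1 = 0.1 * rng.normal(size=(d, k))         # encoder
W2 = 0.1 * rng.normal(size=(k, d))         # decoder
lr = 1e-2
for _ in range(2000):                      # plain gradient descent on MSE
    Z = X @ W1                             # encode
    Xhat = Z @ W2                          # decode
    G = 2.0 * (Xhat - X) / n               # dLoss/dXhat
    gW2 = Z.T @ G
    gW1 = X.T @ (G @ W2.T)
    W1 -= lr * gW1
    W2 -= lr * gW2

# Inference with the LiDAR modality missing: zero-fill it and reconstruct.
x_test = np.hstack([rng.normal(size=d_cam), np.zeros(d_lid)])
imputed = (x_test @ W1) @ W2
lidar_filled = imputed[d_cam:]             # use the reconstructed LiDAR features downstream
print(lidar_filled.shape)
```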
- Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving [58.879758550901364]
Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context.
We introduce a framework that integrates three cameras to emulate the human field of view, coupled with top-down bird's-eye-view semantic data to enhance contextual representation.
Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset.
arXiv Detail & Related papers (2022-10-13T05:56:20Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
The SAKDN uses multiple wearable sensors as teacher modalities and RGB videos as the student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
- Learning Selective Sensor Fusion for States Estimation [47.76590539558037]
We propose SelectFusion, an end-to-end selective sensor fusion module.
During prediction, the network assesses the reliability of the latent features from different sensor modalities (a toy weighting sketch follows this entry).
We extensively evaluate all fusion strategies on both public datasets and progressively degraded datasets.
arXiv Detail & Related papers (2019-12-30T20:25:16Z)
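The SelectFusion entry describes weighting latent features by their assessed reliability; the toy sketch below shows that generic pattern with untrained linear gates. The modality names, dimensions, and gating form are assumptions, not the paper's trained masks.

```python
import numpy as np

# Toy sketch of selective fusion (not SelectFusion's trained architecture):
# score each modality's latent feature, turn the score into a soft mask, and
# suppress the features of modalities judged unreliable before concatenation.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def selective_fuse(latents, gate_weights, gate_bias):
    """latents: dict name -> feature vector; gates are per-modality linear scorers."""
    fused = []
    for name, z in latents.items():
        score = sigmoid(gate_weights[name] @ z + gate_bias[name])  # reliability in (0, 1)
        fused.append(score * z)                                    # soft mask
    return np.concatenate(fused)

rng = np.random.default_rng(0)
latents = {"camera": rng.normal(size=32), "imu": rng.normal(size=16)}
gate_w = {k: 0.1 * rng.normal(size=v.shape) for k, v in latents.items()}
gate_b = {k: 0.0 for k in latents}
print(selective_fuse(latents, gate_w, gate_b).shape)  # (48,) weighted latent for the estimator
```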