Layer-Wise Modality Decomposition for Interpretable Multimodal Sensor Fusion
- URL: http://arxiv.org/abs/2511.00859v1
- Date: Sun, 02 Nov 2025 08:52:24 GMT
- Title: Layer-Wise Modality Decomposition for Interpretable Multimodal Sensor Fusion
- Authors: Jaehyun Park, Konyul Park, Daehun Kim, Junseo Park, Jun Won Choi
- Abstract summary: We introduce Layer-Wise Modality Decomposition (LMD) to disentangle modality-specific information across layers of a pretrained fusion model. We evaluate LMD on pretrained fusion models under camera-radar, camera-LiDAR, and camera-radar-LiDAR settings for autonomous driving.
- Score: 20.84456781070161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In autonomous driving, transparency in the decision-making of perception models is critical, as even a single misperception can be catastrophic. Yet with multi-sensor inputs, it is difficult to determine how each modality contributes to a prediction because sensor information becomes entangled within the fusion network. We introduce Layer-Wise Modality Decomposition (LMD), a post-hoc, model-agnostic interpretability method that disentangles modality-specific information across all layers of a pretrained fusion model. To our knowledge, LMD is the first approach to attribute the predictions of a perception model to individual input modalities in a sensor-fusion system for autonomous driving. We evaluate LMD on pretrained fusion models under camera-radar, camera-LiDAR, and camera-radar-LiDAR settings for autonomous driving. Its effectiveness is validated using structured perturbation-based metrics and modality-wise visual decompositions, demonstrating practical applicability to interpreting high-capacity multimodal architectures. Code is available at https://github.com/detxter-jvb/Layer-Wise-Modality-Decomposition.
Related papers
- A Comprehensive Survey on Deep Learning-Based LiDAR Super-Resolution for Autonomous Driving [0.4078247440919472]
This paper presents the first comprehensive survey of LiDAR super-resolution methods for autonomous driving. We organize existing approaches into four categories: CNN-based architectures, model-based deep unrolling, implicit representation methods, and Transformer and Mamba-based approaches. Current trends include the adoption of range image representation for efficient processing, extreme model compression, and the development of resolution-flexible architectures.
(2026-02-15T22:34:28Z)
- Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems [75.78934957242403]
Self-driving vehicles and drones require true Spatial Intelligence from multi-modal onboard sensor data. This paper presents a framework for multi-modal pre-training, identifying the core set of techniques driving progress toward this goal.
(2025-12-30T17:58:01Z)
- Towards Safer and Understandable Driver Intention Prediction [30.136400523083907]
We introduce the task of interpretable maneuver prediction, in which maneuvers are predicted before they occur, for driver safety. To foster research in interpretable DIP, we curate DAAD-X, a new multimodal, ego-centric video dataset. Next, we propose the Video Concept Bottleneck Model (VCBM), a framework that inherently generates coherent explanations.
(2025-10-10T09:41:25Z)
- Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion. Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient. Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
(2025-09-24T13:35:15Z)
- A CLIP-based Uncertainty Modal Modeling (UMM) Framework for Pedestrian Re-Identification in Autonomous Driving [6.223368492604449]
The Uncertainty Modal Modeling (UMM) framework integrates a multimodal token mapper, a synthetic modality augmentation strategy, and a cross-modal cue interactive learner. UMM achieves strong robustness, generalization, and computational efficiency under uncertain modality conditions.
(2025-08-15T04:50:27Z)
- SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
We propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge. To explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic. Our Multimodal Retrieval-Augmented Generation model leverages video, control signals, and environmental attributes to learn from past driving experiences.
(2025-02-28T21:53:47Z)
- Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving [3.770103075126785]
We introduce a novel approach to multi-modal sensor fusion, focusing on developing a graph-based state representation.
We present a Sensor-Agnostic Graph-Aware Kalman Filter, the first online state estimation technique designed to fuse multi-modal graphs.
We validate the effectiveness of our proposed framework through extensive experiments conducted on both synthetic and real-world driving datasets.
(2024-11-06T06:58:17Z)
- Optical Flow Matters: an Empirical Comparative Study on Fusing Monocular Extracted Modalities for Better Steering [37.46760714516923]
This research introduces a new end-to-end method that exploits multimodal information from a single monocular camera to improve the steering predictions for self-driving cars.
By focusing on the fusion of RGB imagery with depth completion information or optical flow data, we propose a framework that integrates these modalities through both early and hybrid fusion techniques.
(2024-09-18T09:36:24Z)
- UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
(2023-07-03T04:10:55Z)
- Efficient and Robust LiDAR-Based End-to-End Navigation [132.52661670308606]
We present an efficient and robust LiDAR-based end-to-end navigation framework.
We propose Fast-LiDARNet that is based on sparse convolution kernel optimization and hardware-aware model design.
We then propose Hybrid Evidential Fusion that directly estimates the uncertainty of the prediction from only a single forward pass.
(2021-05-20T17:52:37Z)
- A Driving Behavior Recognition Model with Bi-LSTM and Multi-Scale CNN [59.57221522897815]
We propose a neural network model for driving behavior recognition based on trajectory information.
We evaluate the proposed model on the public BLVD dataset, achieving satisfactory performance.
(2021-03-01T06:47:29Z)
- Learning Selective Sensor Fusion for States Estimation [47.76590539558037]
We propose SelectFusion, an end-to-end selective sensor fusion module.
During prediction, the network is able to assess the reliability of the latent features from different sensor modalities.
We extensively evaluate all fusion strategies on both public datasets and progressively degraded datasets.
(2019-12-30T20:25:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.