Related papers: On the Fragility of Multimodal Perception to Temporal Misalignment in Autonomous Driving

URL: http://arxiv.org/abs/2507.09095v1
Date: Sat, 12 Jul 2025 00:44:26 GMT
Title: On the Fragility of Multimodal Perception to Temporal Misalignment in Autonomous Driving
Authors: Md Hasan Shahriar, Md Mohaimin Al Barat, Harshavardhan Sundar, Naren Ramakrishnan, Y. Thomas Hou, Wenjing Lou
Abstract summary: We introduce DejaVu, a novel attack that exploits network-induced delays to create subtle temporal misalignments across sensor streams. With a single-frame LiDAR delay, an attacker can reduce the car detection mAP by up to 88.5%, while with a three-frame camera delay, multiple object tracking accuracy (MOTA) for car drops by 73%. We propose AION, a patch that can work alongside the existing perception model to monitor temporal alignment through cross-modal temporal consistency.
Score: 26.809693071623272
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal fusion (MMF) plays a critical role in the perception of autonomous driving, which primarily fuses camera and LiDAR streams for a comprehensive and efficient scene understanding. However, its strict reliance on precise temporal synchronization exposes it to new vulnerabilities. In this paper, we introduce DejaVu, a novel attack that exploits network-induced delays to create subtle temporal misalignments across sensor streams, severely degrading downstream MMF-based perception tasks. Our comprehensive attack analysis across different models and datasets reveals these sensors' task-specific imbalanced sensitivities: object detection is overly dependent on LiDAR inputs while object tracking is highly reliant on the camera inputs. Consequently, with a single-frame LiDAR delay, an attacker can reduce the car detection mAP by up to 88.5%, while with a three-frame camera delay, multiple object tracking accuracy (MOTA) for car drops by 73%. To detect such attacks, we propose AION, a defense patch that can work alongside the existing perception model to monitor temporal alignment through cross-modal temporal consistency. AION leverages multimodal shared representation learning and dynamic time warping to determine the path of temporal alignment and calculate anomaly scores based on the alignment. Our thorough evaluation of AION shows it achieves AUROC scores of 0.92-0.98 with low false positives across datasets and model architectures, demonstrating it as a robust and generalized defense against the temporal misalignment attacks.
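The abstract describes AION as using dynamic time warping (DTW) to find an alignment path between modalities and to score anomalies from that path. The following is a minimal sketch of that general idea, not the authors' implementation. The hand-made 1-D "embeddings", the function names `dtw_path` and `misalignment_score`, and the choice of mean off-diagonal deviation as the anomaly score are all assumptions made for illustration; the paper uses learned shared representations whose details are not given here.

```python
import math

def dtw_path(a, b):
    """Classic DTW between two sequences of feature vectors.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching a[i] to b[j].
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path

def misalignment_score(path):
    """Hypothetical anomaly score: mean distance of the path from the diagonal."""
    return sum(abs(i - j) for i, j in path) / len(path)

# Toy per-frame features (assumption: both modalities are already embedded
# in a shared space, so synchronized frames have identical features).
camera = [(float(t),) for t in range(10)]
lidar_aligned = list(camera)
# Simulate a one-frame LiDAR delay: frame t carries the content of frame t-1.
lidar_delayed = [camera[max(t - 1, 0)] for t in range(10)]

_, aligned_path = dtw_path(camera, lidar_aligned)
_, delayed_path = dtw_path(camera, lidar_delayed)
aligned_score = misalignment_score(aligned_path)
delayed_score = misalignment_score(delayed_path)
print(aligned_score, delayed_score)
```

With synchronized streams the alignment path stays on the diagonal, so the score is zero. A delayed stream pushes the path off the diagonal and raises the score, which a detector could compare against a threshold.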

Related papers

Natural Reflection Backdoor Attack on Vision Language Model for Autonomous Driving [55.96227460521096]
Vision-Language Models (VLMs) have been integrated into autonomous driving systems to enhance reasoning capabilities. We propose a natural reflection-based backdoor attack targeting VLM systems in autonomous driving scenarios. Our findings uncover a new class of attacks that exploit the stringent real-time requirements of autonomous driving.
arXiv Detail & Related papers (2025-05-09T20:28:17Z)
StreamLTS: Query-based Temporal-Spatial LiDAR Fusion for Cooperative Object Detection [0.552480439325792]
We propose Time-Aligned COoperative Object Detection (TA-COOD), for which we adapt the widely used datasets OPV2V and DairV2X. Experiment results confirm the superior efficiency of our fully sparse framework compared to the state-of-the-art dense models.
arXiv Detail & Related papers (2024-07-04T10:56:10Z)
Ego-Motion Aware Target Prediction Module for Robust Multi-Object Tracking [2.7898966850590625]
We introduce a novel KF-based prediction module called Ego-motion Aware Target Prediction (EMAP). Our proposed method decouples the impact of camera rotational and translational velocity from the object trajectories by reformulating the Kalman Filter. EMAP remarkably drops the number of identity switches (IDSW) of OC-SORT and Deep OC-SORT by 73% and 21%, respectively.
arXiv Detail & Related papers (2024-04-03T23:24:25Z)
ADoPT: LiDAR Spoofing Attack Detection Based on Point-Level Temporal Consistency [11.160041268858773]
Deep neural networks (DNNs) are increasingly integrated into LiDAR-based perception systems for autonomous vehicles (AVs). We aim to address the challenge of LiDAR spoofing attacks, where attackers inject fake objects into LiDAR data and fool AVs into misinterpreting their environment and making erroneous decisions. We propose ADoPT (Anomaly Detection based on Point-level Temporal consistency), which quantitatively measures temporal consistency across consecutive frames and identifies abnormal objects based on the coherency of point clusters. In our evaluation using the nuScenes dataset, our algorithm effectively counters various LiDAR spoofing attacks, achieving a low (
arXiv Detail & Related papers (2023-10-23T02:31:31Z)
Real-Time Driver Monitoring Systems through Modality and View Analysis [28.18784311981388]
Driver distractions are known to be the dominant cause of road accidents. State-of-the-art methods prioritize accuracy while ignoring latency. We propose time-effective detection models by neglecting the temporal relation between video frames.
arXiv Detail & Related papers (2022-10-17T21:22:41Z)
Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed as Ret3D. At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules. With negligible extra overhead, Ret3D achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
StreamYOLO: Real-time Object Detection for Streaming Perception [84.2559631820007]
We endow the models with the capacity of predicting the future, significantly improving the results for streaming perception. We consider driving scenes with multiple velocities and propose Velocity-awared streaming AP (VsAP) to jointly evaluate the accuracy. Our simple method achieves state-of-the-art performance on the Argoverse-HD dataset and improves the sAP and VsAP by 4.7% and 8.2%, respectively.
arXiv Detail & Related papers (2022-07-21T12:03:02Z)
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR. Fusing these two modalities can significantly boost the performance of 3D perception models. We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
DAE : Discriminatory Auto-Encoder for multivariate time-series anomaly detection in air transportation [68.8204255655161]
We propose a novel anomaly detection model called Discriminatory Auto-Encoder (DAE). It uses the baseline of a regular LSTM-based auto-encoder but with several decoders, each getting data of a specific flight phase. Results show that the DAE achieves better results in both accuracy and speed of detection.
arXiv Detail & Related papers (2021-09-08T14:07:55Z)
Streaming Object Detection for 3-D Point Clouds [29.465873948076766]
LiDAR provides a prominent sensory modality that informs many existing perceptual systems. The latency for perceptual systems based on point cloud data can be dominated by the amount of time for a complete rotational scan. We show how operating on LiDAR data in its native streaming formulation offers several advantages for self driving object detection.
arXiv Detail & Related papers (2020-05-04T21:55:15Z)
Physically Realizable Adversarial Examples for LiDAR Object Detection [72.0017682322147]
We present a method to generate universal 3D adversarial objects to fool LiDAR detectors. In particular, we demonstrate that placing an adversarial object on the rooftop of any target vehicle can hide the vehicle entirely from LiDAR detectors with a success rate of 80%. This is one step closer towards safer self-driving under unseen conditions from limited training data.
arXiv Detail & Related papers (2020-04-01T16:11:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.