Fusion is Not Enough: Single Modal Attacks on Fusion Models for 3D
Object Detection
- URL: http://arxiv.org/abs/2304.14614v3
- Date: Sat, 2 Mar 2024 17:56:07 GMT
- Title: Fusion is Not Enough: Single Modal Attacks on Fusion Models for 3D
Object Detection
- Authors: Zhiyuan Cheng, Hongjun Choi, James Liang, Shiwei Feng, Guanhong Tao,
Dongfang Liu, Michael Zuzak, Xiangyu Zhang
- Abstract summary: We propose an attack framework that targets advanced camera-LiDAR fusion-based 3D object detection models through camera-only adversarial attacks.
Our approach employs a two-stage optimization-based strategy that first thoroughly evaluates vulnerable image areas under adversarial attacks.
- Score: 33.0406308223244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-sensor fusion (MSF) is widely used in autonomous vehicles (AVs) for
perception, particularly for 3D object detection with camera and LiDAR sensors.
The purpose of fusion is to capitalize on the advantages of each modality while
minimizing its weaknesses. Advanced deep neural network (DNN)-based fusion
techniques have demonstrated exceptional, industry-leading performance. Because
multiple modalities carry redundant information, MSF is also regarded as a
general defense strategy against adversarial attacks. In this paper, we attack
fusion models through the camera modality, which is considered less important
in fusion but is more affordable for attackers. We argue that the weakest link
of a fusion model is its most vulnerable modality,
and propose an attack framework that targets advanced camera-LiDAR fusion-based
3D object detection models through camera-only adversarial attacks. Our
approach employs a two-stage optimization-based strategy that first thoroughly
evaluates vulnerable image areas under adversarial attacks, and then applies
dedicated attack strategies for different fusion models to generate deployable
patches. Evaluations with six advanced camera-LiDAR fusion models and one
camera-only model indicate that our attacks successfully compromise all of
them. Our approach can either decrease the mean average precision (mAP) of
detection performance from 0.824 to 0.353, or degrade the detection score of a
target object from 0.728 to 0.156, demonstrating the efficacy of our proposed
attack framework. Code is available.
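The paper states that code is available; the authors' implementation is not reproduced here. Purely as an illustration of the two-stage pipeline the abstract describes, a minimal PyTorch sketch might look as follows, where the `model(image, lidar)` dictionary interface, the gradient-based sensitivity probe, and all names are assumptions rather than the released code:

```python
import torch

def region_sensitivity(model, image, lidar, grid=8):
    """Stage 1 (sketch): score how strongly each image region influences the
    fusion model's detections, via gradient magnitude per grid cell."""
    image = image.clone().requires_grad_(True)
    loss = model(image, lidar)["det_loss"]      # hypothetical model interface
    loss.backward()
    g = image.grad.abs().mean(dim=0)            # (H, W) pixel saliency
    H, W = g.shape                              # assumes H, W divisible by grid
    # Mean gradient magnitude per cell = vulnerability score of that area.
    return g.reshape(grid, H // grid, grid, W // grid).mean(dim=(1, 3))

def optimize_patch(model, image, lidar, mask, steps=500, lr=0.01):
    """Stage 2 (sketch): optimize a patch confined to the most vulnerable
    region (mask) so the target object's detection score drops."""
    patch = torch.rand_like(image, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        adv = image * (1 - mask) + patch.clamp(0, 1) * mask
        score = model(adv, lidar)["target_score"]  # hypothetical scalar output
        opt.zero_grad()
        score.backward()   # gradients of the score...
        opt.step()         # ...and a minimizing step pushes the score down
    return (patch.clamp(0, 1) * mask).detach()
```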
Related papers
- FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving [63.96049803915402]
Integrating data from diverse sensor modalities is a prevalent approach in autonomous driving.
Recent advancements in efficient point cloud transformers have underscored the efficacy of integrating information in sparse formats.
In this paper, we conduct a comprehensive exploration of design choices for Transformer-based sparse camera-LiDAR fusion.
arXiv Detail & Related papers (2024-08-13T11:46:32Z)
- Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z)
- MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named MLF-DET, for high-performance cross-modal 3D object DETection.
For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features.
For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
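As a rough picture of the decision-level idea (a sketch under assumed interfaces, not MLF-DET's code), each 3D candidate's confidence can be reweighted by semantic evidence sampled from the image at its projected location; `project_to_image` and the blending weight `alpha` are hypothetical:

```python
import torch

def rectify_confidence(scores, boxes_3d, seg_probs, project_to_image, alpha=0.5):
    """Decision-level fusion sketch: blend each detection's confidence with
    the image's semantic foreground probability at its projected location."""
    out = []
    for score, box in zip(scores, boxes_3d):
        u, v = project_to_image(box)   # hypothetical 3D -> pixel projection
        sem = seg_probs[v, u]          # semantic cue at that pixel, in [0, 1]
        out.append(score * (alpha + (1 - alpha) * sem))  # keep alpha as a floor
    return torch.stack(out)
```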
arXiv Detail & Related papers (2023-07-18T11:26:02Z)
- Sparse Dense Fusion for 3D Object Detection [24.288662560178334]
Camera-LiDAR fusion has gained popularity in 3D object detection.
We analyze two challenges: 1) sparse-only solutions preserve the 3D geometric prior yet lose the camera's rich semantic information, and 2) dense-only alternatives retain semantic continuity but miss LiDAR's accurate geometric information.
We propose Sparse Dense Fusion (SDF), a complementary framework that incorporates both sparse-fusion and dense-fusion modules via the Transformer architecture.
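Purely to make the idea concrete (assumed shapes and module names, not SDF's actual design), a complementary combination of the two branches can be sketched as cross-attention from dense semantic tokens onto sparse geometric tokens:

```python
import torch.nn as nn

class SparseDenseCombiner(nn.Module):
    """Toy sketch: keep a sparse, geometry-preserving branch and a dense,
    semantics-preserving branch, and let attention combine their tokens."""
    def __init__(self, dim=128):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, sparse_feats, dense_feats):
        # sparse_feats: (B, Ns, C) voxel tokens; dense_feats: (B, Nd, C) BEV tokens
        fused, _ = self.attn(dense_feats, sparse_feats, sparse_feats)
        return fused + dense_feats  # residual combination of both cues
```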
arXiv Detail & Related papers (2023-04-09T07:10:34Z)
- CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer [14.849645397321185]
Camera and radar sensors have significant advantages in cost, reliability, and maintenance compared to LiDAR.
Existing fusion methods often fuse the outputs of single modalities at the result-level, called the late fusion strategy.
Here we propose a novel proposal-level early fusion approach that effectively exploits both spatial and contextual properties of camera and radar for 3D object detection.
Our camera-radar fusion approach achieves the state-of-the-art 41.1% mAP and 52.3% NDS on the nuScenes test set, which is 8.7 and 10.8 points higher than the camera-only baseline, while also yielding competitive performance.
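The contrast between the two strategies can be made concrete with a toy sketch (assumed callables throughout, not CRAFT's implementation): late fusion merges finished per-sensor detections, while proposal-level early fusion attaches radar evidence to camera proposals before the detection head runs.

```python
def late_fusion(cam_dets, radar_dets, merge):
    # Each modality completes detection on its own; only the final
    # results are combined (e.g., by NMS across the union).
    return merge(cam_dets + radar_dets)

def proposal_level_fusion(cam_proposals, radar_points, associate, head):
    # Radar returns are associated with each camera proposal first, so the
    # shared head sees spatial and contextual evidence from both sensors.
    return [head(p, associate(p, radar_points)) for p in cam_proposals]
```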
arXiv Detail & Related papers (2022-09-14T10:25:30Z)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection [89.26380781863665]
Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems.
Recent approaches aim to exploit the semantic density of camera features by lifting points from 2D camera images into 3D space for fusion.
We propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features.
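The lifting step itself is standard pinhole geometry: with intrinsics K and an estimated depth d, pixel (u, v) unprojects to the 3D point d * K^{-1} [u, v, 1]^T. The NumPy snippet below illustrates only this generic step, not MSMDFusion's multi-depth seeding; the intrinsics are example values.

```python
import numpy as np

def lift_pixels(us, vs, depths, K):
    """Unproject pixels (u, v) with depth d into camera-frame 3D points:
    X = d * K^{-1} [u, v, 1]^T (pinhole camera model)."""
    pix = np.stack([us, vs, np.ones_like(us, dtype=float)])  # (3, N) homogeneous
    return (np.linalg.inv(K) @ pix) * depths                 # (3, N) 3D points

K = np.array([[1000.0, 0.0, 640.0],   # example intrinsics: fx, fy, cx, cy
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
print(lift_pixels(np.array([700.0]), np.array([400.0]), np.array([12.5]), K))
# -> approximately (0.75, 0.50, 12.5): a point 12.5 m ahead along that pixel's ray
```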
arXiv Detail & Related papers (2022-09-07T12:29:29Z)
- Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection [58.81316192862618]
Two critical sensors for 3D perception in autonomous driving are the camera and the LiDAR.
Fusing these two modalities can significantly boost the performance of 3D perception models.
We benchmark the state-of-the-art fusion methods for the first time.
arXiv Detail & Related papers (2022-05-30T09:35:37Z)
- Sensor Adversarial Traits: Analyzing Robustness of 3D Object Detection Sensor Fusion Models [16.823829387723524]
We analyze the robustness of a high-performance, open source sensor fusion model architecture towards adversarial attacks.
We find that despite the use of a LiDAR sensor, the model is vulnerable to our purposefully crafted image-based adversarial attacks.
arXiv Detail & Related papers (2021-09-13T23:38:42Z)
- Security Analysis of Camera-LiDAR Semantic-Level Fusion Against Black-Box Attacks on Autonomous Vehicles [6.477833151094911]
Recently, it was shown that LiDAR-based perception built on deep neural networks is vulnerable to spoofing attacks.
We perform the first analysis of camera-LiDAR fusion under spoofing attacks and the first security analysis of semantic fusion in any AV context.
We find that semantic camera-LiDAR fusion exhibits widespread vulnerability to frustum attacks with between 70% and 90% success against target models.
arXiv Detail & Related papers (2021-06-13T21:59:19Z)
- Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
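One way to picture a non-learned score fusion (a generic Bayes-rule sketch, not necessarily the paper's exact estimator): if two overlapping boxes carry posteriors that are conditionally independent given the true label, a uniform prior gives the fused posterior p1*p2 / (p1*p2 + (1-p1)(1-p2)).

```python
def bayes_fuse(p_rgb, p_thermal):
    """Fuse per-box posteriors from two modalities, assuming conditional
    independence given the label and a uniform class prior."""
    joint_pos = p_rgb * p_thermal
    joint_neg = (1.0 - p_rgb) * (1.0 - p_thermal)
    return joint_pos / (joint_pos + joint_neg)

print(bayes_fuse(0.7, 0.8))  # ~0.903: agreement across modalities strengthens the detection
```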
arXiv Detail & Related papers (2021-04-07T04:03:20Z)
- Adversarial Attacks on Camera-LiDAR Models for 3D Car Detection [15.323682536206574]
Most autonomous vehicles rely on LiDAR and RGB camera sensors for perception.
Deep neural nets (DNNs) have achieved state-of-the-art performance in 3D detection.
We propose a universal and physically realizable adversarial attack for each type of model, and study and contrast their respective vulnerabilities to attacks.
arXiv Detail & Related papers (2021-03-17T05:24:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.