Investigating Vulnerability to Adversarial Examples on Multimodal Data Fusion in Deep Learning
- URL: http://arxiv.org/abs/2005.10987v1
- Date: Fri, 22 May 2020 03:45:06 GMT
- Title: Investigating Vulnerability to Adversarial Examples on Multimodal Data Fusion in Deep Learning
- Authors: Youngjoon Yu, Hong Joo Lee, Byeong Cheon Kim, Jung Uk Kim, Yong Man Ro
- Abstract summary: We investigated whether the current multimodal fusion model utilizes the complementary intelligence to defend against adversarial attacks.
We verified that the multimodal fusion model optimized for better prediction is still vulnerable to adversarial attack, even if only one of the sensors is attacked.
- Score: 32.125310341415755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of multimodal data fusion in deep learning appears to be
attributed to the use of complementary information between multiple input
data. Compared to their predictive performance, relatively less attention has
been devoted to the robustness of multimodal fusion models. In this paper, we
investigated whether the current multimodal fusion model utilizes the
complementary intelligence to defend against adversarial attacks. We applied
gradient-based white-box attacks such as FGSM and PGD on MFNet, which is a
major multispectral (RGB, Thermal) fusion deep learning model for semantic
segmentation. We verified that the multimodal fusion model optimized for better
prediction is still vulnerable to adversarial attack, even if only one of the
sensors is attacked. Thus, it is hard to say that existing multimodal data
fusion models are fully utilizing complementary relationships between multiple
modalities in terms of adversarial robustness. We believe that our observations
open a new horizon for adversarial attack research on multimodal data fusion.
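The single-sensor attack described above can be illustrated on a toy stand-in for a fusion model. The sketch below is hypothetical and not the paper's MFNet setup: a logistic classifier over summed RGB and thermal logits, with FGSM applied to the RGB input only. All weights, inputs, and the epsilon value are synthetic.

```python
import numpy as np

# Toy stand-in for a multimodal fusion model: a logistic classifier whose
# logit sums contributions from RGB and thermal feature vectors.
rng = np.random.default_rng(0)
d = 8                                   # features per modality (illustrative)
w_rgb, w_th = rng.normal(size=d), rng.normal(size=d)
x_rgb, x_th = rng.normal(size=d), rng.normal(size=d)
y = 1.0                                 # assumed true label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(xr, xt):
    # Binary cross-entropy on the fused logit.
    p = sigmoid(w_rgb @ xr + w_th @ xt)
    return -np.log(p) if y == 1.0 else -np.log(1.0 - p)

# FGSM on the RGB sensor only. For this linear model the gradient of the
# loss w.r.t. x_rgb is (p - y) * w_rgb, so the attack is one signed step.
eps = 0.5
p = sigmoid(w_rgb @ x_rgb + w_th @ x_th)
grad_rgb = (p - y) * w_rgb
x_rgb_adv = x_rgb + eps * np.sign(grad_rgb)

print(f"clean loss:    {loss(x_rgb, x_th):.4f}")
print(f"attacked loss: {loss(x_rgb_adv, x_th):.4f}")  # strictly larger here
```

Even though the thermal input is untouched, the fused prediction degrades, mirroring the paper's observation that attacking a single sensor suffices.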
Related papers
- MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation [13.624431305114564]
This paper introduces a pioneering Multi-modal Multi-class Late Fusion method that enables multi-class detection with uncertainty estimation.
Experiments conducted on the KITTI validation and official test datasets illustrate substantial performance improvements.
Our approach incorporates uncertainty analysis into the classification fusion process, rendering our model more transparent and trustworthy.
arXiv Detail & Related papers (2024-10-11T11:58:35Z) - U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z) - Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method towards multimodal fusion via seeking a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
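The fixed-point idea behind DEQ fusion can be sketched in miniature (a hedged illustration, not the authors' method): the fused representation z is defined implicitly as the fixed point of an update that mixes both modality features, and can be found by plain iteration. Everything below is synthetic.

```python
import numpy as np

# Illustrative equilibrium-style fusion: z* satisfies z* = update(z*).
rng = np.random.default_rng(1)
d = 4
x_a, x_b = rng.normal(size=d), rng.normal(size=d)   # two modality features
W = 0.4 * rng.normal(size=(d, d)) / np.sqrt(d)      # small norm -> contraction

def update(z):
    # One fusion step; tanh keeps the map bounded and, with small W, contractive.
    return np.tanh(W @ z + x_a + x_b)

z = np.zeros(d)
for _ in range(100):                  # Picard (fixed-point) iteration
    z_next = update(z)
    if np.max(np.abs(z_next - z)) < 1e-8:
        break
    z = z_next

print("residual:", np.max(np.abs(update(z) - z)))   # near zero at the fixed point
```

In the actual DEQ literature the fixed point is typically found with faster root-finding solvers and differentiated implicitly; plain iteration is shown here only because it is the simplest correct instance of the idea.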
arXiv Detail & Related papers (2023-06-29T03:02:20Z) - Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides a theoretical understanding of dynamic fusion under a popular multimodal fusion framework, from the generalization perspective.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which improves both classification accuracy and model robustness.
arXiv Detail & Related papers (2023-06-03T08:32:35Z) - Informative Data Selection with Uncertainty for Multi-modal Object Detection [25.602915381482468]
We propose a universal uncertainty-aware multi-modal fusion model.
Our model reduces the randomness in fusion and generates reliable output.
Our fusion model is proven to resist severe noise interference like Gaussian, motion blur, and frost, with only slight degradation.
arXiv Detail & Related papers (2023-04-23T16:36:13Z) - Robustness of Fusion-based Multimodal Classifiers to Cross-Modal Content Dilutions [27.983902791798965]
We develop a model that generates dilution text that maintains relevance and topical coherence with the image and existing text.
We find that the performance of task-specific fusion-based multimodal classifiers drops by 23.3% and 22.5%, respectively, in the presence of dilutions generated by our model.
Our work aims to highlight and encourage further research on the robustness of deep multimodal models to realistic variations.
arXiv Detail & Related papers (2022-11-04T17:58:02Z) - Understanding and Measuring Robustness of Multimodal Learning [14.257147031953211]
We introduce a comprehensive measurement of the adversarial robustness of multimodal learning via a framework called MUROAN.
We first present a unified view of multimodal models in MUROAN and identify the fusion mechanism of multimodal models as a key vulnerability.
We then introduce a new type of multimodal adversarial attacks called decoupling attack in MUROAN that aims to compromise multimodal models.
arXiv Detail & Related papers (2021-12-22T21:10:02Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input to account for the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z) - Multimodal Object Detection via Bayesian Fusion [59.31437166291557]
We study multimodal object detection with RGB and thermal cameras, since the latter can provide much stronger object signatures under poor illumination.
Our key contribution is a non-learned late-fusion method that fuses together bounding box detections from different modalities.
We apply our approach to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data.
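The non-learned Bayesian late fusion of per-box scores can be sketched as follows. This is a hedged illustration of generic Bayesian score fusion, not the paper's exact rule: assuming the two detectors' scores are conditionally independent posteriors, their odds against the class prior multiply. The function name and all numbers are invented for the example.

```python
# Hypothetical Bayesian late fusion of detection confidences from an RGB
# and a thermal detector, assuming conditional independence given the class.
def fuse_scores(p_rgb, p_thermal, prior=0.5):
    # Convert each posterior to odds relative to the prior, multiply,
    # then map the fused odds back to a probability.
    odds = ((p_rgb / (1 - p_rgb))
            * (p_thermal / (1 - p_thermal))
            / (prior / (1 - prior)))
    return odds / (1 + odds)

# Two moderately confident detections of the same box reinforce each other:
print(fuse_scores(0.7, 0.7))   # greater than either input score
```

With a uniform prior, an uninformative score of 0.5 from one sensor leaves the other sensor's score unchanged, which is the behavior one wants from a late-fusion rule when one modality is uncertain.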
arXiv Detail & Related papers (2021-04-07T04:03:20Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.