Understanding and Measuring Robustness of Multimodal Learning
- URL: http://arxiv.org/abs/2112.12792v2
- Date: Tue, 28 Dec 2021 16:32:30 GMT
- Title: Understanding and Measuring Robustness of Multimodal Learning
- Authors: Nishant Vishwamitra, Hongxin Hu, Ziming Zhao, Long Cheng and Feng Luo
- Abstract summary: We introduce a comprehensive measurement of the adversarial robustness of multimodal learning via a framework called MUROAN.
We first present a unified view of multimodal models in MUROAN and identify the fusion mechanism of multimodal models as a key vulnerability.
We then introduce a new type of multimodal adversarial attack in MUROAN, called the decoupling attack, that aims to compromise multimodal models.
- Score: 14.257147031953211
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The modern digital world is increasingly becoming multimodal. Although
multimodal learning has recently revolutionized the state-of-the-art
performance in multimodal tasks, relatively little is known about the
robustness of multimodal learning in an adversarial setting. In this paper, we
introduce a comprehensive measurement of the adversarial robustness of
multimodal learning by focusing on the fusion of input modalities in multimodal
models, via a framework called MUROAN (MUltimodal RObustness ANalyzer). We
first present a unified view of multimodal models in MUROAN and identify the
fusion mechanism of multimodal models as a key vulnerability. We then introduce
a new type of multimodal adversarial attack in MUROAN, called the decoupling
attack, which aims to compromise multimodal models by decoupling their fused modalities.
We leverage the decoupling attack of MUROAN to measure several state-of-the-art
multimodal models and find that the multimodal fusion mechanism in all these
models is vulnerable to decoupling attacks. In particular, we demonstrate that, in
the worst case, the decoupling attack of MUROAN achieves an attack success rate
of 100% by decoupling just 1.16% of the input space. Finally, we show that
traditional adversarial training is insufficient to improve the robustness of
multimodal models with respect to decoupling attacks. We hope our findings
encourage researchers to pursue improving the robustness of multimodal
learning.
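As a rough illustration of what decoupling fused modalities could look like in practice, below is a minimal sketch that perturbs only one input modality so that the model's fused representation drifts away from its clean value. It assumes a hypothetical fusion model exposing `encode_image`, `encode_text`, and `fuse` hooks; MUROAN's actual attack objective, threat model, and perturbation budget are specified in the paper and may differ.

```python
# Minimal sketch of a decoupling-style attack (illustrative only; all model
# hooks below are hypothetical and not MUROAN's actual interface).
import torch
import torch.nn.functional as F

def decoupling_attack(model, image, text, steps=40, eps=8 / 255, alpha=2 / 255):
    """Perturb only the image so the fused (image, text) representation
    is pushed away from its clean counterpart."""
    with torch.no_grad():
        clean_fused = model.fuse(model.encode_image(image), model.encode_text(text))

    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        fused = model.fuse(model.encode_image(image + delta), model.encode_text(text))
        # Maximizing this loss minimizes similarity to the clean fused state,
        # i.e. it "decouples" the attacked modality from the fusion output.
        loss = -F.cosine_similarity(fused, clean_fused, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # gradient-ascent step on the loss
            delta.clamp_(-eps, eps)              # keep the perturbation budget small
            delta.grad.zero_()
    return (image + delta).detach()              # pixel-range clamping omitted for brevity
```

The point of the sketch is that the objective is defined in the fusion space rather than on the task output, and only one modality is perturbed under a small budget.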
Related papers
- MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models [34.802736332993994]
We propose MMCert, the first certified defense against adversarial attacks on multi-modal models.
We evaluate our MMCert using two benchmark datasets: one for the multi-modal road segmentation task and the other for the multi-modal emotion recognition task.
arXiv Detail & Related papers (2024-03-28T01:05:06Z)
- Quantifying and Enhancing Multi-modal Robustness with Modality Preference [9.367733452960492]
Multi-modal models are vulnerable to pervasive perturbations, such as uni-modal attacks and missing-modality conditions.
Larger uni-modal representation margins and more reliable integration for modalities are essential components for achieving higher robustness.
Inspired by our theoretical finding, we introduce a training procedure called Certifiable Robust Multi-modal Training.
arXiv Detail & Related papers (2024-02-09T08:33:48Z)
- Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities (a minimal alternating-training sketch appears after this list).
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method for multimodal fusion that seeks a fixed point of the dynamic multimodal fusion process (a minimal fixed-point fusion sketch appears after this list).
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
- Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides a theoretical analysis of dynamic multimodal fusion under one of the most popular fusion frameworks, from the generalization perspective.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which improves both classification accuracy and model robustness.
arXiv Detail & Related papers (2023-06-03T08:32:35Z)
- UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning [29.237813880311943]
We propose a novel multimodal contrastive method to explore more reliable multimodal representations under the weak supervision of unimodal prediction.
Experimental results with fused features on two image-text classification benchmarks show that our proposed Unimodality-Supervised Multimodal Contrastive learning method (UniS-MMC) outperforms current state-of-the-art multimodal methods.
arXiv Detail & Related papers (2023-05-16T09:18:38Z)
- Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos [58.93586436289648]
We propose a multi-scale cooperative multimodal transformer (MCMulT) architecture for multimodal sentiment analysis.
Our model outperforms existing approaches on unaligned multimodal sequences and has strong performance on aligned multimodal sequences.
arXiv Detail & Related papers (2022-06-16T07:47:57Z)
- Investigating Vulnerability to Adversarial Examples on Multimodal Data Fusion in Deep Learning [32.125310341415755]
We investigated whether current multimodal fusion models utilize complementary information to defend against adversarial attacks.
We verified that a multimodal fusion model optimized for better prediction is still vulnerable to adversarial attacks, even when only one of the sensors is attacked.
arXiv Detail & Related papers (2020-05-22T03:45:06Z)
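For the Multimodal Representation Learning by Alternating Unimodal Adaptation (MLA) entry above, here is a minimal sketch of alternating unimodal training through a shared head, assuming stand-in linear encoders, a plain round-robin schedule, and a standard cross-entropy objective; the actual MLA procedure is more involved and may differ.

```python
# Minimal sketch of alternating unimodal adaptation with a shared head
# (all module sizes and the schedule are illustrative assumptions).
import torch
import torch.nn as nn

encoders = nn.ModuleDict({
    "image": nn.Linear(512, 256),   # stand-in unimodal encoders
    "text": nn.Linear(300, 256),
})
shared_head = nn.Linear(256, 10)    # shared across modalities, carrying cross-modal interactions
optimizer = torch.optim.Adam(
    list(encoders.parameters()) + list(shared_head.parameters()), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def alternating_step(batches):
    """One pass that updates each modality branch in turn, always through the shared head."""
    for modality, (x, y) in batches.items():
        optimizer.zero_grad()
        logits = shared_head(torch.relu(encoders[modality](x)))
        loss = criterion(logits, y)
        loss.backward()             # only this modality's encoder and the shared head receive gradients
        optimizer.step()

# Usage with random stand-in data: 8 image samples (512-d) and 8 text samples (300-d).
batches = {
    "image": (torch.randn(8, 512), torch.randint(0, 10, (8,))),
    "text": (torch.randn(8, 300), torch.randint(0, 10, (8,))),
}
alternating_step(batches)
```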
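For the Deep Equilibrium Multimodal Fusion entry above, here is a minimal sketch of fixed-point fusion: a shared cell is iterated on the fused state until it (approximately) stops changing. The cell, the way unimodal features are injected, and the naive iterate-until-tolerance solver are illustrative assumptions rather than the paper's actual DEQ formulation.

```python
# Minimal sketch of fixed-point (deep-equilibrium style) multimodal fusion.
import torch
import torch.nn as nn

class FixedPointFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # One shared cell applied repeatedly: z_{t+1} = f(z_t, x_img, x_txt)
        self.cell = nn.Linear(3 * dim, dim)

    def forward(self, x_img, x_txt, max_iters=50, tol=1e-4):
        z = torch.zeros_like(x_img)                  # start from a zero fused state
        for _ in range(max_iters):
            z_next = torch.tanh(self.cell(torch.cat([z, x_img, x_txt], dim=-1)))
            if (z_next - z).norm() < tol:            # stop once z is (approximately) a fixed point
                return z_next
            z = z_next
        return z                                     # fall back to the last iterate

# Usage: fuse two 256-d unimodal feature batches into one equilibrium representation.
fusion = FixedPointFusion(dim=256)
z_star = fusion(torch.randn(8, 256), torch.randn(8, 256))
```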