Understanding and Measuring Robustness of Multimodal Learning
- URL: http://arxiv.org/abs/2112.12792v2
- Date: Tue, 28 Dec 2021 16:32:30 GMT
- Title: Understanding and Measuring Robustness of Multimodal Learning
- Authors: Nishant Vishwamitra, Hongxin Hu, Ziming Zhao, Long Cheng and Feng Luo
- Abstract summary: We introduce a comprehensive measurement of the adversarial robustness of multimodal learning via a framework called MUROAN.
We first present a unified view of multimodal models in MUROAN and identify their fusion mechanism as a key vulnerability.
We then introduce a new type of multimodal adversarial attack in MUROAN, called the decoupling attack, which aims to compromise multimodal models.
- Score: 14.257147031953211
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The modern digital world is increasingly becoming multimodal. Although
multimodal learning has recently revolutionized the state-of-the-art
performance in multimodal tasks, relatively little is known about the
robustness of multimodal learning in an adversarial setting. In this paper, we
introduce a comprehensive measurement of the adversarial robustness of
multimodal learning by focusing on the fusion of input modalities in multimodal
models, via a framework called MUROAN (MUltimodal RObustness ANalyzer). We
first present a unified view of multimodal models in MUROAN and identify the
fusion mechanism of multimodal models as a key vulnerability. We then introduce
a new type of multimodal adversarial attack in MUROAN, called the decoupling
attack, which aims to compromise multimodal models by decoupling their fused
modalities.
We leverage the decoupling attack of MUROAN to measure several state-of-the-art
multimodal models and find that the multimodal fusion mechanism in all these
models is vulnerable to decoupling attacks. In particular, we demonstrate that, in
the worst case, the decoupling attack of MUROAN achieves an attack success rate
of 100% by decoupling just 1.16% of the input space. Finally, we show that
traditional adversarial training is insufficient to improve the robustness of
multimodal models with respect to decoupling attacks. We hope our findings
encourage researchers to pursue improving the robustness of multimodal
learning.
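MUROAN's exact attack procedure is not given in the abstract; the following is a minimal sketch, assuming a PGD-style loop, of what a decoupling attack on the fusion mechanism could look like: a small masked fraction of one modality is perturbed so that the fused embedding drifts away from its clean counterpart. The `fusion_model` interface, the mask choice, and all hyperparameters are hypothetical.

```python
# Minimal, assumed sketch of a decoupling-style attack on a fusion model:
# perturb a small fraction of one modality so the fused embedding moves away
# from the clean fused embedding. Not MUROAN's exact procedure; names are
# hypothetical.
import torch
import torch.nn.functional as F

def decoupling_attack(fusion_model, image, text_emb, budget=0.0116,
                      steps=40, step_size=1e-2, linf_bound=0.1):
    """fusion_model(image, text_emb) -> fused embedding of shape (B, D)."""
    with torch.no_grad():
        clean_fused = fusion_model(image, text_emb)

    # Restrict the perturbation to roughly `budget` of the image coordinates
    # (a random mask here; a real attack would pick sensitive coordinates).
    mask = (torch.rand_like(image) < budget).float()
    delta = torch.zeros_like(image, requires_grad=True)

    for _ in range(steps):
        fused = fusion_model(image + delta * mask, text_emb)
        # Similarity between perturbed and clean fused embeddings.
        sim = F.cosine_similarity(fused, clean_fused, dim=-1).mean()
        sim.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()   # descend the similarity
            delta.clamp_(-linf_bound, linf_bound)    # keep perturbation small
            delta.grad.zero_()

    return (image + delta * mask).detach()
```

The `budget` default mirrors the abstract's figure of 1.16% of the input space; every other value is an illustrative choice.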
Related papers
- MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach.
MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student.
We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
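The summary names ensemble-to-student knowledge distillation but not the exact loss; below is a minimal sketch of a standard distillation objective under that reading (temperature, weighting, and the single-label setup are assumptions, not MIND's specification).

```python
# Generic ensemble-to-student distillation loss (assumed; not MIND's exact form).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, labels,
                      temperature=2.0, alpha=0.5):
    # Average the teachers' softened predictions.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    # KL between student and averaged teacher distributions, plus hard labels.
    soft = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                    teacher_probs, reduction="batchmean") * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```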
arXiv Detail & Related papers (2025-02-03T08:50:00Z)
- Modality Unified Attack for Omni-Modality Person Re-Identification [16.624135145315673]
We propose a novel Modality Unified Attack method to train adversarial generators to attack different omni-modality models.
Experiments show that our method can effectively attack the omni-modality re-id models, achieving 55.9%, 24.4%, 49.0% and 62.7% mean mAP Drop Rate.
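The summary only states that adversarial generators are trained to attack omni-modality re-id models; a hedged sketch of that general idea follows, with a generator emitting bounded perturbations that push each model's embedding away from its clean embedding. Every interface here is hypothetical, not the paper's method.

```python
# Assumed generator-based attack training loop (illustrative only).
import torch
import torch.nn.functional as F

def train_attack_generator(generator, reid_models, loader,
                           epochs=10, eps=8 / 255, lr=1e-4):
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    for _ in range(epochs):
        for images in loader:                       # loader yields image batches
            delta = eps * torch.tanh(generator(images))   # bounded perturbation
            adv = (images + delta).clamp(0.0, 1.0)
            # Minimizing clean/adversarial embedding similarity for every
            # target re-id model degrades retrieval across all of them.
            loss = sum(F.cosine_similarity(m(adv), m(images).detach(),
                                           dim=-1).mean()
                       for m in reid_models)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return generator
```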
arXiv Detail & Related papers (2025-01-22T09:54:43Z)
- Asymmetric Reinforcing against Multi-modal Representation Bias [59.685072206359855]
We propose an Asymmetric Reinforcing method against Multimodal representation bias (ARM).
Our ARM dynamically reinforces the weak modalities while maintaining the ability to represent dominant modalities through conditional mutual information.
ARM significantly improves multimodal learning performance and makes notable progress in mitigating imbalanced multimodal learning.
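ARM's conditional-mutual-information criterion is not reproduced in the summary; the sketch below only illustrates the general asymmetric-reinforcement pattern, upweighting the losses of currently weaker modalities. The softmax weighting rule is an illustrative assumption, not ARM's.

```python
# Illustrative asymmetric reweighting of per-modality losses (assumed scheme).
import torch

def asymmetric_loss(per_modality_losses, fusion_loss, temperature=1.0):
    """per_modality_losses: dict mapping modality name -> scalar loss tensor."""
    losses = torch.stack(list(per_modality_losses.values()))
    # Weaker (higher-loss) modalities receive larger weights, so they are
    # reinforced instead of being crowded out by dominant modalities.
    weights = torch.softmax(losses.detach() / temperature, dim=0)
    return fusion_loss + (weights * losses).sum()
```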
arXiv Detail & Related papers (2025-01-02T13:00:06Z)
- MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models [34.802736332993994]
We propose MMCert, the first certified defense against adversarial attacks on multi-modal models.
We evaluate our MMCert using two benchmark datasets: one for the multi-modal road segmentation task and the other for the multi-modal emotion recognition task.
arXiv Detail & Related papers (2024-03-28T01:05:06Z)
- Quantifying and Enhancing Multi-modal Robustness with Modality Preference [9.367733452960492]
Multi-modal models are vulnerable to pervasive perturbations, such as uni-modal attacks and missing conditions.
Larger uni-modal representation margins and more reliable integration of modalities are essential for achieving higher robustness.
Inspired by our theoretical finding, we introduce a training procedure called Certifiable Robust Multi-modal Training.
arXiv Detail & Related papers (2024-02-09T08:33:48Z)
- Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
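A minimal sketch of the alternating-unimodal schedule with a shared head, under assumed encoder and data-loader interfaces; MLA itself includes further components (e.g. its handling of missing modalities) that are omitted here.

```python
# Sketch of alternating unimodal training with a shared prediction head
# (interfaces assumed; this is not the full MLA method).
import itertools
import torch
import torch.nn.functional as F

def train_alternating(encoders, shared_head, loaders, steps=1000, lr=1e-3):
    """encoders/loaders: dicts keyed by modality; loaders yield (x, y) batches."""
    params = itertools.chain(shared_head.parameters(),
                             *[e.parameters() for e in encoders.values()])
    opt = torch.optim.Adam(params, lr=lr)
    modalities = list(encoders)
    iters = {m: itertools.cycle(loaders[m]) for m in modalities}

    for step in range(steps):
        m = modalities[step % len(modalities)]       # alternate over modalities
        x, y = next(iters[m])
        logits = shared_head(encoders[m](x))         # head is shared by all
        loss = F.cross_entropy(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()                                   # head updated every step
    return encoders, shared_head
```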
arXiv Detail & Related papers (2023-11-17T18:57:40Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
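A minimal assumed version of an early-fusion, single-stream encoder in this spirit: each modality is projected into a shared space, the token embeddings are summed, and one encoder processes the fused sequence. Dimensions and the skeleton-stream names are hypothetical.

```python
# Assumed early-fusion single-stream encoder (not UmURL's exact architecture).
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    def __init__(self, modality_dims, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        # One projection per modality into the shared embedding space.
        self.proj = nn.ModuleDict({name: nn.Linear(dim, d_model)
                                   for name, dim in modality_dims.items()})
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, inputs):
        """inputs: dict of (batch, seq_len, dim) tensors, one per modality."""
        fused = sum(self.proj[name](x) for name, x in inputs.items())
        return self.encoder(fused)              # single-stream joint encoding

# Usage with hypothetical skeleton streams:
enc = EarlyFusionEncoder({"joint": 75, "bone": 75, "motion": 75})
feats = enc({m: torch.randn(2, 64, 75) for m in ("joint", "bone", "motion")})
```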
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method for multimodal fusion that seeks a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
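The core DEQ idea is to treat the fused representation as a fixed point of a fusion update. The sketch below uses plain fixed-point iteration with one extra differentiable step as a stand-in for the implicit differentiation DEQ models use; `fusion_step` and its interface are hypothetical.

```python
# Assumed sketch: fused representation as a fixed point of a fusion update.
import torch

def deq_fusion(fusion_step, modality_feats, z_init, max_iter=50, tol=1e-4):
    """fusion_step(z, modality_feats) -> next z; iterate until convergence."""
    z = z_init
    with torch.no_grad():                      # forward fixed-point solve
        for _ in range(max_iter):
            z_next = fusion_step(z, modality_feats)
            if torch.norm(z_next - z) / (torch.norm(z) + 1e-8) < tol:
                z = z_next
                break
            z = z_next
    # One differentiable step so gradients reach fusion_step's parameters
    # (a common one-step approximation to full implicit differentiation).
    return fusion_step(z, modality_feats)
```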
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
- Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides a theoretical understanding of dynamic multimodal fusion from the generalization perspective, under a widely used multimodal fusion framework.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which improves both classification accuracy and model robustness.
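QMF's quality estimator is not reproduced here; as a hedged stand-in, the sketch below weights each modality's logits by a confidence proxy (negative prediction entropy) before fusing them, which captures the quality-aware late-fusion pattern in its simplest form.

```python
# Assumed quality-aware late fusion: weight each modality's logits by a
# confidence proxy (negative entropy); QMF's actual estimator differs.
import torch
import torch.nn.functional as F

def quality_weighted_fusion(unimodal_logits):
    """unimodal_logits: list of (batch, num_classes) tensors, one per modality."""
    logits = torch.stack(unimodal_logits, dim=0)                # (M, B, C)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1)    # (M, B)
    weights = F.softmax(-entropy, dim=0).unsqueeze(-1)          # low entropy -> high weight
    return (weights * logits).sum(dim=0)                        # fused logits (B, C)
```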
arXiv Detail & Related papers (2023-06-03T08:32:35Z)
- UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning [29.237813880311943]
We propose a novel multimodal contrastive method to explore more reliable multimodal representations under the weak supervision of unimodal predicting.
Experimental results with fused features on two image-text classification benchmarks show that the proposed Unimodality-Supervised Multimodal Contrastive learning method (UniS-MMC) outperforms current state-of-the-art multimodal methods.
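A simplified, assumed version of unimodality-supervised contrastive alignment: paired image/text embeddings are pulled together with an InfoNCE-style loss, but only for samples whose unimodal predictions both agree with the label. This illustrates the idea rather than reproducing the exact UniS-MMC objective.

```python
# Simplified, assumed unimodality-supervised contrastive loss.
import torch
import torch.nn.functional as F

def unis_contrastive_loss(img_emb, txt_emb, img_logits, txt_logits, labels,
                          temperature=0.07):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sim = img_emb @ txt_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(sim.size(0), device=sim.device)
    per_pair = 0.5 * (F.cross_entropy(sim, targets, reduction="none")
                      + F.cross_entropy(sim.t(), targets, reduction="none"))
    # Trust a pair only when both unimodal heads already predict it correctly.
    agree = ((img_logits.argmax(-1) == labels)
             & (txt_logits.argmax(-1) == labels)).float()
    return (per_pair * agree).sum() / agree.sum().clamp_min(1.0)
```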
arXiv Detail & Related papers (2023-05-16T09:18:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.