Provable Dynamic Fusion for Low-Quality Multimodal Data
- URL: http://arxiv.org/abs/2306.02050v2
- Date: Tue, 6 Jun 2023 13:46:22 GMT
- Title: Provable Dynamic Fusion for Low-Quality Multimodal Data
- Authors: Qingyang Zhang, Haitao Wu, Changqing Zhang, Qinghua Hu, Huazhu Fu,
Joey Tianyi Zhou, Xi Peng
- Abstract summary: Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
From the generalization perspective, this paper provides a theoretical understanding of whether a provably robust multimodal fusion method can be designed, under one of the most popular multimodal fusion frameworks.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which improves both classification accuracy and model robustness.
- Score: 94.39538027450948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The inherent challenge of multimodal fusion is to precisely capture the
cross-modal correlation and flexibly conduct cross-modal interaction. To fully
release the value of each modality and mitigate the influence of low-quality
multimodal data, dynamic multimodal fusion emerges as a promising learning
paradigm. Despite its widespread use, theoretical justifications in this field
are still notably lacking. Can we design a provably robust multimodal fusion
method? This paper provides a theoretical understanding to answer this
question from the generalization perspective, under one of the most popular
multimodal fusion frameworks. We proceed to reveal that several uncertainty
estimation solutions are naturally available for achieving robust multimodal
fusion. We then propose a novel multimodal fusion framework, termed
Quality-aware Multimodal Fusion (QMF), which improves both classification
accuracy and model robustness. Extensive experimental results on multiple
benchmarks support our findings.
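
To make the abstract's central idea concrete, the sketch below shows a generic quality-aware late-fusion step in which each modality's logits are weighted by an energy-based confidence score, so that low-quality modalities contribute less to the fused prediction. This is a minimal illustration assuming a late-fusion classification setup; the function names and the specific energy-based confidence are illustrative choices, not the authors' QMF implementation.

```python
# Minimal sketch: quality-aware late fusion that down-weights low-quality
# modalities via an uncertainty (energy-based confidence) estimate.
# Illustrative only -- not the paper's exact QMF method.
import torch


def energy_confidence(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # Negative free energy of the logits; higher values indicate a more
    # confident (lower-uncertainty) unimodal prediction.
    return temperature * torch.logsumexp(logits / temperature, dim=-1)


def quality_aware_fusion(per_modality_logits: list[torch.Tensor]) -> torch.Tensor:
    # per_modality_logits: one (batch, num_classes) tensor per modality.
    logits = torch.stack(per_modality_logits, dim=0)       # (M, B, C)
    conf = energy_confidence(logits)                        # (M, B)
    weights = torch.softmax(conf, dim=0).unsqueeze(-1)      # normalize over modalities
    return (weights * logits).sum(dim=0)                    # (B, C) fused logits


# Usage: a modality with weaker (flatter) logits gets a lower energy-based
# confidence and therefore a smaller fusion weight.
image_logits = torch.randn(4, 10)
text_logits = 0.1 * torch.randn(4, 10)
fused = quality_aware_fusion([image_logits, text_logits])
print(fused.shape)  # torch.Size([4, 10])
```

Other unimodal confidence measures (for example, maximum softmax probability) could be substituted for the energy score without changing the fusion step itself.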
Related papers
- Predictive Dynamic Fusion [45.551196908423606]
We propose a Predictive Dynamic Fusion (PDF) framework for multimodal learning.
We derive the predictable Collaborative Belief (Co-Belief) with Mono- and Holo-Confidence, which provably reduces the upper bound of generalization error.
arXiv Detail & Related papers (2024-06-07T10:06:13Z)
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- Multimodal Fusion on Low-quality Data: A Comprehensive Survey [110.22752954128738]
This paper surveys the common challenges and recent advances of multimodal fusion in the wild.
We identify four main challenges that are faced by multimodal fusion on low-quality data.
This new taxonomy will enable researchers to understand the state of the field and identify several potential directions.
arXiv Detail & Related papers (2024-04-27T07:22:28Z)
- Quantifying and Enhancing Multi-modal Robustness with Modality Preference [9.367733452960492]
Multi-modal models are vulnerable to pervasive perturbations, such as uni-modal attacks and missing conditions.
Larger uni-modal representation margins and more reliable integration across modalities are essential for achieving higher robustness.
Inspired by our theoretical finding, we introduce a training procedure called Certifiable Robust Multi-modal Training.
arXiv Detail & Related papers (2024-02-09T08:33:48Z)
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method for multimodal fusion that seeks a fixed point of the dynamic multimodal fusion process (a minimal fixed-point sketch appears after this list).
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
- Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos [58.93586436289648]
We propose a multi-scale cooperative multimodal transformer (MCMulT) architecture for multimodal sentiment analysis.
Our model outperforms existing approaches on unaligned multimodal sequences and has strong performance on aligned multimodal sequences.
arXiv Detail & Related papers (2022-06-16T07:47:57Z)
- Understanding and Measuring Robustness of Multimodal Learning [14.257147031953211]
We introduce a comprehensive measurement of the adversarial robustness of multimodal learning via a framework called MUROAN.
We first present a unified view of multimodal models in MUROAN and identify the fusion mechanism of multimodal models as a key vulnerability.
We then introduce a new type of multimodal adversarial attack in MUROAN, called the decoupling attack, which aims to compromise multimodal models.
arXiv Detail & Related papers (2021-12-22T21:10:02Z)
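
As referenced in the Deep Equilibrium Multimodal Fusion entry above, the sketch below illustrates the fixed-point view of fusion: a small fusion cell is iterated on the unimodal features until its output stops changing, and the equilibrium is taken as the fused representation. The module, the naive solver, and the dimensions are illustrative assumptions, not the paper's implementation (which, among other things, would use implicit differentiation for training).

```python
# Minimal sketch of fixed-point ("deep equilibrium") fusion: iterate a fusion
# map z <- f(z, x1, x2) until z stops changing, and use the equilibrium z*
# as the fused representation. Illustrative only -- not the paper's method.
import torch
import torch.nn as nn


class FusionCell(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.mix = nn.Linear(3 * dim, dim)

    def forward(self, z, x1, x2):
        # One step of the fusion dynamics; tanh keeps the map well-behaved.
        return torch.tanh(self.mix(torch.cat([z, x1, x2], dim=-1)))


@torch.no_grad()
def solve_equilibrium(cell, x1, x2, max_iter=50, tol=1e-4):
    # Naive fixed-point solver: repeat the cell until the update is tiny.
    z = torch.zeros_like(x1)
    for _ in range(max_iter):
        z_next = cell(z, x1, x2)
        if (z_next - z).norm() < tol * (z.norm() + 1e-8):
            return z_next
        z = z_next
    return z


cell = FusionCell(dim=16)
x1, x2 = torch.randn(4, 16), torch.randn(4, 16)
z_star = solve_equilibrium(cell, x1, x2)  # fused equilibrium representation
print(z_star.shape)  # torch.Size([4, 16])
```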