Robust Multimodal Sentiment Analysis via Double Information Bottleneck
- URL: http://arxiv.org/abs/2511.01444v1
- Date: Mon, 03 Nov 2025 10:52:45 GMT
- Title: Robust Multimodal Sentiment Analysis via Double Information Bottleneck
- Authors: Huiting Huang, Tieliang Gong, Kai He, Jialun Wu, Erik Cambria, Mengling Feng
- Abstract summary: Multimodal sentiment analysis has received significant attention across diverse research domains. Existing approaches suffer from insufficient learning of noise-contaminated unimodal data. This paper proposes a Double Information Bottleneck (DIB) strategy to obtain a powerful, unified compact multimodal representation.
- Score: 55.32835720742616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal sentiment analysis has received significant attention across diverse research domains. Despite advancements in algorithm design, existing approaches suffer from two critical limitations: insufficient learning of noise-contaminated unimodal data, leading to corrupted cross-modal interactions, and inadequate fusion of multimodal representations, resulting in discarding discriminative unimodal information while retaining multimodal redundant information. To address these challenges, this paper proposes a Double Information Bottleneck (DIB) strategy to obtain a powerful, unified compact multimodal representation. Implemented within the framework of low-rank Renyi's entropy functional, DIB offers enhanced robustness against diverse noise sources and computational tractability for high-dimensional data, as compared to the conventional Shannon entropy-based methods. The DIB comprises two key modules: 1) learning a sufficient and compressed representation of individual unimodal data by maximizing the task-relevant information and discarding the superfluous information, and 2) ensuring the discriminative ability of multimodal representation through a novel attention bottleneck fusion mechanism. Consequently, DIB yields a multimodal representation that effectively filters out noisy information from unimodal data while capturing inter-modal complementarity. Extensive experiments on CMU-MOSI, CMU-MOSEI, CH-SIMS, and MVSA-Single validate the effectiveness of our method. The model achieves 47.4% accuracy under the Acc-7 metric on CMU-MOSI and 81.63% F1-score on CH-SIMS, outperforming the second-best baseline by 1.19%. Under noise, it shows only 0.36% and 0.29% performance degradation on CMU-MOSI and CMU-MOSEI respectively.
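The two DIB modules above (unimodal compression and attention-bottleneck fusion) can be made concrete with a small sketch. The PyTorch code below is only an illustrative approximation under simplifying assumptions: it stands in a conventional variational information bottleneck (Gaussian code plus a KL compression term) for the paper's low-rank Renyi entropy functional, and uses a generic learnable-token attention-bottleneck fusion. The module names (UnimodalVIB, BottleneckFusion), feature dimensions, and the beta weight are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: variational-IB stand-in for DIB's unimodal
# compression plus a token-based attention-bottleneck fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnimodalVIB(nn.Module):
    """Compress one modality into a stochastic code z ~ N(mu, sigma^2)."""
    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        # Expected KL(q(z|x) || N(0, I)) upper-bounds I(X; Z): the compression term.
        kl = 0.5 * (torch.exp(logvar) + mu ** 2 - 1.0 - logvar).sum(-1).mean()
        return z, kl


class BottleneckFusion(nn.Module):
    """Modalities interact only through a few learnable bottleneck tokens."""
    def __init__(self, z_dim, num_tokens=4, num_heads=4):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, z_dim))
        self.attn = nn.MultiheadAttention(z_dim, num_heads, batch_first=True)

    def forward(self, unimodal_z):  # list of (batch, z_dim) codes
        kv = torch.stack(unimodal_z, dim=1)                  # (batch, M, z_dim)
        q = self.tokens.unsqueeze(0).expand(kv.size(0), -1, -1)
        fused, _ = self.attn(q, kv, kv)                      # tokens attend to modalities
        return fused.mean(dim=1)                             # (batch, z_dim)


# Toy usage: text/audio/vision features -> sentiment score, IB-regularized.
encoders = nn.ModuleDict({
    "text": UnimodalVIB(768, 128),
    "audio": UnimodalVIB(74, 128),
    "vision": UnimodalVIB(35, 128),
})
fusion = BottleneckFusion(128)
head = nn.Linear(128, 1)

feats = {"text": torch.randn(8, 768), "audio": torch.randn(8, 74), "vision": torch.randn(8, 35)}
target = torch.randn(8, 1)

zs, kl_total = [], 0.0
for name, enc in encoders.items():
    z, kl = enc(feats[name])
    zs.append(z)
    kl_total = kl_total + kl

pred = head(fusion(zs))
beta = 1e-3  # compression strength (assumed hyperparameter)
loss = F.mse_loss(pred, target) + beta * kl_total
loss.backward()
```

A real training setup would tune beta per modality and replace the toy regression head and random features with the CMU-MOSI/CMU-MOSEI pipelines; the sketch only shows how the compression term and the bottleneck-token fusion fit together.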
Related papers
- Multimodal Representation-disentangled Information Bottleneck for Multimodal Recommendation [36.338586087343806]
We propose a novel framework, the Multimodal Representation-disentangled Information Bottleneck (MRdIB). Concretely, we first employ a Multimodal Information Bottleneck to compress the input representations. Then, we decompose the information based on its relationship with the recommendation target into unique, redundant, and synergistic components.
arXiv Detail & Related papers (2025-09-24T15:18:32Z)
- Multi-Modal Dataset Distillation in the Wild [75.64263877043615]
We propose Multi-modal dataset Distillation in the Wild, i.e., MDW, to distill noisy multi-modal datasets into compact clean ones for effective and efficient model training. Specifically, MDW introduces learnable fine-grained correspondences during distillation and adaptively optimizes distilled data to emphasize correspondence-discriminative regions. Extensive experiments validate MDW's theoretical and empirical efficacy with remarkable scalability, surpassing prior methods by over 15% across various compression ratios.
arXiv Detail & Related papers (2025-06-02T12:18:20Z)
- Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness [61.87055159919641]
Multi-modal semantic segmentation (MMSS) addresses the limitations of single-modality data by integrating complementary information across modalities. Despite notable progress, a significant gap persists between research and real-world deployment due to variability and uncertainty in multi-modal data quality. We introduce a robustness benchmark that evaluates MMSS models under three scenarios: Entire-Missing Modality (EMM), Random-Missing Modality (RMM), and Noisy Modality (NM).
arXiv Detail & Related papers (2025-03-24T08:46:52Z)
- Dynamic Multimodal Information Bottleneck for Multimodality Classification [26.65073424377933]
We propose a dynamic multimodal information bottleneck framework for attaining a robust fused feature representation.
Specifically, our information bottleneck module serves to filter out the task-irrelevant information and noises in the fused feature.
Our method surpasses the state-of-the-art and is significantly more robust, being the only method to maintain its performance when large-scale noisy channels exist.
arXiv Detail & Related papers (2023-11-02T08:34:08Z)
- Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
- Multimodal Information Bottleneck: Learning Minimal Sufficient Unimodal and Multimodal Representations [27.855467591358018]
We introduce the multimodal information bottleneck (MIB), aiming to learn a powerful and sufficient multimodal representation.
We develop three MIB variants, namely, early-fusion MIB, late-fusion MIB, and complete MIB, to focus on different perspectives of information constraints (a generic form of the shared objective is sketched after this list).
Experimental results suggest that the proposed method reaches state-of-the-art performance on the tasks of multimodal sentiment analysis and multimodal emotion recognition.
arXiv Detail & Related papers (2022-10-31T16:14:18Z)
- Multi-Modal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH).
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
arXiv Detail & Related papers (2021-12-13T08:58:03Z)
- Which is Making the Contribution: Modulating Unimodal and Cross-modal Dynamics for Multimodal Sentiment Analysis [18.833050804875032]
Multimodal sentiment analysis (MSA) draws increasing attention with the availability of multimodal data.
Recent MSA works mostly focus on learning cross-modal dynamics, but neglect to explore an optimal solution for unimodal networks.
We propose a novel MSA framework, the Modulation Model for Multimodal Sentiment Analysis.
arXiv Detail & Related papers (2021-11-10T03:29:17Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
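Several entries above (MIB, DMIB, MRdIB) instantiate the same information bottleneck principle. As a point of reference, and as a paraphrase rather than any single paper's exact formulation, the generic multimodal information bottleneck objective for a fused code Z learned from modalities X_1, ..., X_M with target Y can be written as:

```latex
% Generic multimodal information bottleneck objective (paraphrase, not any
% single paper's exact formulation): compress the joint multimodal input
% while keeping the fused code predictive of the target.
\[
  \min_{p(z \mid x_1, \dots, x_M)} \;
  \beta \, I(Z; X_1, \dots, X_M) \;-\; I(Z; Y), \qquad \beta > 0 .
\]
```

The early-, late-, and complete-fusion MIB variants mentioned above differ, roughly, in whether the compression term is applied to the fused representation, to each unimodal representation before fusion, or to both.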