Disentangling Bias by Modeling Intra- and Inter-modal Causal Attention for Multimodal Sentiment Analysis
- URL: http://arxiv.org/abs/2508.04999v1
- Date: Thu, 07 Aug 2025 03:24:04 GMT
- Title: Disentangling Bias by Modeling Intra- and Inter-modal Causal Attention for Multimodal Sentiment Analysis
- Authors: Menghua Jiang, Yuxia Lin, Baoliang Chen, Haifeng Hu, Yuncheng Jiang, Sijie Mai
- Abstract summary: Multimodal sentiment analysis (MSA) aims to understand human emotions by integrating information from multiple modalities, such as text, audio, and visual data. Existing methods often suffer from spurious correlations both within and across modalities, leading models to rely on statistical shortcuts rather than true causal relationships. We propose a Multi-relational Multimodal Causal Intervention (MMCI) model, which leverages the backdoor adjustment from causal theory to address the confounding effects of such shortcuts.
- Score: 25.791796193062012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal sentiment analysis (MSA) aims to understand human emotions by integrating information from multiple modalities, such as text, audio, and visual data. However, existing methods often suffer from spurious correlations both within and across modalities, leading models to rely on statistical shortcuts rather than true causal relationships, thereby undermining generalization. To mitigate this issue, we propose a Multi-relational Multimodal Causal Intervention (MMCI) model, which leverages the backdoor adjustment from causal theory to address the confounding effects of such shortcuts. Specifically, we first model the multimodal inputs as a multi-relational graph to explicitly capture intra- and inter-modal dependencies. Then, we apply an attention mechanism to separately estimate and disentangle the causal features and shortcut features corresponding to these intra- and inter-modal relations. Finally, by applying the backdoor adjustment, we stratify the shortcut features and dynamically combine them with the causal features to encourage MMCI to produce stable predictions under distribution shifts. Extensive experiments on several standard MSA datasets and out-of-distribution (OOD) test sets demonstrate that our method effectively suppresses biases and improves performance.
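To make the disentangle-then-adjust pipeline concrete, here is a minimal PyTorch sketch of the step the abstract describes. Everything below is an illustrative assumption rather than the authors' implementation: the attention scorer, the learned confounder dictionary standing in for the shortcut strata, and all names, shapes, and hyperparameters are invented for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDisentangleBlock(nn.Module):
    """Illustrative sketch (not the authors' code): attention-based
    disentanglement of causal vs. shortcut features, followed by a
    backdoor-adjustment-style aggregation over shortcut strata."""

    def __init__(self, dim: int, num_strata: int = 8, num_classes: int = 3):
        super().__init__()
        self.score = nn.Linear(dim, 1)                             # attention scorer
        self.strata = nn.Parameter(torch.randn(num_strata, dim))   # learned confounder dictionary
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, nodes, dim) node features from the multi-relational graph
        a = torch.sigmoid(self.score(h))          # (batch, nodes, 1) causal attention
        causal = (a * h).mean(dim=1)              # attended (causal) summary
        shortcut = ((1 - a) * h).mean(dim=1)      # complement (shortcut) summary

        # Stratify shortcut features: soft-assign to K learned strata and
        # average stratum-conditioned predictions -- a crude stand-in for
        # backdoor adjustment  P(Y|do(X)) = sum_z P(Y|X, z) P(z).
        w = F.softmax(shortcut @ self.strata.t(), dim=-1)   # (batch, K) stratum weights
        logits = 0.0
        for k in range(self.strata.size(0)):
            z = self.strata[k].expand_as(causal)
            logits = logits + w[:, k:k + 1] * self.classifier(torch.cat([causal, z], dim=-1))
        return logits
```

The design point mirrors the abstract: the prediction is averaged over shortcut strata combined with the causal summary, rather than conditioned on whichever shortcut a particular batch happens to contain.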
Related papers
- The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models [31.81567038783558]
Multimodal Reward Models (MM-RMs) are crucial for aligning Large Language Models (LLMs) with human preferences.
MM-RMs often struggle to generalize to out-of-distribution data due to their reliance on unimodal spurious correlations.
We introduce a Shortcut-aware MM-RM learning algorithm that mitigates this issue by dynamically reweighting training samples.
arXiv Detail & Related papers (2025-03-05T02:37:41Z)
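The dynamic reweighting idea above can be sketched generically: samples that a unimodal shortcut model already classifies correctly are down-weighted, so the main model cannot score well through the shortcut alone. The rule below is an assumed, minimal variant, not the paper's algorithm.

```python
import torch
import torch.nn.functional as F

def shortcut_aware_weights(shortcut_logits: torch.Tensor,
                           labels: torch.Tensor,
                           temperature: float = 1.0) -> torch.Tensor:
    """Illustrative reweighting rule (not the paper's exact algorithm):
    samples a unimodal 'shortcut' model already gets right are
    down-weighted, so the main model must rely on other evidence."""
    with torch.no_grad():
        p = F.softmax(shortcut_logits / temperature, dim=-1)
        p_correct = p.gather(1, labels.unsqueeze(1)).squeeze(1)
    weights = 1.0 - p_correct                           # hard-for-the-shortcut samples count more
    return weights / weights.mean().clamp_min(1e-8)     # normalize to mean 1

# usage sketch:
# loss = (shortcut_aware_weights(s_logits, y)
#         * F.cross_entropy(logits, y, reduction="none")).mean()
```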
- Online Multi-modal Root Cause Analysis [61.94987309148539]
Root Cause Analysis (RCA) is essential for pinpointing the root causes of failures in microservice systems.
Existing online RCA methods handle only single-modal data, overlooking complex interactions in multi-modal systems.
We introduce OCEAN, a novel online multi-modal causal structure learning method for root cause localization.
arXiv Detail & Related papers (2024-10-13T21:47:36Z)
- CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis [2.3522423517057143]
We propose a two-stage semi-supervised model termed Correlation-aware Multimodal Transformer (CorMulT).
At the pre-training stage, a modality correlation contrastive learning module is designed to efficiently learn modality correlation coefficients between different modalities.
At the prediction stage, the learned correlation coefficients are fused with modality representations to make the sentiment prediction.
arXiv Detail & Related papers (2024-07-09T17:07:29Z)
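A plausible stand-in for CorMulT's pre-training stage is a symmetric InfoNCE loss over paired modality embeddings; the cosine similarity of the trained projections can then serve as the correlation coefficient fused at prediction time. This is a hedged sketch, not the paper's exact module.

```python
import torch
import torch.nn.functional as F

def modality_correlation_infonce(z_a: torch.Tensor, z_b: torch.Tensor,
                                 tau: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over paired modality embeddings (an assumed
    stand-in for the paper's correlation contrastive module): paired
    samples in a batch are positives, all others are negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau                       # (batch, batch) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```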
- Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking [20.60198596317328]
Multimodal Entity Linking (MEL) aims to link ambiguous mentions in multimodal contexts to entities in a multimodal knowledge graph.
Existing methods explore various local correlation mechanisms, relying heavily on automatically learned attention weights.
We propose a novel MEL framework, namely OT-MEL, with OT-guided correlation assignment.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
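The OT-guided assignment can be illustrated with entropic optimal transport: a Sinkhorn iteration turns a mention-to-entity cost matrix into a doubly-normalized transport plan that replaces freely learned attention weights. The uniform marginals and hyperparameters below are assumptions for exposition.

```python
import torch

def sinkhorn_plan(cost: torch.Tensor, eps: float = 0.1, iters: int = 50) -> torch.Tensor:
    """Entropic-OT (Sinkhorn) sketch: converts a cost matrix between
    mention tokens and entity features into a transport plan whose rows
    and columns respect fixed marginals, yielding globally constrained
    correlation assignments."""
    n, m = cost.shape
    K = torch.exp(-cost / eps)                 # Gibbs kernel
    a = torch.full((n,), 1.0 / n)              # uniform row marginal (assumption)
    b = torch.full((m,), 1.0 / m)              # uniform column marginal (assumption)
    u = torch.full((n,), 1.0 / n)
    v = torch.full((m,), 1.0 / m)
    for _ in range(iters):                     # alternating scaling updates
        u = a / (K @ v).clamp_min(1e-9)
        v = b / (K.t() @ u).clamp_min(1e-9)
    return u.unsqueeze(1) * K * v.unsqueeze(0)   # transport plan, shape (n, m)
```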
- Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles [104.60508550106618]
We propose DiffDiv, an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs).
We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features.
We show that DPM-guided diversification is sufficient to remove dependence on shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z)
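Both this paper and the next diversify an ensemble using DPM-generated counterfactual samples. A generic diversity term (the papers' exact objectives may differ) penalizes pairwise agreement among ensemble members on the synthetic data, so the members cannot all latch onto the same shortcut cue:

```python
import torch
import torch.nn.functional as F

def ensemble_diversity_loss(logits_list: list) -> torch.Tensor:
    """Generic diversification term (an assumption, not the papers' exact
    objective): the mean pairwise inner product of the members' predictive
    distributions, computed on DPM-generated counterfactual samples."""
    probs = [F.softmax(l, dim=-1) for l in logits_list]
    loss, pairs = 0.0, 0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            loss = loss + (probs[i] * probs[j]).sum(dim=-1).mean()
            pairs += 1
    return loss / max(pairs, 1)

# training sketch:
# total = ce_on_real + lambda_div * ensemble_diversity_loss(logits_on_synthetic)
```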
- Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs).
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z)
- Adaptive Contrastive Learning on Multimodal Transformer for Review Helpfulness Predictions [40.70793282367128]
We propose Multimodal Contrastive Learning for the Multimodal Review Helpfulness Prediction (MRHP) problem.
In addition, we introduce an Adaptive Weighting scheme for our contrastive learning approach.
Finally, we propose a Multimodal Interaction module to address the unaligned nature of multimodal data.
arXiv Detail & Related papers (2022-11-07T13:05:56Z)
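The adaptive weighting idea can be sketched as a contrastive loss whose negatives are reweighted by difficulty; the softmax-based weighting below is an assumed scheme, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

def adaptive_weighted_contrastive(anchor: torch.Tensor, positive: torch.Tensor,
                                  negatives: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Illustrative 'adaptive weighting' contrastive loss (an assumption):
    harder negatives -- those more similar to the anchor -- receive larger
    weights in the denominator, sharpening the learned representation."""
    pos = F.cosine_similarity(anchor, positive, dim=-1) / tau                 # (batch,)
    neg = F.cosine_similarity(anchor.unsqueeze(1), negatives, dim=-1) / tau   # (batch, K)
    w = F.softmax(neg.detach(), dim=-1) * neg.size(1)    # adaptive weights, mean 1 per row
    denom = torch.exp(pos) + (w * torch.exp(neg)).sum(dim=-1)
    return (-pos + torch.log(denom)).mean()
```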
- Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering [63.87200781247364]
Correlation Information Bottleneck (CIB) seeks a tradeoff between compression and redundancy in representations.
We derive a tight theoretical upper bound for the mutual information between multimodal inputs and representations.
arXiv Detail & Related papers (2022-09-14T22:04:10Z)
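For reference, CIB builds on the standard information-bottleneck objective, which trades predictive sufficiency against compression of the multimodal input; the paper's correlation-aware upper bound on the mutual information is not reproduced here.

```latex
% Standard IB Lagrangian that CIB refines (beta controls the
% compression/redundancy tradeoff; the correlation-aware bound is in the paper):
\max_{p(z \mid x)} \; I(Z; Y) \;-\; \beta \, I(Z; X)
```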
- Counterfactual Reasoning for Out-of-distribution Multimodal Sentiment Analysis [56.84237932819403]
This paper aims to estimate and mitigate the adverse effect of the textual modality to achieve strong OOD generalization.
To this end, we devise a model-agnostic counterfactual framework for multimodal sentiment analysis.
arXiv Detail & Related papers (2022-07-24T03:57:40Z)
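The model-agnostic counterfactual idea can be sketched at inference time: a text-only branch estimates the direct (potentially spurious) textual effect, which is subtracted from the full multimodal prediction. The subtraction form and the `alpha` coefficient are assumptions for illustration.

```python
import torch

def counterfactual_debias(multimodal_logits: torch.Tensor,
                          text_only_logits: torch.Tensor,
                          alpha: float = 1.0) -> torch.Tensor:
    """Sketch of counterfactual debiasing in the spirit of the paper:
    removing the text-only (counterfactual) prediction from the full
    multimodal prediction keeps only the debiased component.
    `alpha` sets how much textual bias is removed (an assumption)."""
    return multimodal_logits - alpha * text_only_logits
```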
- Learning Multimodal VAEs through Mutual Supervision [72.77685889312889]
MEME combines information between modalities implicitly through mutual supervision.
We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes.
arXiv Detail & Related papers (2021-06-23T17:54:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.