Related papers: Confidence-Aware Self-Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

Confidence-Aware Self-Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

URL: http://arxiv.org/abs/2506.01490v1
Date: Mon, 02 Jun 2025 09:48:41 GMT
Title: Confidence-Aware Self-Distillation for Multimodal Sentiment Analysis with Incomplete Modalities
Authors: Yanxi Luo, Shijin Wang, Zhongxing Xu, Yulong Li, Feilong Tang, Jionglong Su,
Abstract summary: Multimodal sentiment analysis aims to understand human sentiment through multimodal data.<n>Existing methods for handling modality missingness are based on data reconstruction or common subspace projections.<n>We propose a Confidence-Aware Self-Distillation (CASD) strategy that effectively incorporates multimodal probabilistic embeddings.
Score: 15.205192581534973
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal sentiment analysis (MSA) aims to understand human sentiment through multimodal data. In real-world scenarios, practical factors often lead to uncertain modality missingness. Existing methods for handling modality missingness are based on data reconstruction or common subspace projections. However, these methods neglect the confidence in multimodal combinations and impose constraints on intra-class representation, hindering the capture of modality-specific information and resulting in suboptimal performance. To address these challenges, we propose a Confidence-Aware Self-Distillation (CASD) strategy that effectively incorporates multimodal probabilistic embeddings via a mixture of Student's $t$-distributions, enhancing its robustness by incorporating confidence and accommodating heavy-tailed properties. This strategy estimates joint distributions with uncertainty scores and reduces uncertainty in the student network by consistency distillation. Furthermore, we introduce a reparameterization representation module that facilitates CASD in robust multimodal learning by sampling embeddings from the joint distribution for the prediction module to calculate the task loss. As a result, the directional constraint from the loss minimization is alleviated by the sampled representation. Experimental results on three benchmark datasets demonstrate that our method achieves state-of-the-art performance.

Related papers

MIRRAMS: Towards Training Models Robust to Missingness Distribution Shifts [2.5357049657770516]
In real-world data analysis, missingness distributional shifts between training and test input datasets frequently occur.<n>We propose a novel deep learning framework designed to address such shifts in missingness distributions.<n>Our approach achieves state-of-the-art performance even without missing data and can be naturally extended to address semi-supervised learning tasks.
arXiv Detail & Related papers (2025-07-11T03:03:30Z)
Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition [7.25361375272096]
Multimodal multi-label emotion recognition aims to identify the concurrent presence of multiple emotions in multimodal data.<n>Existing studies overlook the impact of textbfaleatoric uncertainty, which is the inherent noise in the multimodal data.<n>This paper proposes Latent emotional Distribution Decomposition with Uncertainty perception framework.
arXiv Detail & Related papers (2025-02-19T18:53:23Z)
Enhancing Unimodal Latent Representations in Multimodal VAEs through Iterative Amortized Inference [20.761803725098005]
Multimodal variational autoencoders (VAEs) aim to capture shared latent representations by integrating information from different data modalities. A significant challenge is accurately inferring representations from any subset of modalities without training an impractical number of inference networks for all possible modality combinations. We introduce multimodal iterative amortized inference, an iterative refinement mechanism within the multimodal VAE framework.
arXiv Detail & Related papers (2024-10-15T08:49:38Z)
Robust Multimodal Learning via Representation Decoupling [6.7678581401558295]
Multimodal learning has attracted increasing attention due to its practicality. Existing methods tend to address it by learning a common subspace representation for different modality combinations. We propose a novel Decoupled Multimodal Representation Network (DMRNet) to assist robust multimodal learning.
arXiv Detail & Related papers (2024-07-05T12:09:33Z)
Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities [16.69453837626083]
We propose a Correlation-decoupled Knowledge Distillation (CorrKD) framework for the Multimodal Sentiment Analysis (MSA) task under uncertain missing modalities. We present a sample-level contrastive distillation mechanism that transfers comprehensive knowledge containing cross-sample correlations to reconstruct missing semantics. We design a response-disentangled consistency distillation strategy to optimize the sentiment decision boundaries of the student network.
arXiv Detail & Related papers (2024-04-25T09:35:09Z)
The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model. We introduce three robustness indicators and conduct experiments across diverse robust datasets. Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
Ensemble Modeling for Multimodal Visual Action Recognition [50.38638300332429]
We propose an ensemble modeling approach for multimodal action recognition. We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset.
arXiv Detail & Related papers (2023-08-10T08:43:20Z)
Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN) We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in principle for adaptive integration of different modalities and produces a trustworthy regression result. Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z)
Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method. We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model. The objective is to endow the trained model with robustness against adversarially manipulated input data. Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.