Which is Making the Contribution: Modulating Unimodal and Cross-modal
Dynamics for Multimodal Sentiment Analysis
- URL: http://arxiv.org/abs/2111.08451v1
- Date: Wed, 10 Nov 2021 03:29:17 GMT
- Title: Which is Making the Contribution: Modulating Unimodal and Cross-modal
Dynamics for Multimodal Sentiment Analysis
- Authors: Ying Zeng, Sijie Mai, Haifeng Hu
- Abstract summary: Multimodal sentiment analysis (MSA) draws increasing attention with the availability of multimodal data.
Recent MSA works mostly focus on learning cross-modal dynamics, but neglect to explore an optimal solution for unimodal networks.
We propose a novel MSA framework, the Modulation Model for Multimodal Sentiment Analysis (M³SA).
- Score: 18.833050804875032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal sentiment analysis (MSA) draws increasing attention with the
availability of multimodal data. The boost in performance of MSA models is
mainly hindered by two problems. On the one hand, recent MSA works mostly focus
on learning cross-modal dynamics, but neglect to explore an optimal solution
for unimodal networks, which determines the lower limit of MSA models. On the
other hand, noisy information hidden in each modality interferes with the
learning of correct cross-modal dynamics. To address the above-mentioned problems, we
propose a novel MSA framework \textbf{M}odulation \textbf{M}odel for
\textbf{M}ultimodal \textbf{S}entiment \textbf{A}nalysis ({$ M^3SA $}) to
identify the contribution of modalities and reduce the impact of noisy
information, so as to better learn unimodal and cross-modal dynamics.
Specifically, modulation loss is designed to modulate the loss contribution
based on the confidence of individual modalities in each utterance, so as to
explore an optimal update solution for each unimodal network. Besides, contrary
to most existing works which fail to explicitly filter out noisy information,
we devise a modality filter module to identify and filter out modality noise
for the learning of correct cross-modal embedding. Extensive experiments on
public datasets demonstrate that our approach achieves state-of-the-art
performance.
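The two mechanisms in the abstract can be sketched in a few lines: a modulation loss that scales each unimodal loss term by the confidence of that modality, and a filter that gates noisy feature dimensions before cross-modal fusion. This is a minimal illustrative sketch, not the authors' implementation; the function names, the normalization of confidences, and the elementwise gate are all assumptions made for clarity.

```python
def modulated_loss(unimodal_losses, confidences):
    """Scale each modality's loss by its normalized confidence, so that
    low-confidence modalities contribute less to the unimodal updates.
    `confidences` are assumed to be non-negative scores, one per modality."""
    z = sum(confidences) or 1.0  # avoid division by zero
    return sum((c / z) * loss
               for c, loss in zip(confidences, unimodal_losses))

def filter_modality(features, gate):
    """Suppress noisy feature dimensions with an elementwise gate in [0, 1]
    before the features enter cross-modal fusion."""
    return [g * x for g, x in zip(gate, features)]

# Example: the text modality (loss 4.0) has low confidence (0.1),
# so it is down-weighted relative to the other two modalities.
total = modulated_loss([1.0, 4.0, 2.0], [0.9, 0.1, 0.5])
```

In the paper the confidences and gates are learned jointly with the network; here they are treated as given inputs to keep the sketch self-contained.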
Related papers
- PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning [42.00851701431368]
Multi-modal class-incremental learning (MMCIL) seeks to leverage multi-modal data, such as audio-visual and image-text pairs.
A critical challenge remains: the issue of missing modalities during incremental learning phases.
We propose PAL, a novel exemplar-free framework tailored to MMCIL under missing-modality scenarios.
arXiv Detail & Related papers (2025-01-16T08:04:04Z) - Asymmetric Reinforcing against Multi-modal Representation Bias [59.685072206359855]
We propose an Asymmetric Reinforcing method against Multimodal representation bias (ARM).
Our ARM dynamically reinforces the weak modalities while maintaining the ability to represent dominant modalities through conditional mutual information.
We have significantly improved the performance of multimodal learning, making notable progress in mitigating imbalanced multimodal learning.
arXiv Detail & Related papers (2025-01-02T13:00:06Z) - Spectrum-based Modality Representation Fusion Graph Convolutional Network for Multimodal Recommendation [7.627299398469962]
We propose a new Spectrum-based Modality Representation graph recommender.
It aims to capture both uni-modal and fusion preferences while simultaneously suppressing modality noise.
Experiments on three real-world datasets show the efficacy of our proposed model.
arXiv Detail & Related papers (2024-12-19T15:53:21Z) - On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z) - Towards Robust Multimodal Sentiment Analysis with Incomplete Data [20.75292807497547]
We present an innovative Language-dominated Noise-resistant Learning Network (LNLN) to achieve robust Multimodal Sentiment Analysis (MSA).
LNLN features a dominant modality correction (DMC) module and dominant modality based multimodal learning (DMML) module, which enhances the model's robustness across various noise scenarios.
arXiv Detail & Related papers (2024-09-30T07:14:31Z) - Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD).
It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) is proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z) - Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z) - Unified Multi-modal Unsupervised Representation Learning for
Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Multimodal Representations Learning Based on Mutual Information
Maximization and Minimization and Identity Embedding for Multimodal Sentiment
Analysis [33.73730195500633]
We propose a multimodal representation model based on Mutual information Maximization and Identity Embedding.
Experimental results on two public datasets demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2022-01-10T01:41:39Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.