Which is Making the Contribution: Modulating Unimodal and Cross-modal
Dynamics for Multimodal Sentiment Analysis
- URL: http://arxiv.org/abs/2111.08451v1
- Date: Wed, 10 Nov 2021 03:29:17 GMT
- Title: Which is Making the Contribution: Modulating Unimodal and Cross-modal
Dynamics for Multimodal Sentiment Analysis
- Authors: Ying Zeng, Sijie Mai, Haifeng Hu
- Abstract summary: Multimodal sentiment analysis (MSA) draws increasing attention with the availability of multimodal data.
Recent MSA works mostly focus on learning cross-modal dynamics, but neglect to explore an optimal solution for unimodal networks.
We propose a novel MSA framework, the Modulation Model for Multimodal Sentiment Analysis (M^3SA).
- Score: 18.833050804875032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal sentiment analysis (MSA) draws increasing attention with the
availability of multimodal data. Further performance gains for MSA models are
mainly hindered by two problems. On the one hand, recent MSA works mostly focus
on learning cross-modal dynamics but neglect to explore an optimal solution for
unimodal networks, which determines the lower limit of MSA models. On the
other hand, noisy information hidden in each modality interferes with the
learning of correct cross-modal dynamics. To address these problems, we
propose a novel MSA framework, the Modulation Model for Multimodal Sentiment
Analysis (M^3SA), to identify the contribution of each modality and reduce the
impact of noisy information, so as to better learn unimodal and cross-modal
dynamics.
Specifically, modulation loss is designed to modulate the loss contribution
based on the confidence of individual modalities in each utterance, so as to
explore an optimal update solution for each unimodal network. In addition,
unlike most existing works, which fail to explicitly filter out noisy
information, we devise a modality filter module to identify and remove
modality noise for the learning of correct cross-modal embeddings. Extensive
experiments on public datasets demonstrate that our approach achieves state-of-the-art
performance.
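The abstract names two mechanisms, a modulation loss and a modality filter module, without giving code. Below is a minimal PyTorch sketch of what they might look like; the gating form, the confidence weighting, and all names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityFilter(nn.Module):
    """Soft gate that suppresses noisy dimensions of a modality embedding
    (a guess at the 'modality filter module'; the gating form is assumed)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)  # elementwise gate in [0, 1]

def modulated_unimodal_loss(preds, targets, confidences):
    """Weight each modality's per-utterance loss by its confidence, so
    low-confidence modalities contribute less to the unimodal updates.

    preds:       dict modality -> (batch,) predictions
    targets:     (batch,) sentiment labels
    confidences: dict modality -> (batch,) scores in [0, 1]
    """
    total = 0.0
    for m, p in preds.items():
        per_sample = F.l1_loss(p, targets, reduction="none")
        total = total + (confidences[m] * per_sample).mean()
    return total

# toy usage
targets = torch.randn(8)
preds = {m: torch.randn(8) for m in ("text", "audio", "video")}
conf = {m: torch.rand(8) for m in preds}  # per-utterance confidences
loss = modulated_unimodal_loss(preds, targets, conf)
```

In practice such a term would be combined with a cross-modal fusion loss; the filter would sit between each unimodal encoder and the fusion network.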
Related papers
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
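A rough sketch of the gradient-modulation idea, to be called between loss.backward() and optimizer.step(); the coefficient schedule here is an assumption, not the paper's exact OGM rule.

```python
import math

def modulate_gradients(encoders, ratios, alpha=1.0):
    """Down-scale gradients of any modality whose discrepancy ratio shows
    it is ahead of the others, so weaker modalities catch up.

    encoders: dict modality -> torch.nn.Module
    ratios:   dict modality -> float, > 1 means the modality dominates
    """
    for m, enc in encoders.items():
        coeff = 1.0 - math.tanh(alpha * (ratios[m] - 1.0)) if ratios[m] > 1.0 else 1.0
        for p in enc.parameters():
            if p.grad is not None:
                p.grad.mul_(coeff)  # shrink the dominant modality's update
```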
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
- Enhancing Unimodal Latent Representations in Multimodal VAEs through Iterative Amortized Inference [20.761803725098005]
Multimodal variational autoencoders (VAEs) aim to capture shared latent representations by integrating information from different data modalities.
A significant challenge is accurately inferring representations from any subset of modalities without training an impractical number of inference networks for all possible modality combinations.
We introduce multimodal iterative amortized inference, an iterative refinement mechanism within the multimodal VAE framework.
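A generic sketch of iterative amortized inference: take the encoder's initial posterior parameters and refine them with a few gradient steps on the ELBO. This is the general mechanism, not the paper's exact multimodal procedure.

```python
import torch

def iterative_refine(mu, logvar, decoder, x, steps=5, lr=0.1):
    """Refine an amortized Gaussian posterior q(z|x) by gradient descent
    on reconstruction + KL (a standard iterative-refinement sketch)."""
    mu = mu.clone().detach().requires_grad_(True)
    logvar = logvar.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([mu, logvar], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = decoder(z)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum()
        loss = torch.nn.functional.mse_loss(recon, x, reduction="sum") + kl
        loss.backward()
        opt.step()  # updates only mu and logvar, not the decoder
    return mu.detach(), logvar.detach()

# toy usage with a hypothetical linear decoder
decoder = torch.nn.Linear(16, 32)
x = torch.randn(32)
mu, logvar = iterative_refine(torch.zeros(16), torch.zeros(16), decoder, x)
```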
arXiv Detail & Related papers (2024-10-15T08:49:38Z)
- Towards Robust Multimodal Sentiment Analysis with Incomplete Data [20.75292807497547]
We present an innovative Language-dominated Noise-resistant Learning Network (LNLN) to achieve robust Multimodal Sentiment Analysis (MSA).
LNLN features a dominant modality correction (DMC) module and a dominant modality based multimodal learning (DMML) module, which enhance the model's robustness across various noise scenarios.
arXiv Detail & Related papers (2024-09-30T07:14:31Z)
- Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD).
It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) is proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z) - Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
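A minimal sketch of alternating unimodal adaptation with a shared head: each step optimizes one modality's encoder while the shared head is updated on every step. The architecture and dimensions are toy assumptions, not MLA's actual design.

```python
import torch
import torch.nn as nn

encoders = nn.ModuleDict({
    "text": nn.Linear(300, 128),
    "audio": nn.Linear(74, 128),
})
shared_head = nn.Linear(128, 1)  # continuously optimized across modalities
opt = torch.optim.Adam(
    list(encoders.parameters()) + list(shared_head.parameters()), lr=1e-3)
loss_fn = nn.L1Loss()

def train_step(modality, x, y):
    opt.zero_grad()
    pred = shared_head(torch.relu(encoders[modality](x))).squeeze(-1)
    loss = loss_fn(pred, y)
    loss.backward()  # only this modality's encoder + shared head get grads
    opt.step()
    return loss.item()

# alternate over modalities instead of joint multimodal training
batches = {"text": (torch.randn(8, 300), torch.randn(8)),
           "audio": (torch.randn(8, 74), torch.randn(8))}
for m, (x, y) in batches.items():
    train_step(m, x, y)
```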
arXiv Detail & Related papers (2023-11-17T18:57:40Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
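A sketch of the general single-stream early-fusion strategy: project each modality to a shared width, concatenate along the sequence axis, and encode jointly. This illustrates the idea only; UmURL's actual architecture differs.

```python
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    """Single-stream early fusion over multiple modality sequences."""
    def __init__(self, dims, width=128):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, width) for d in dims])
        layer = nn.TransformerEncoderLayer(d_model=width, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, xs):  # xs: list of (batch, seq_i, dim_i)
        tokens = torch.cat([p(x) for p, x in zip(self.proj, xs)], dim=1)
        return self.encoder(tokens)  # one stream over all modalities

enc = EarlyFusionEncoder(dims=[300, 74, 35])
out = enc([torch.randn(2, 10, 300), torch.randn(2, 20, 74),
           torch.randn(2, 15, 35)])
```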
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
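A generic sketch of such a projection module: map each modality into one common space so any subset, including combinations unseen in training, can be fused the same way. The fusion by averaging is an assumption for illustration.

```python
import torch
import torch.nn as nn

class CommonSpaceProjector(nn.Module):
    """Project per-modality features into one shared space."""
    def __init__(self, in_dims, common_dim=256):
        super().__init__()
        self.proj = nn.ModuleDict(
            {name: nn.Linear(d, common_dim) for name, d in in_dims.items()})

    def forward(self, feats):  # feats: dict modality -> (batch, dim)
        projected = [self.proj[m](x) for m, x in feats.items()]
        return torch.stack(projected, dim=0).mean(dim=0)  # fuse any subset

proj = CommonSpaceProjector({"rgb": 2048, "audio": 128, "depth": 512})
fused = proj({"rgb": torch.randn(4, 2048), "audio": torch.randn(4, 128)})
```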
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias [17.107961913114778]
Multimodal misinformation is a growing problem on social media platforms.
In this study, we investigate and identify the presence of unimodal bias in widely-used MMD benchmarks.
We introduce a new method -- termed Crossmodal HArd Synthetic MisAlignment (CHASMA) -- for generating realistic synthetic training data.
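A naive stand-in for the misalignment idea: pair each image with a caption from a different sample to create labeled misaligned examples. CHASMA itself selects hard, plausible misalignments; this derangement-based version is only illustrative.

```python
import random

def make_misaligned_pairs(images, captions, seed=0):
    """Pair every image with a non-matching caption (label 0 = misaligned)."""
    rng = random.Random(seed)
    idx = list(range(len(captions)))
    shuffled = idx[:]
    while any(i == j for i, j in zip(idx, shuffled)):  # avoid true pairs
        rng.shuffle(shuffled)
    return [(images[i], captions[j], 0) for i, j in zip(idx, shuffled)]

pairs = make_misaligned_pairs(["img_a", "img_b", "img_c"],
                              ["cap_a", "cap_b", "cap_c"])
```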
arXiv Detail & Related papers (2023-04-27T12:28:29Z)
- Multimodal Representations Learning Based on Mutual Information Maximization and Minimization and Identity Embedding for Multimodal Sentiment Analysis [33.73730195500633]
We propose a multimodal representation model based on Mutual Information Maximization and Identity Embedding.
Experimental results on two public datasets demonstrate the effectiveness of the proposed model.
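One plausible instantiation of mutual-information maximization between two modality representations is the InfoNCE loss, sketched below; the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def infonce_mi_lower_bound(z_a, z_b, temperature=0.1):
    """InfoNCE: minimizing this maximizes a lower bound on the mutual
    information between paired representations z_a and z_b."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature   # (batch, batch) similarities
    labels = torch.arange(z_a.size(0))     # matching pairs on the diagonal
    return F.cross_entropy(logits, labels)

loss = infonce_mi_lower_bound(torch.randn(16, 128), torch.randn(16, 128))
```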
arXiv Detail & Related papers (2022-01-10T01:41:39Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
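A sketch of pairwise bimodal fusion via cross-attention in both directions, applied to two text-anchored pairs; this illustrates the pairing scheme only, not BBFN's actual gated fusion.

```python
import torch
import torch.nn as nn

class BimodalFusion(nn.Module):
    """Fuse one modality pair with bidirectional cross-attention."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.a2b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b2a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):  # each (batch, seq, dim)
        a_enriched, _ = self.a2b(a, b, b)  # a attends to b
        b_enriched, _ = self.b2a(b, a, a)  # b attends to a
        return torch.cat([a_enriched.mean(1), b_enriched.mean(1)], dim=-1)

# two bimodal pairs, both anchored on the text modality
fuse_tv, fuse_ta = BimodalFusion(), BimodalFusion()
text, video, audio = (torch.randn(2, 10, 128) for _ in range(3))
joint = torch.cat([fuse_tv(text, video), fuse_ta(text, audio)], dim=-1)
```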
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.