Exploiting modality-invariant feature for robust multimodal emotion
recognition with missing modalities
- URL: http://arxiv.org/abs/2210.15359v1
- Date: Thu, 27 Oct 2022 12:16:25 GMT
- Title: Exploiting modality-invariant feature for robust multimodal emotion
recognition with missing modalities
- Authors: Haolin Zuo, Rui Liu, Jinming Zhao, Guanglai Gao, Haizhou Li
- Abstract summary: We propose to use invariant features for a missing modality imagination network (IF-MMIN)
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
- Score: 76.08541852988536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal emotion recognition leverages complementary information across
modalities to gain performance. However, we cannot guarantee that the data of
all modalities are always present in practice. In the studies to predict the
missing data across modalities, the inherent difference between heterogeneous
modalities, namely the modality gap, presents a challenge. To address this, we
propose to use invariant features for a missing modality imagination network
(IF-MMIN) which includes two novel mechanisms: 1) an invariant feature learning
strategy that is based on the central moment discrepancy (CMD) distance under
the full-modality scenario; 2) an invariant feature based imagination module
(IF-IM) to alleviate the modality gap during the missing modalities prediction,
thus improving the robustness of multimodal joint representation. Comprehensive
experiments on the benchmark dataset IEMOCAP demonstrate that the proposed
model outperforms all baselines and invariantly improves the overall emotion
recognition performance under uncertain missing-modality conditions. We release
the code at: https://github.com/ZhuoYulang/IF-MMIN.
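
To make the invariant feature learning term concrete, below is a minimal sketch of the central moment discrepancy (CMD) distance as it is commonly implemented in PyTorch. The function name cmd_distance, the moment order K=5, and the omission of the interval-scaling factors 1/(b-a)^k from the original CMD formulation (features assumed roughly normalized) are assumptions for illustration; the released IF-MMIN code may implement the loss differently.

```python
import torch

def cmd_distance(x, y, k_moments=5):
    """Central moment discrepancy between two feature batches.

    x, y: tensors of shape (batch, dim), e.g. a full-modality feature and a
    single-modality feature. Illustrative sketch only: the interval-scaling
    factors of the original CMD formulation are omitted, assuming the
    features are already normalized to a comparable range.
    """
    mx, my = x.mean(dim=0), y.mean(dim=0)      # first-order (mean) moments
    dist = torch.norm(mx - my, p=2)            # match the means
    cx, cy = x - mx, y - my                    # centered features
    for k in range(2, k_moments + 1):
        # match k-th order central moments E[(x - E[x])^k]
        dist = dist + torch.norm(cx.pow(k).mean(dim=0) - cy.pow(k).mean(dim=0), p=2)
    return dist
```

Minimizing such a distance under the full-modality scenario encourages modality-specific features to share a modality-invariant space, which the imagination module can then target when predicting a missing modality.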
Related papers
- Enhancing Unimodal Latent Representations in Multimodal VAEs through Iterative Amortized Inference [20.761803725098005]
Multimodal variational autoencoders (VAEs) aim to capture shared latent representations by integrating information from different data modalities.
A significant challenge is accurately inferring representations from any subset of modalities without training an impractical number of inference networks for all possible modality combinations.
We introduce multimodal iterative amortized inference, an iterative refinement mechanism within the multimodal VAE framework.
arXiv Detail & Related papers (2024-10-15T08:49:38Z) - Progressively Modality Freezing for Multi-Modal Entity Alignment [27.77877721548588]
We propose a novel strategy of progressive modality freezing, called PMF, that focuses on alignment-relevant features.
Notably, our approach introduces a pioneering cross-modal association loss to foster modal consistency.
Empirical evaluations across nine datasets confirm PMF's superiority.
arXiv Detail & Related papers (2024-07-23T04:22:30Z) - Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities [8.517830626176641]
Any2Seg is a novel framework that can achieve robust segmentation from any combination of modalities in any visual conditions.
Experiments on two benchmarks with four modalities demonstrate that Any2Seg achieves the state-of-the-art under the multi-modal setting.
arXiv Detail & Related papers (2024-07-16T03:34:38Z) - Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD).
It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) is proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z) - Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation [37.06795681738417]
Modality-Incomplete Scene Segmentation (MISS) is a task that encompasses both system-level modality absence and sensor-level modality errors.
We introduce a Missing-aware Modal Switch (MMS) strategy to proactively manage missing modalities during training.
We show an improvement of 5.84% mIoU over the prior state-of-the-art parameter-efficient methods in modality missing.
arXiv Detail & Related papers (2024-01-30T11:46:27Z) - Exploring Missing Modality in Multimodal Egocentric Datasets [89.76463983679058]
We introduce a novel concept, the Missing Modality Token (MMT), to maintain performance even when modalities are absent.
Our method mitigates the performance loss, reducing it from its original ~30% drop to only ~10% when half of the test set is modal-incomplete.
arXiv Detail & Related papers (2024-01-21T11:55:42Z) - Unified Multi-modal Unsupervised Representation Learning for
Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)