Missing Modality meets Meta Sampling (M3S): An Efficient Universal
Approach for Multimodal Sentiment Analysis with Missing Modality
- URL: http://arxiv.org/abs/2210.03428v1
- Date: Fri, 7 Oct 2022 09:54:05 GMT
- Title: Missing Modality meets Meta Sampling (M3S): An Efficient Universal
Approach for Multimodal Sentiment Analysis with Missing Modality
- Authors: Haozhe Chi, Minghua Yang, Junhao Zhu, Guanhong Wang, Gaoang Wang
- Abstract summary: We propose a simple yet effective meta-sampling approach for multimodal sentiment analysis with missing modalities, namely Missing Modality-based Meta Sampling (M3S).
M3S formulates a missing modality sampling strategy within the model-agnostic meta-learning (MAML) framework.
We conduct experiments on IEMOCAP, SIMS and CMU-MOSI datasets, and superior performance is achieved compared with recent state-of-the-art methods.
- Score: 5.171058506312429
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal sentiment analysis (MSA) is an important way of observing mental
activities with the help of data captured from multiple modalities. However,
due to recording or transmission errors, some modalities may contain
incomplete data. Most existing works that address missing modalities usually
assume a particular modality is completely missing and seldom consider a
mixture of missing data across multiple modalities. In this paper, we propose a
simple yet effective meta-sampling approach for multimodal sentiment analysis
with missing modalities, namely Missing Modality-based Meta Sampling (M3S). To
be specific, M3S formulates a missing modality sampling strategy within the
model-agnostic meta-learning (MAML) framework. M3S can be used as an efficient
add-on training component for existing models and significantly improves their
performance on multimodal data with a mixture of missing modalities. We
conduct experiments on IEMOCAP, SIMS and CMU-MOSI datasets, and superior
performance is achieved compared with recent state-of-the-art methods.
Related papers
- MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching [54.740256498985026]
Keypoint detection and description methods often struggle with multimodal data.
We propose a modality-invariant feature learning network (MIFNet) to compute modality-invariant features for keypoint descriptions in multimodal image matching.
arXiv Detail & Related papers (2025-01-20T06:56:30Z) - PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning [42.00851701431368]
Multi-modal class-incremental learning (MMCIL) seeks to leverage multi-modal data, such as audio-visual and image-text pairs.
A critical challenge remains: the issue of missing modalities during incremental learning phases.
We propose PAL, a novel exemplar-free framework tailored to MMCIL under missing-modality scenarios.
arXiv Detail & Related papers (2025-01-16T08:04:04Z) - Modality-Invariant Bidirectional Temporal Representation Distillation Network for Missing Multimodal Sentiment Analysis [6.15602203132432]
We introduce the Modality-Invariant Bidirectional Temporal Representation Distillation Network (MITR-DNet) for Missing Multimodal Sentiment Analysis.
MITR-DNet employs a distillation approach, wherein a complete-modality teacher model guides a missing-modality student model, ensuring robustness when modalities are missing (a generic sketch of this kind of teacher-student objective appears after this list).
arXiv Detail & Related papers (2025-01-07T07:57:16Z) - SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection [73.49799596304418]
This paper introduces a new task called Multi-Modal Datasets and Multi-Task Object Detection (M2Det) for remote sensing.
It is designed to accurately detect horizontal or oriented objects from any sensor modality.
This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization.
arXiv Detail & Related papers (2024-12-30T02:47:51Z) - Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges.
This often leads to the issue of missing modalities, where data for certain modalities are absent.
We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
arXiv Detail & Related papers (2024-07-17T14:44:25Z) - U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: an Unbiased Multiscale Modal Fusion Model for multimodal semantic segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z) - Exploring Missing Modality in Multimodal Egocentric Datasets [89.76463983679058]
We introduce a novel concept, the Missing Modality Token (MMT), to maintain performance even when modalities are absent (one plausible realization of such a token is sketched after this list).
Our method mitigates the performance loss, reducing it from an original drop of ~30% to only ~10% when half of the test set is modal-incomplete.
arXiv Detail & Related papers (2024-01-21T11:55:42Z) - Toward Robust Multimodal Learning using Multimodal Foundational Models [30.755818450393637]
We propose TRML, Toward Robust Multimodal Learning using Multimodal Foundational Models.
TRML employs generated virtual modalities to replace missing modalities.
We also design a semantic matching learning module to align the semantic spaces of the generated and missing modalities.
arXiv Detail & Related papers (2024-01-20T04:46:43Z) - Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical
Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z) - Exploiting modality-invariant feature for robust multimodal emotion
recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
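As noted in the MITR-DNet entry above, that paper uses a complete-modality teacher to guide a missing-modality student. The sketch below shows the general shape of such a distillation objective: a task loss on the student plus a feature-matching term toward the frozen teacher. The function name, the MSE matching term, and the weighting are assumptions for illustration, not the paper's actual losses.

```python
# Hypothetical teacher-student distillation for missing modalities, in the
# spirit of the MITR-DNet entry above (not the paper's actual objective).
import torch.nn.functional as F


def distill_loss(student_logits, labels, student_feat, teacher_feat, alpha=0.5):
    """Blend the student's task loss with a feature-matching term that pulls
    its missing-modality representation toward the full-modality teacher's."""
    task = F.cross_entropy(student_logits, labels)
    match = F.mse_loss(student_feat, teacher_feat.detach())  # teacher is frozen
    return (1.0 - alpha) * task + alpha * match
```

During training the teacher would see complete inputs while the student sees masked ones, so the matching term transfers full-modality structure to the missing-modality condition.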
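Similarly, the egocentric-datasets entry introduces a Missing Modality Token. One plausible way to realize that idea, purely as a sketch and not the authors' design, is a learnable per-modality embedding substituted for the features of any absent modality before fusion:

```python
# Hypothetical learnable missing-modality tokens, in the spirit of the MMT
# entry above (names, shapes, and the substitution scheme are illustrative).
import torch
import torch.nn as nn


class MissingModalityTokens(nn.Module):
    def __init__(self, modalities=("rgb", "audio"), dim=256):
        super().__init__()
        # One learnable token per modality, broadcast over the batch as needed.
        self.tokens = nn.ParameterDict(
            {m: nn.Parameter(0.02 * torch.randn(dim)) for m in modalities}
        )

    def forward(self, feats, present):
        """feats: (B, dim) features for the modalities that are present;
        present: bool per modality. Absent modalities get their token."""
        batch = next(iter(feats.values())).shape[0]
        return {
            m: feats[m] if present[m] else self.tokens[m].expand(batch, -1)
            for m in self.tokens
        }
```

A downstream fusion head can then consume the returned dictionary exactly as if every modality were observed.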
This list is automatically generated from the titles and abstracts of the papers on this site.