Learning Noise-Robust Joint Representation for Multimodal Emotion Recognition under Incomplete Data Scenarios
- URL: http://arxiv.org/abs/2311.16114v2
- Date: Tue, 7 May 2024 16:30:05 GMT
- Title: Learning Noise-Robust Joint Representation for Multimodal Emotion Recognition under Incomplete Data Scenarios
- Authors: Qi Fan, Haolin Zuo, Rui Liu, Zheng Lian, Guanglai Gao,
- Abstract summary: Multimodal emotion recognition (MER) in practical scenarios is significantly challenged by the presence of missing or incomplete data.
Traditional methods have often involved discarding data or substituting data segments with zero vectors to approximate these incompletenesses.
We introduce a novel noise-robust MER model that effectively learns robust multimodal joint representations from noisy data.
- Score: 23.43319138048058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal emotion recognition (MER) in practical scenarios is significantly challenged by the presence of missing or incomplete data across different modalities. To overcome these challenges, researchers have aimed to simulate incomplete conditions during the training phase to enhance the system's overall robustness. Traditional methods have often involved discarding data or substituting data segments with zero vectors to approximate these incompletenesses. However, such approaches neither accurately represent real-world conditions nor adequately address the issue of noisy data availability. For instance, a blurry image cannot be simply replaced with zero vectors, and still retain information. To tackle this issue and develop a more precise MER system, we introduce a novel noise-robust MER model that effectively learns robust multimodal joint representations from noisy data. This approach includes two pivotal components: firstly, a noise scheduler that adjusts the type and level of noise in the data to emulate various realistic incomplete situations. Secondly, a Variational AutoEncoder (VAE)-based module is employed to reconstruct these robust multimodal joint representations from the noisy inputs. Notably, the introduction of the noise scheduler enables the exploration of an entirely new type of incomplete data condition, which is impossible with existing methods. Extensive experimental evaluations on the benchmark datasets IEMOCAP and CMU-MOSEI demonstrate the effectiveness of the noise scheduler and the excellent performance of our proposed model.
Related papers
- Robust Learning under Hybrid Noise [24.36707245704713]
We propose a novel unified learning framework called "Feature and Label Recovery" (FLR) to combat the hybrid noise from the perspective of data recovery.
arXiv Detail & Related papers (2024-07-04T16:13:25Z) - Relation Modeling and Distillation for Learning with Noisy Labels [4.556974104115929]
This paper proposes a relation modeling and distillation framework that models inter-sample relationships via self-supervised learning.
The proposed framework can learn discriminative representations for noisy data, which results in superior performance than the existing methods.
arXiv Detail & Related papers (2024-05-30T01:47:27Z) - NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition [3.726602636064681]
We present an analysis that shows that real noise is significantly more challenging than simulated noise.
We show that current state-of-the-art models for noise-robust learning fall far short of their theoretically achievable upper bound.
arXiv Detail & Related papers (2024-05-13T10:20:31Z) - Transferring Annotator- and Instance-dependent Transition Matrix for Learning from Crowds [88.06545572893455]
In real-world crowd-sourcing scenarios, noise transition matrices are both annotator- and instance-dependent.
We first model the mixture of noise patterns by all annotators, and then transfer this modeling to individual annotators.
Experiments confirm the superiority of the proposed approach on synthetic and real-world crowd-sourcing data.
arXiv Detail & Related papers (2023-06-05T13:43:29Z) - Realistic Noise Synthesis with Diffusion Models [68.48859665320828]
Deep image denoising models often rely on large amount of training data for the high quality performance.
We propose a novel method that synthesizes realistic noise using diffusion models, namely Realistic Noise Synthesize Diffusor (RNSD)
RNSD can incorporate guided multiscale content, such as more realistic noise with spatial correlations can be generated at multiple frequencies.
arXiv Detail & Related papers (2023-05-23T12:56:01Z) - Confidence-based Reliable Learning under Dual Noises [46.45663546457154]
Deep neural networks (DNNs) have achieved remarkable success in a variety of computer vision tasks.
Yet, the data collected from the open world are unavoidably polluted by noise, which may significantly undermine the efficacy of the learned models.
Various attempts have been made to reliably train DNNs under data noise, but they separately account for either the noise existing in the labels or that existing in the images.
This work provides a first, unified framework for reliable learning under the joint (image, label)-noise.
arXiv Detail & Related papers (2023-02-10T07:50:34Z) - Improving the Robustness of Summarization Models by Detecting and
Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z) - Representation Learning for the Automatic Indexing of Sound Effects
Libraries [79.68916470119743]
We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size.
Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.
arXiv Detail & Related papers (2022-08-18T23:46:13Z) - Self-attention fusion for audiovisual emotion recognition with
incomplete data [103.70855797025689]
We consider the problem of multimodal data analysis with a use case of audiovisual emotion recognition.
We propose an architecture capable of learning from raw data and describe three variants of it with distinct modality fusion mechanisms.
arXiv Detail & Related papers (2022-01-26T18:04:29Z) - Uncertainty-Aware Multi-View Representation Learning [53.06828186507994]
We devise a novel unsupervised multi-view learning approach, termed as Dynamic Uncertainty-Aware Networks (DUA-Nets)
Guided by the uncertainty of data estimated from the generation perspective, intrinsic information from multiple views is integrated to obtain noise-free representations.
Our model achieves superior performance in extensive experiments and shows the robustness to noisy data.
arXiv Detail & Related papers (2022-01-15T07:16:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.