COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for
Uncertainty-Aware Multimodal Emotion Recognition
- URL: http://arxiv.org/abs/2206.05833v2
- Date: Mon, 16 Oct 2023 20:29:02 GMT
- Title: COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for
Uncertainty-Aware Multimodal Emotion Recognition
- Authors: Mani Kumar Tellamekala, Shahin Amiriparian, Björn W. Schuller,
Elisabeth André, Timo Giesbrecht, Michel Valstar
- Abstract summary: This paper introduces an uncertainty-aware audiovisual fusion approach that quantifies modality-wise uncertainty towards emotion prediction.
We impose Ordinal Ranking constraints on the variance vectors of audiovisual latent distributions.
Our evaluation on two emotion recognition corpora, AVEC 2019 CES and IEMOCAP, shows that audiovisual emotion recognition can considerably benefit from well-calibrated and well-ranked latent uncertainty measures.
- Score: 14.963637194500029
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Automatically recognising apparent emotions from face and voice is hard, in
part because of various sources of uncertainty, including in the input data and
the labels used in a machine learning framework. This paper introduces an
uncertainty-aware audiovisual fusion approach that quantifies modality-wise
uncertainty towards emotion prediction. To this end, we propose a novel fusion
framework in which we first learn latent distributions over audiovisual
temporal context vectors separately, and then constrain the variance vectors of
unimodal latent distributions so that they represent the amount of information
each modality provides w.r.t. emotion recognition. In particular, we impose
Calibration and Ordinal Ranking constraints on the variance vectors of
audiovisual latent distributions. When well-calibrated, modality-wise
uncertainty scores indicate how much their corresponding predictions may differ
from the ground truth labels. Well-ranked uncertainty scores allow the ordinal
ranking of different frames across the modalities. To jointly impose both these
constraints, we propose a softmax distributional matching loss. In both
classification and regression settings, we compare our uncertainty-aware fusion
model with standard model-agnostic fusion baselines. Our evaluation on two
emotion recognition corpora, AVEC 2019 CES and IEMOCAP, shows that audiovisual
emotion recognition can considerably benefit from well-calibrated and
well-ranked latent uncertainty measures.
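The two ideas in the abstract — variance vectors that weight each modality's contribution, and a softmax distributional matching loss that ties variances to prediction errors — can be illustrated with a minimal sketch. This is an illustrative reconstruction, not the authors' implementation: it uses scalar toy latents, precision-weighted (inverse-variance) Gaussian fusion, and a cross-entropy form of the matching loss; all function names and the temperature parameter are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_gaussians(mu_a, var_a, mu_v, var_v):
    """Precision-weighted fusion of audio/visual Gaussian latents.

    The modality with the smaller variance (higher confidence)
    contributes more to the fused mean."""
    w_a, w_v = 1.0 / var_a, 1.0 / var_v
    mu = (w_a * mu_a + w_v * mu_v) / (w_a + w_v)
    var = 1.0 / (w_a + w_v)
    return mu, var

def softmax_matching_loss(variances, errors, temperature=1.0):
    """Cross-entropy between a softmax over negated prediction errors
    (target) and a softmax over negated variances (prediction).

    Minimising it pushes low-variance frames to be the low-error frames
    (calibration) and makes the ordering of variances across frames
    mirror the ordering of errors (ordinal ranking)."""
    p_err = softmax(-np.asarray(errors, dtype=float) / temperature)
    p_var = softmax(-np.asarray(variances, dtype=float) / temperature)
    return float(-(p_err * np.log(p_var + 1e-12)).sum())
```

With `mu_a=1.0, var_a=0.1` and `mu_v=3.0, var_v=0.9`, the fused mean is 1.2, close to the more confident audio estimate; and the matching loss is smaller when the variances are ordered like the errors than when they are anti-ordered.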
Related papers
- Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition [7.25361375272096]
Multimodal multi-label emotion recognition aims to identify the concurrent presence of multiple emotions in multimodal data.
Existing studies overlook the impact of aleatoric uncertainty, the inherent noise in multimodal data.
This paper proposes a Latent emotional Distribution Decomposition with Uncertainty perception framework.
arXiv Detail & Related papers (2025-02-19T18:53:23Z)
- Uncertainty Quantification in Stereo Matching [61.73532883992135]
We propose a new framework for stereo matching and its uncertainty quantification.
We adopt Bayes risk as a measure of uncertainty and estimate data and model uncertainty separately.
We apply our uncertainty method to improve prediction accuracy by selecting data points with small uncertainties.
arXiv Detail & Related papers (2024-12-24T23:28:20Z)
- Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness [106.52630978891054]
We present a taxonomy of uncertainty specific to vision-language AI systems.
We also introduce a new metric, confidence-weighted accuracy, which is well correlated with both accuracy and calibration error.
arXiv Detail & Related papers (2024-07-02T04:23:54Z)
- Towards A Robust Group-level Emotion Recognition via Uncertainty-Aware Learning [29.27161082428625]
Group-level emotion recognition (GER) is an inseparable part of human behavior analysis.
We propose an uncertainty-aware learning (UAL) method to extract more robust representations for GER.
We develop an image enhancement module to enhance the model's robustness against severe noise.
arXiv Detail & Related papers (2023-10-06T15:05:41Z)
- ELFNet: Evidential Local-global Fusion for Stereo Matching [17.675146012208124]
We introduce the Evidential Local-global Fusion (ELF) framework for stereo matching.
It endows both uncertainty estimation and confidence-aware fusion with trustworthy heads.
arXiv Detail & Related papers (2023-08-01T15:51:04Z)
- Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
- Uncertain Facial Expression Recognition via Multi-task Assisted Correction [43.02119884581332]
We propose a novel method of multi-task assisted correction in addressing uncertain facial expression recognition called MTAC.
Specifically, a confidence estimation block and a weighted regularization module are applied to highlight solid samples and suppress uncertain samples in every batch.
Experiments on RAF-DB, AffectNet, and AffWild2 datasets demonstrate that the MTAC obtains substantial improvements over baselines when facing synthetic and real uncertainties.
arXiv Detail & Related papers (2022-12-14T10:28:08Z)
- Label Uncertainty Modeling and Prediction for Speech Emotion Recognition using t-Distributions [15.16865739526702]
We propose to model the label distribution using a Student's t-distribution.
We derive the corresponding Kullback-Leibler divergence based loss function and use it to train an estimator for the distribution of emotion labels.
Results reveal that our t-distribution based approach improves over the Gaussian approach and achieves state-of-the-art uncertainty modeling results.
arXiv Detail & Related papers (2022-07-25T12:38:20Z)
- Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition [59.52434325897716]
We propose a solution, named DMUE, to address the problem of annotation ambiguity from two perspectives.
For the former, an auxiliary multi-branch learning framework is introduced to better mine and describe the latent distribution in the label space.
For the latter, the pairwise relationships of semantic features between instances are fully exploited to estimate the ambiguity extent in the instance space.
arXiv Detail & Related papers (2021-04-01T03:21:57Z)
- Distribution-free uncertainty quantification for classification under label shift [105.27463615756733]
We focus on uncertainty quantification (UQ) for classification problems via two avenues.
We first argue that label shift hurts UQ, by showing degradation in coverage and calibration.
We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
arXiv Detail & Related papers (2021-03-04T20:51:03Z)
- Uncertainty-Aware Few-Shot Image Classification [118.72423376789062]
Few-shot image classification learns to recognize new categories from limited labelled data.
We propose Uncertainty-Aware Few-Shot framework for image classification.
arXiv Detail & Related papers (2020-10-09T12:26:27Z)
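The Student's t label-distribution entry above trains with a Kullback-Leibler divergence based loss. Since the KL divergence between two location-scale t-distributions has no simple closed form, one way to sanity-check such a loss is a Monte Carlo estimate. The sketch below is a hedged illustration, not the paper's derivation: it assumes a location-scale parameterisation with shared degrees of freedom, and all function names are made up.

```python
import math
import random

def t_logpdf(x, mu, sigma, nu):
    # Log density of a location-scale Student's t distribution
    # with location mu, scale sigma, and nu degrees of freedom.
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))

def sample_t(mu, sigma, nu, rng):
    # Sample via the normal / chi-square representation:
    # T = Z / sqrt(V / nu), Z ~ N(0, 1), V ~ chi^2(nu) = Gamma(nu/2, scale=2).
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)
    return mu + sigma * z / math.sqrt(v / nu)

def mc_kl_t(mu_p, s_p, mu_q, s_q, nu, n=20000, seed=0):
    # Monte Carlo estimate of KL(p || q): sample x ~ p and average
    # log p(x) - log q(x). Exactly zero when p and q coincide.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample_t(mu_p, s_p, nu, rng)
        total += t_logpdf(x, mu_p, s_p, nu) - t_logpdf(x, mu_q, s_q, nu)
    return total / n
```

The estimate vanishes when both distributions are identical and grows with the separation between their locations, which matches the qualitative behaviour a KL-based label-uncertainty loss relies on.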
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.