COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for
Uncertainty-Aware Multimodal Emotion Recognition
- URL: http://arxiv.org/abs/2206.05833v2
- Date: Mon, 16 Oct 2023 20:29:02 GMT
- Title: COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for
Uncertainty-Aware Multimodal Emotion Recognition
- Authors: Mani Kumar Tellamekala, Shahin Amiriparian, Björn W. Schuller,
Elisabeth André, Timo Giesbrecht, Michel Valstar
- Abstract summary: This paper introduces an uncertainty-aware audiovisual fusion approach that quantifies modality-wise uncertainty towards emotion prediction.
We impose Calibration and Ordinal Ranking constraints on the variance vectors of audiovisual latent distributions.
Our evaluation on two emotion recognition corpora, AVEC 2019 CES and IEMOCAP, shows that audiovisual emotion recognition can considerably benefit from well-calibrated and well-ranked latent uncertainty measures.
- Score: 14.963637194500029
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Automatically recognising apparent emotions from face and voice is hard, in
part because of various sources of uncertainty, including in the input data and
the labels used in a machine learning framework. This paper introduces an
uncertainty-aware audiovisual fusion approach that quantifies modality-wise
uncertainty towards emotion prediction. To this end, we propose a novel fusion
framework in which we first learn latent distributions over audiovisual
temporal context vectors separately, and then constrain the variance vectors of
unimodal latent distributions so that they represent the amount of information
each modality provides w.r.t. emotion recognition. In particular, we impose
Calibration and Ordinal Ranking constraints on the variance vectors of
audiovisual latent distributions. When well-calibrated, modality-wise
uncertainty scores indicate how much their corresponding predictions may differ
from the ground truth labels. Well-ranked uncertainty scores allow the ordinal
ranking of different frames across the modalities. To jointly impose both these
constraints, we propose a softmax distributional matching loss. In both
classification and regression settings, we compare our uncertainty-aware fusion
model with standard model-agnostic fusion baselines. Our evaluation on two
emotion recognition corpora, AVEC 2019 CES and IEMOCAP, shows that audiovisual
emotion recognition can considerably benefit from well-calibrated and
well-ranked latent uncertainty measures.
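To make the proposed constraints concrete, the following is a minimal PyTorch sketch of a softmax distributional matching loss in the spirit of the paper; the function and argument names are ours, and details such as the temperature, the error definition, and the direction of the KL divergence are assumptions rather than the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def softmax_distribution_matching_loss(uncertainty, error, temperature=1.0):
    """Sketch: push per-frame uncertainty scores to match the
    distribution of per-frame prediction errors.

    uncertainty: (batch, time) predicted variance scores of one modality
    error:       (batch, time) absolute prediction errors vs. the labels

    Matching softmax(uncertainty) to softmax(error) over the temporal
    axis jointly constrains the magnitudes (calibration) and the
    relative ordering (ordinal ranking) of the uncertainty scores.
    """
    log_q = F.log_softmax(uncertainty / temperature, dim=-1)  # predicted
    p = F.softmax(error.detach() / temperature, dim=-1)       # target
    return F.kl_div(log_q, p, reduction="batchmean")          # KL(p || q)

# Hypothetical usage: one loss term per modality.
audio_var = torch.rand(8, 100)  # per-frame variances from the audio branch
audio_err = torch.rand(8, 100)  # per-frame errors of the audio predictions
loss = softmax_distribution_matching_loss(audio_var, audio_err)
```

In the full model, one such term would be computed for each modality, so that the learned variance vectors reflect how informative audio and video are frame by frame.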
Related papers
- Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness [106.52630978891054]
We present a taxonomy of uncertainty specific to vision-language AI systems.
We also introduce a new metric, confidence-weighted accuracy, that is well correlated with both accuracy and calibration error.
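The summary does not spell out how confidence-weighted accuracy is computed; one plausible reconstruction (an assumption, not the paper's definition) weights each prediction's correctness by the model's confidence:

```python
import numpy as np

def confidence_weighted_accuracy(confidence, correct):
    """Assumed form of confidence-weighted accuracy: correctness
    weighted by confidence, so confident correct predictions raise
    the score and confident mistakes lower it.

    confidence: (n,) confidences in [0, 1]
    correct:    (n,) 1.0 if the prediction was right, else 0.0
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    return float((confidence * correct).sum() / confidence.sum())
```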
arXiv Detail & Related papers (2024-07-02T04:23:54Z)
- Towards A Robust Group-level Emotion Recognition via Uncertainty-Aware Learning [29.27161082428625]
Group-level emotion recognition (GER) is an inseparable part of human behavior analysis.
We propose an uncertainty-aware learning (UAL) method to extract more robust representations for GER.
We develop an image enhancement module to enhance the model's robustness against severe noise.
arXiv Detail & Related papers (2023-10-06T15:05:41Z)
- ELFNet: Evidential Local-global Fusion for Stereo Matching [17.675146012208124]
We introduce the Evidential Local-global Fusion (ELF) framework for stereo matching.
It endows both uncertainty estimation and confidence-aware fusion with trustworthy heads.
arXiv Detail & Related papers (2023-08-01T15:51:04Z)
- Evaluating AI systems under uncertain ground truth: a case study in dermatology [44.80772162289557]
We propose a metric for measuring annotation uncertainty and provide uncertainty-adjusted metrics for performance evaluation.
We present a case study applying our framework to skin condition classification from images where annotations are provided in the form of differential diagnoses.
arXiv Detail & Related papers (2023-07-05T10:33:45Z)
- Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
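Dynamic modality gating in general can be sketched as a small network that scores each modality per sample and fuses the unimodal features as a convex combination, so an incongruent modality is down-weighted; this is a generic illustration, not the authors' exact HCT-DMG design:

```python
import torch
import torch.nn as nn

class DynamicModalityGate(nn.Module):
    """Generic dynamic modality gating (illustrative, not HCT-DMG)."""

    def __init__(self, dim, num_modalities):
        super().__init__()
        self.score = nn.Linear(dim * num_modalities, num_modalities)

    def forward(self, feats):
        # feats: list of (batch, dim) unimodal feature vectors
        stacked = torch.stack(feats, dim=1)            # (batch, M, dim)
        gates = self.score(torch.cat(feats, dim=-1))   # (batch, M)
        weights = gates.softmax(dim=-1).unsqueeze(-1)  # (batch, M, 1)
        return (weights * stacked).sum(dim=1)          # fused (batch, dim)
```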
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
- Uncertain Facial Expression Recognition via Multi-task Assisted Correction [43.02119884581332]
We propose MTAC, a novel multi-task assisted correction method for addressing uncertainty in facial expression recognition.
Specifically, a confidence estimation block and a weighted regularization module are applied to highlight solid samples and suppress uncertain samples in every batch.
Experiments on RAF-DB, AffectNet, and AffWild2 datasets demonstrate that the MTAC obtains substantial improvements over baselines when facing synthetic and real uncertainties.
arXiv Detail & Related papers (2022-12-14T10:28:08Z)
- Label Uncertainty Modeling and Prediction for Speech Emotion Recognition using t-Distributions [15.16865739526702]
We propose to model the label distribution using a Student's t-distribution.
We derive the corresponding Kullback-Leibler divergence based loss function and use it to train an estimator for the distribution of emotion labels.
Results reveal that our t-distribution based approach improves over the Gaussian approach with state-of-the-art uncertainty modeling results.
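The paper derives a Kullback-Leibler divergence loss between t-distributions; as a simpler illustration of the same modeling choice (a stand-in, not the paper's exact loss), one can train with the negative log-likelihood of the labels under a predicted Student's t-distribution:

```python
import torch
from torch.distributions import StudentT

def student_t_nll(pred_loc, pred_log_scale, labels, df=3.0):
    """Negative log-likelihood under a predicted Student's t.

    pred_loc:       (batch,) predicted location of the label distribution
    pred_log_scale: (batch,) predicted log-scale (exp keeps scale > 0)
    labels:         (batch,) observed emotion labels
    df:             degrees of freedom; small values give heavier tails
                    than a Gaussian, i.e. more tolerance for outliers
    """
    dist = StudentT(df=df, loc=pred_loc, scale=pred_log_scale.exp())
    return -dist.log_prob(labels).mean()
```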
arXiv Detail & Related papers (2022-07-25T12:38:20Z)
- Self-attention fusion for audiovisual emotion recognition with incomplete data [103.70855797025689]
We consider the problem of multimodal data analysis with a use case of audiovisual emotion recognition.
We propose an architecture capable of learning from raw data and describe three variants of it with distinct modality fusion mechanisms.
arXiv Detail & Related papers (2022-01-26T18:04:29Z)
- Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition [59.52434325897716]
We propose a solution, named DMUE, to address the problem of annotation ambiguity from two perspectives.
For the former, an auxiliary multi-branch learning framework is introduced to better mine and describe the latent distribution in the label space.
For the latter, the pairwise relationships of semantic features between instances are fully exploited to estimate the ambiguity extent in the instance space.
arXiv Detail & Related papers (2021-04-01T03:21:57Z)
- Distribution-free uncertainty quantification for classification under label shift [105.27463615756733]
We focus on uncertainty quantification (UQ) for classification problems via two avenues.
We first argue that label shift hurts UQ, by showing degradation in coverage and calibration.
We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
arXiv Detail & Related papers (2021-03-04T20:51:03Z)
- Uncertainty-Aware Few-Shot Image Classification [118.72423376789062]
Few-shot image classification learns to recognize new categories from limited labelled data.
We propose Uncertainty-Aware Few-Shot framework for image classification.
arXiv Detail & Related papers (2020-10-09T12:26:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.