Label Uncertainty Modeling and Prediction for Speech Emotion Recognition using t-Distributions
- URL: http://arxiv.org/abs/2207.12135v1
- Date: Mon, 25 Jul 2022 12:38:20 GMT
- Title: Label Uncertainty Modeling and Prediction for Speech Emotion Recognition using t-Distributions
- Authors: Navin Raj Prabhu, Nale Lehmann-Willenbrock and Timo Gerkmann
- Abstract summary: We propose to model the label distribution using a Student's t-distribution.
We derive the corresponding Kullback-Leibler divergence based loss function and use it to train an estimator for the distribution of emotion labels.
Results reveal that our t-distribution based approach improves over the Gaussian approach, achieving state-of-the-art uncertainty modeling results.
- Score: 15.16865739526702
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As different people perceive others' emotional expressions differently, their
annotations in terms of arousal and valence are inherently subjective. To address this,
emotion annotations are typically collected from multiple annotators and averaged to
obtain labels for arousal and valence. However, besides the average, the uncertainty of
a label is also of interest and should likewise be modeled and predicted for automatic
emotion recognition. In the literature, label uncertainty modeling is, for simplicity,
commonly approached with a Gaussian assumption on the collected annotations. However, as
the number of annotators is typically rather small due to resource constraints, we argue
that the Gaussian approach is a rather crude assumption. In contrast, in this work we
propose to model the label distribution using a Student's t-distribution, which allows
us to account for the number of annotations available. With this model, we derive the
corresponding Kullback-Leibler divergence based loss function and use it to train an
estimator for the distribution of emotion labels, from which both the mean and the
uncertainty can be inferred. Through qualitative and quantitative analysis, we show the
benefits of the t-distribution over a Gaussian distribution. We validate the proposed
method on the AVEC'16 dataset. Results reveal that our t-distribution based approach
improves over the Gaussian approach, achieving state-of-the-art uncertainty modeling
results in speech-based emotion recognition while also converging faster.
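Why a t-distribution suits few annotators: assuming roughly Gaussian annotations, when the variance of N annotations must be estimated from those same N annotations, the standardized sample mean follows a Student's t-distribution with N - 1 degrees of freedom rather than a Gaussian, and its heavier tails encode the extra uncertainty of small N. The paper's derived KL-divergence loss is not reproduced below; instead, this is a minimal hedged sketch in PyTorch that trains a head to output the location and scale of a t-distribution with degrees of freedom tied to the annotator count, using the t negative log-likelihood as a simpler stand-in loss. The module name `EmotionUncertaintyHead`, the feature dimension, and the annotator count are hypothetical, not taken from the paper.

```python
# Minimal sketch (not the authors' implementation): predict the location and
# scale of a Student's t-distribution over one emotion dimension (e.g. arousal),
# with degrees of freedom nu = N - 1 fixed by the number of annotators N.
# Trained with the t negative log-likelihood as a stand-in for the paper's
# KL-divergence-based loss.
import torch
import torch.nn as nn
from torch.distributions import StudentT

class EmotionUncertaintyHead(nn.Module):  # hypothetical name
    def __init__(self, feat_dim: int = 128, num_annotators: int = 6):
        super().__init__()
        self.loc = nn.Linear(feat_dim, 1)        # predicted mean (location)
        self.log_scale = nn.Linear(feat_dim, 1)  # predicted log-scale (uncertainty)
        # Few annotators -> low degrees of freedom -> heavier tails.
        self.df = float(num_annotators - 1)

    def forward(self, features: torch.Tensor) -> StudentT:
        loc = self.loc(features).squeeze(-1)
        scale = self.log_scale(features).squeeze(-1).exp()  # keep scale > 0
        return StudentT(df=self.df, loc=loc, scale=scale)

def t_nll_loss(pred: StudentT, mean_label: torch.Tensor) -> torch.Tensor:
    # Negative log-likelihood of the averaged annotator label under the
    # predicted t-distribution.
    return -pred.log_prob(mean_label).mean()

# Toy usage: random features, labels in [-1, 1] as in arousal/valence annotation.
head = EmotionUncertaintyHead()
features = torch.randn(32, 128)
labels = torch.rand(32) * 2.0 - 1.0
dist = head(features)
loss = t_nll_loss(dist, labels)
loss.backward()
print(loss.item(), dist.mean[:3], dist.stddev[:3])
```

Compared with a Gaussian negative log-likelihood, the heavier tails of the t-distribution penalize occasional far-off labels less severely, which mirrors the robustness to small annotator counts that the abstract claims for the t-distribution.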
Related papers
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantees with explicit dimensional dependence for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
- Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution in place of the distribution implicitly assumed by a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this view, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- End-to-End Label Uncertainty Modeling in Speech Emotion Recognition using Bayesian Neural Networks and Label Distribution Learning [0.0]
We propose an end-to-end Bayesian neural network capable of being trained on a distribution of annotations to capture the subjectivity-based label uncertainty.
We show that the proposed t-distribution based approach achieves state-of-the-art uncertainty modeling results in speech emotion recognition.
arXiv Detail & Related papers (2022-09-30T12:55:43Z)
- COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition [14.963637194500029]
This paper introduces an uncertainty-aware audiovisual fusion approach that quantifies modality-wise uncertainty towards emotion prediction.
We impose Ordinal Ranking constraints on the variance vectors of audiovisual latent distributions.
Our evaluation on two emotion recognition corpora, AVEC 2019 CES and IEMOCAP, shows that audiovisual emotion recognition can considerably benefit from well-calibrated and well-ranked latent uncertainty measures.
arXiv Detail & Related papers (2022-06-12T20:25:21Z)
- Label Distribution Amendment with Emotional Semantic Correlations for Facial Expression Recognition [69.18918567657757]
We propose a new method that amends the label distribution of each facial image by leveraging correlations among expressions in the semantic space.
By comparing semantic and task class-relation graphs of each image, the confidence of its label distribution is evaluated.
Experimental results demonstrate that the proposed method is more effective than the compared state-of-the-art methods.
arXiv Detail & Related papers (2021-07-23T07:46:14Z)
- Path Integrals for the Attribution of Model Uncertainties [0.18899300124593643]
We present a novel algorithm that relies on in-distribution curves connecting a feature vector to some counterfactual counterpart.
We validate our approach on benchmark image data sets with varying resolution, and show that it significantly simplifies interpretability.
arXiv Detail & Related papers (2021-07-19T11:07:34Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
- MatchGAN: A Self-Supervised Semi-Supervised Conditional Generative Adversarial Network [51.84251358009803]
We present a novel self-supervised learning approach for conditional generative adversarial networks (GANs) under a semi-supervised setting.
We perform augmentation by randomly sampling sensible labels from the label space of the few labelled examples available.
Our method surpasses the baseline with only 20% of the labelled examples used to train the baseline.
arXiv Detail & Related papers (2020-06-11T17:14:55Z)