End-to-End Label Uncertainty Modeling in Speech Emotion Recognition
using Bayesian Neural Networks and Label Distribution Learning
- URL: http://arxiv.org/abs/2209.15449v2
- Date: Tue, 13 Jun 2023 08:55:11 GMT
- Title: End-to-End Label Uncertainty Modeling in Speech Emotion Recognition
using Bayesian Neural Networks and Label Distribution Learning
- Authors: Navin Raj Prabhu, Nale Lehmann-Willenbrock and Timo Gerkmann
- Abstract summary: We propose an end-to-end Bayesian neural network capable of being trained on a distribution of annotations to capture the subjectivity-based label uncertainty.
We show that the proposed t-distribution based approach achieves state-of-the-art uncertainty modeling results in speech emotion recognition.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: To train machine learning algorithms to predict emotional expressions in
terms of arousal and valence, annotated datasets are needed. However, as
different people perceive others' emotional expressions differently, their
annotations are subjective. To account for this, annotations are typically
collected from multiple annotators and averaged to obtain ground-truth labels.
However, when exclusively trained on this averaged ground-truth, the model is
agnostic to the inherent subjectivity in emotional expressions. In this work,
we therefore propose an end-to-end Bayesian neural network capable of being
trained on a distribution of annotations to also capture the subjectivity-based
label uncertainty. Instead of a Gaussian, we model the annotation distribution
using Student's t-distribution, which also accounts for the number of
annotations available. We derive the corresponding Kullback-Leibler divergence
loss and use it to train an estimator for the annotation distribution, from
which the mean and uncertainty can be inferred. We validate the proposed method
using two in-the-wild datasets. We show that the proposed t-distribution based
approach achieves state-of-the-art uncertainty modeling results in speech
emotion recognition, and also consistent results in cross-corpora evaluations.
Furthermore, analyses reveal that the advantage of a t-distribution over a
Gaussian grows with increasing inter-annotator correlation and a decreasing
number of annotations available.
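To make the loss-function idea concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' exact derivation): the annotation distribution is taken as a Student's t whose degrees of freedom are tied to the number of annotators, and the network's predicted location and scale are fit with a Monte Carlo estimate of the KL divergence, since the KL between two t-distributions has no simple closed form. The function names and the df = n - 1 parameterisation are illustrative assumptions.

```python
# Hypothetical sketch of a t-distribution KL loss; not the paper's exact
# derivation. Assumes each sample comes with n scalar annotations.
import torch
from torch.distributions import StudentT

def annotation_distribution(annotations: torch.Tensor) -> StudentT:
    """Target t-distribution built from raw annotations of shape (batch, n).
    df = n - 1 ties the tail weight to the number of annotations."""
    n = annotations.shape[-1]
    loc = annotations.mean(dim=-1)
    scale = annotations.std(dim=-1) / n ** 0.5  # standard error of the mean
    return StudentT(df=float(n - 1), loc=loc, scale=scale)

def kl_t_loss(pred_loc: torch.Tensor,
              pred_log_scale: torch.Tensor,
              annotations: torch.Tensor,
              num_samples: int = 128) -> torch.Tensor:
    """Monte Carlo estimate of KL(target || prediction); sharing df keeps
    only location and scale as learned quantities (one plausible choice)."""
    target = annotation_distribution(annotations)
    pred = StudentT(df=target.df, loc=pred_loc, scale=pred_log_scale.exp())
    z = target.rsample((num_samples,))  # samples from the annotation dist.
    return (target.log_prob(z) - pred.log_prob(z)).mean()
```

Minimizing such a loss trains a network with two output heads, one for the emotion mean and one for its scale; the predicted scale then serves directly as the subjectivity-based uncertainty estimate.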
Related papers
- Semi-supervised Learning For Robust Speech Evaluation [30.593420641501968]
Speech evaluation measures a learner's oral proficiency using automatic models.
This paper proposes to address the challenges of robust speech evaluation by exploiting semi-supervised pre-training and objective regularization.
An anchor model is trained using pseudo labels to predict the correctness of pronunciation.
arXiv Detail & Related papers (2024-09-23T02:11:24Z)
- Regressor-Segmenter Mutual Prompt Learning for Crowd Counting [70.49246560246736]
We propose mutual prompt learning (mPrompt) to solve bias and inaccuracy caused by annotation variance.
Experiments show that mPrompt significantly reduces the Mean Absolute Error (MAE).
arXiv Detail & Related papers (2023-12-04T07:53:59Z)
- Multi-View Knowledge Distillation from Crowd Annotations for Out-of-Domain Generalization [53.24606510691877]
We propose new methods for acquiring soft-labels from crowd-annotations by aggregating the distributions produced by existing methods.
We demonstrate that these aggregation methods lead to the most consistent performance across four NLP tasks on out-of-domain test sets.
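One plausible reading of such aggregation (a hypothetical sketch; the paper's actual aggregation methods may differ, and the function names are illustrative) is to average several soft-label views derived from the same raw votes:

```python
# Hypothetical sketch: combine two soft-label views of the same crowd votes
# by simple averaging. Not necessarily the paper's aggregation methods.
import numpy as np

def vote_distribution(votes: np.ndarray, num_classes: int) -> np.ndarray:
    """Normalised vote counts: the classic soft label."""
    counts = np.bincount(votes, minlength=num_classes).astype(float)
    return counts / counts.sum()

def softened_distribution(votes: np.ndarray, num_classes: int,
                          temperature: float = 2.0) -> np.ndarray:
    """A temperature-softened view of the same counts."""
    counts = np.bincount(votes, minlength=num_classes).astype(float)
    z = counts / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def aggregate_soft_label(votes: np.ndarray, num_classes: int) -> np.ndarray:
    # Average the two views; the result is again a valid distribution.
    return 0.5 * (vote_distribution(votes, num_classes)
                  + softened_distribution(votes, num_classes))
```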
arXiv Detail & Related papers (2022-12-19T12:40:18Z)
- Label Uncertainty Modeling and Prediction for Speech Emotion Recognition using t-Distributions [15.16865739526702]
We propose to model the label distribution using a Student's t-distribution.
We derive the corresponding Kullback-Leibler divergence based loss function and use it to train an estimator for the distribution of emotion labels.
Results reveal that our t-distribution based approach improves over the Gaussian approach, achieving state-of-the-art uncertainty modeling results.
arXiv Detail & Related papers (2022-07-25T12:38:20Z)
- Deconfounding to Explanation Evaluation in Graph Neural Networks [136.73451468551656]
We argue that a distribution shift exists between the full graph and the subgraph, causing the out-of-distribution problem.
We propose Deconfounded Subgraph Evaluation (DSE) which assesses the causal effect of an explanatory subgraph on the model prediction.
arXiv Detail & Related papers (2022-01-21T18:05:00Z)
- Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective on graph contrastive learning, showing that random augmentations naturally lead to encoders that output a distribution over representations.
Our proposed method represents each node by a distribution in the latent space, in contrast to existing techniques, which embed each node as a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-12-15T01:45:32Z)
- End-to-end label uncertainty modeling for speech emotion recognition using Bayesian neural networks [16.708069984516964]
We introduce an end-to-end Bayesian neural network architecture to capture the inherent subjectivity in emotions.
At training, the network learns a distribution of weights to capture the inherent uncertainty related to subjective emotion annotations.
We evaluate the proposed approach on the AVEC'16 emotion recognition dataset.
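As a gloss on "learns a distribution of weights", a generic variational layer in the Bayes-by-Backprop style looks roughly as follows (an illustrative sketch, not the authors' exact architecture):

```python
# Generic variational Bayesian linear layer (illustrative, not the paper's
# exact architecture): each weight has a learned Gaussian posterior.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        # rho parameterises the std via softplus, keeping it positive.
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -5.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        std = F.softplus(self.w_rho)
        # Reparameterisation trick: draw fresh weights each forward pass,
        # so repeated passes yield a predictive distribution.
        w = self.w_mu + std * torch.randn_like(std)
        return x @ w.t()
```

Repeated stochastic forward passes through such a layer give a spread of predictions whose variance reflects the weight uncertainty.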
arXiv Detail & Related papers (2021-10-07T09:34:28Z)
- Learning from Crowds with Sparse and Imbalanced Annotations [29.596070201105274]
Crowdsourcing has established itself as an efficient labeling solution by resorting to non-expert crowds.
One common practice is to distribute each instance to multiple workers, while each worker annotates only a subset of the data, resulting in the sparse annotation phenomenon.
We propose a self-training based approach named Self-Crowd, which progressively adds confident pseudo-annotations and rebalances the annotation distribution.
arXiv Detail & Related papers (2021-07-11T13:06:20Z)
- Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition [59.52434325897716]
We propose a solution, named DMUE, to address the problem of annotation ambiguity from two perspectives: latent distribution mining and pairwise uncertainty estimation.
For the former, an auxiliary multi-branch learning framework is introduced to better mine and describe the latent distribution in the label space.
For the latter, the pairwise relationships of semantic features between instances are fully exploited to estimate the extent of ambiguity in the instance space.
arXiv Detail & Related papers (2021-04-01T03:21:57Z)
- Toward Scalable and Unified Example-based Explanation and Outlier Detection [128.23117182137418]
We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction.
We show that our prototype-based networks, extended beyond similarity kernels, deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
arXiv Detail & Related papers (2020-11-11T05:58:17Z)
- Calibrated Adversarial Refinement for Stochastic Semantic Segmentation [5.849736173068868]
We present a strategy for learning a calibrated predictive distribution over semantic maps, where the probability associated with each prediction reflects its ground truth correctness likelihood.
We demonstrate the versatility and robustness of the approach by achieving state-of-the-art results on the multigrader LIDC dataset and on a modified Cityscapes dataset with injected ambiguities.
We show that the core design can be adapted to other tasks requiring learning a calibrated predictive distribution by experimenting on a toy regression dataset.
arXiv Detail & Related papers (2020-06-23T16:39:59Z)