Estimating the Uncertainty in Emotion Class Labels with
Utterance-Specific Dirichlet Priors
- URL: http://arxiv.org/abs/2203.04443v1
- Date: Tue, 8 Mar 2022 23:30:01 GMT
- Title: Estimating the Uncertainty in Emotion Class Labels with
Utterance-Specific Dirichlet Priors
- Authors: Wen Wu, Chao Zhang, Xixin Wu, Philip C. Woodland
- Abstract summary: We propose a novel training loss based on per-utterance Dirichlet prior distributions for verbal emotion recognition.
An additional metric is used to evaluate the performance by detecting test utterances with high labelling uncertainty.
Experiments with the widely used IEMOCAP dataset demonstrate that the two-branch structure achieves state-of-the-art classification results.
- Score: 24.365876333182207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotion recognition is a key attribute for artificial intelligence systems
that need to naturally interact with humans. However, the task definition is
still an open problem due to inherent ambiguity of emotions. In this paper, a
novel Bayesian training loss based on per-utterance Dirichlet prior
distributions is proposed for verbal emotion recognition, which models the
uncertainty in one-hot labels created when human annotators assign the same
utterance to different emotion classes. An additional metric is used to
evaluate the performance by detecting test utterances with high labelling
uncertainty. This removes a major limitation that emotion classification
systems only consider utterances with majority labels. Furthermore, a
frequentist approach is studied to leverage the continuous-valued "soft" labels
obtained by averaging the one-hot labels. We propose a two-branch model
structure for emotion classification on a per-utterance basis. Experiments with
the widely used IEMOCAP dataset demonstrate that the two-branch structure
achieves state-of-the-art classification results with all common IEMOCAP test
setups. Based on this, uncertainty estimation experiments were performed. The
best performance in terms of the area under the precision-recall curve when
detecting utterances with high uncertainty was achieved by interpolating the
Bayesian training loss with the Kullback-Leibler divergence training loss for
the soft labels.
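To make the abstract's ingredients concrete, here is a minimal sketch (not code from the paper) of how a per-utterance Dirichlet prior, "soft" labels averaged from annotators' one-hot labels, and an interpolated KL loss could fit together. The Dirichlet-categorical negative log-likelihood, the function names, and the interpolation weight `lam` are illustrative assumptions; the paper's exact Bayesian training loss may differ.

```python
import math

def dirichlet_multinomial_nll(alpha, counts):
    """Negative log-likelihood of annotator label counts under a
    Dirichlet-categorical model with per-class concentrations alpha.
    (One plausible Bayesian loss over inconsistent one-hot labels.)"""
    a0, n = sum(alpha), sum(counts)
    ll = math.lgamma(a0) - math.lgamma(a0 + n)
    ll += sum(math.lgamma(a + c) - math.lgamma(a)
              for a, c in zip(alpha, counts))
    return -ll

def soft_label(counts):
    """Continuous-valued 'soft' label: the average of the annotators'
    one-hot labels, i.e. normalized class counts."""
    n = sum(counts)
    return [c / n for c in counts]

def kl_soft_vs_mean(soft, alpha, eps=1e-12):
    """KL(soft || expected categorical under the Dirichlet), where the
    expected categorical is alpha / sum(alpha)."""
    a0 = sum(alpha)
    mean = [a / a0 for a in alpha]
    return sum(s * math.log((s + eps) / (m + eps))
               for s, m in zip(soft, mean) if s > 0)

def interpolated_loss(alpha, counts, lam=0.5):
    """Interpolate the Bayesian (Dirichlet) loss with the soft-label
    KL loss, as in the best uncertainty-detection setup above."""
    return (lam * dirichlet_multinomial_nll(alpha, counts)
            + (1 - lam) * kl_soft_vs_mean(soft_label(counts), alpha))
```

Under this sketch, the model would output `alpha` for each utterance; a small `sum(alpha)` signals high labelling uncertainty, which is what the detection metric above would score.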
Related papers
- Handling Ambiguity in Emotion: From Out-of-Domain Detection to
Distribution Estimation [45.53789836426869]
The subjective perception of emotion leads to inconsistent labels from human annotators.
This paper investigates three methods to handle ambiguous emotion.
We show that incorporating utterances without majority-agreed labels as an additional class in the classifier reduces the classification performance of the other emotion classes.
We also propose detecting utterances with ambiguous emotions as out-of-domain samples by quantifying the uncertainty in emotion classification using evidential deep learning.
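The evidential-deep-learning detector described above can be sketched as follows: non-negative per-class evidence (the mapping from network outputs to evidence, e.g. a softplus, is assumed here, not taken from the paper) defines a Dirichlet, and the subjective-logic uncertainty mass flags ambiguous utterances as out-of-domain. Names are illustrative.

```python
def dirichlet_uncertainty(evidence):
    """Subjective-logic uncertainty ('vacuity') in evidential deep
    learning: alpha_k = evidence_k + 1, u = K / sum(alpha).
    A high u flags an utterance as ambiguous / out-of-domain."""
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    s = sum(alpha)
    u = k / s                      # total uncertainty mass
    probs = [a / s for a in alpha] # expected class probabilities
    return u, probs
```

With zero evidence the uncertainty mass is 1 (a uniform, maximally vacuous Dirichlet); strong evidence for any class drives it toward 0.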
arXiv Detail & Related papers (2024-02-20T09:53:38Z)
- Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression [17.26466867595571]
In automatic emotion recognition, labels assigned by different human annotators to the same utterance are often inconsistent.
This paper proposes a Bayesian approach, deep evidential emotion regression (DEER), to estimate the uncertainty in emotion attributes.
Experiments on the widely used MSP-Podcast and IEMOCAP datasets showed DEER produced state-of-the-art results.
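DEER's exact formulation is not reproduced in this summary; the sketch below shows the standard deep evidential regression machinery it builds on, assuming the Normal-Inverse-Gamma parameterization of Amini et al. (outputs `gamma, nu, alpha, beta` per emotion attribute). All names and loss forms here are illustrative assumptions.

```python
import math

def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log-likelihood of target y under the Student-t
    predictive implied by a Normal-Inverse-Gamma (gamma, nu, alpha, beta)."""
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * math.log(math.pi / nu)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
            + math.lgamma(alpha) - math.lgamma(alpha + 0.5))

def evidence_regularizer(y, gamma, nu, alpha):
    """Penalize confident evidence placed on wrong predictions."""
    return abs(y - gamma) * (2.0 * nu + alpha)

def uncertainties(nu, alpha, beta):
    """Aleatoric (data noise) and epistemic (model) uncertainty
    of the predicted emotion attribute; requires alpha > 1."""
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return aleatoric, epistemic
```

More "virtual evidence" `nu` shrinks the epistemic term while leaving the aleatoric term unchanged, which is how evidential regression separates annotator noise from model uncertainty.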
arXiv Detail & Related papers (2023-06-11T20:07:29Z)
- Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection [98.66771688028426]
We propose an Ambiguity-Resistant Semi-supervised Learning (ARSL) method for one-stage detectors.
Joint-Confidence Estimation (JCE) is proposed to quantify the classification and localization quality of pseudo labels.
ARSL effectively mitigates the ambiguities and achieves state-of-the-art SSOD performance on MS COCO and PASCAL VOC.
arXiv Detail & Related papers (2023-03-27T07:46:58Z)
- Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection [149.23913018423022]
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.
Two-stage self-training methods have achieved significant improvements by self-generating pseudo labels.
We propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training.
arXiv Detail & Related papers (2022-12-08T05:53:53Z)
- Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition [28.881092401807894]
In paralinguistic analysis for emotion detection from speech, emotions have been identified with discrete or dimensional (continuous-valued) labels.
We propose a model to jointly predict continuous and discrete emotional attributes.
arXiv Detail & Related papers (2022-10-29T16:12:31Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model MEmoBERT for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- End-to-end label uncertainty modeling for speech emotion recognition using Bayesian neural networks [16.708069984516964]
We introduce an end-to-end Bayesian neural network architecture to capture the inherent subjectivity in emotions.
At training, the network learns a distribution of weights to capture the inherent uncertainty related to subjective emotion annotations.
We evaluate the proposed approach on the AVEC'16 emotion recognition dataset.
arXiv Detail & Related papers (2021-10-07T09:34:28Z) - Label Distribution Amendment with Emotional Semantic Correlations for
Facial Expression Recognition [69.18918567657757]
We propose a new method that amends the label distribution of each facial image by leveraging correlations among expressions in the semantic space.
By comparing semantic and task class-relation graphs of each image, the confidence of its label distribution is evaluated.
Experimental results demonstrate the proposed method is more effective than compared state-of-the-art methods.
arXiv Detail & Related papers (2021-07-23T07:46:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.