End-to-end label uncertainty modeling for speech emotion recognition
using Bayesian neural networks
- URL: http://arxiv.org/abs/2110.03299v1
- Date: Thu, 7 Oct 2021 09:34:28 GMT
- Title: End-to-end label uncertainty modeling for speech emotion recognition
using Bayesian neural networks
- Authors: Navin Raj Prabhu, Guillaume Carbajal, Nale Lehmann-Willenbrock and
Timo Gerkmann
- Abstract summary: We introduce an end-to-end Bayesian neural network architecture to capture the inherent subjectivity in emotions.
At training, the network learns a distribution of weights to capture the inherent uncertainty related to subjective emotion annotations.
We evaluate the proposed approach on the AVEC'16 emotion recognition dataset.
- Score: 16.708069984516964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotions are subjective constructs. Recent end-to-end speech emotion
recognition systems are typically agnostic to the subjective nature of
emotions, despite their state-of-the-art performance. In this work, we
introduce an end-to-end Bayesian neural network architecture to capture the
inherent subjectivity in emotions. To the best of our knowledge, this work is
the first to use Bayesian neural networks for speech emotion recognition. At
training, the network learns a distribution of weights to capture the inherent
uncertainty related to subjective emotion annotations. For this, we introduce a
loss term that enables the model to be explicitly trained on a distribution of
emotion annotations, rather than exclusively on mean or gold-standard labels.
We evaluate the proposed approach on the AVEC'16 emotion recognition dataset.
Qualitative and quantitative analysis of the results reveals that the proposed
model can aptly capture the distribution of subjective emotion annotations,
with a trade-off between the accuracy of its mean and standard-deviation
estimates.
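The summary above does not reproduce the network or the loss, so the following
is only a minimal sketch of the idea under stated assumptions: MC dropout
stands in for the Bayesian weight distribution, and the annotator labels are
summarized per utterance by a Gaussian (mean and standard deviation) that the
model is trained to match via a KL term. `EmotionBNN`, `distribution_loss`,
and all layer sizes are hypothetical, not the authors' implementation.

```python
# Sketch: train on the *distribution* of emotion annotations rather than
# the mean label alone. MC dropout approximates a Bayesian network:
# keeping dropout active at test time and repeating forward passes
# samples from an approximate posterior over the weights.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

class EmotionBNN(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Dropout(p=0.2),
        )
        self.mean_head = nn.Linear(64, 1)     # predicted annotation mean
        self.log_std_head = nn.Linear(64, 1)  # predicted annotation spread

    def forward(self, x):
        h = self.encoder(x)
        return (self.mean_head(h).squeeze(-1),
                self.log_std_head(h).exp().squeeze(-1))

def distribution_loss(pred_mean, pred_std, ann_mean, ann_std):
    """KL between the Gaussian summarizing the annotator labels and the
    predicted Gaussian -- a stand-in for the paper's distribution loss."""
    return kl_divergence(Normal(ann_mean, ann_std),
                         Normal(pred_mean, pred_std)).mean()

# Toy usage: 8 utterances, 128-dim acoustic features.
model = EmotionBNN()
feats = torch.randn(8, 128)
ann_mean = torch.rand(8) * 2 - 1      # per-utterance mean of annotator ratings
ann_std = torch.rand(8) * 0.3 + 0.05  # per-utterance annotator disagreement
pred_mean, pred_std = model(feats)
distribution_loss(pred_mean, pred_std, ann_mean, ann_std).backward()
```

At inference, one would keep dropout enabled and average several stochastic
forward passes to obtain both an emotion estimate and an uncertainty estimate.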
Related papers
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous
Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Unifying the Discrete and Continuous Emotion labels for Speech Emotion
Recognition [28.881092401807894]
In paralinguistic analysis for emotion detection from speech, emotions have been identified with discrete or dimensional (continuous-valued) labels.
We propose a model to jointly predict continuous and discrete emotional attributes.
arXiv Detail & Related papers (2022-10-29T16:12:31Z)
- End-to-End Label Uncertainty Modeling in Speech Emotion Recognition
using Bayesian Neural Networks and Label Distribution Learning [0.0]
We propose an end-to-end Bayesian neural network capable of being trained on a distribution of annotations to capture the subjectivity-based label uncertainty.
We show that the proposed t-distribution based approach achieves state-of-the-art uncertainty modeling results in speech emotion recognition.
arXiv Detail & Related papers (2022-09-30T12:55:43Z)
- Seeking Subjectivity in Visual Emotion Distribution Learning [93.96205258496697]
Visual Emotion Analysis (VEA) aims to predict people's emotions towards different visual stimuli.
Existing methods often predict visual emotion distribution in a unified network, neglecting the inherent subjectivity of the crowd-voting process that produces the labels.
We propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution learning.
arXiv Detail & Related papers (2022-07-25T02:20:03Z)
- Estimating the Uncertainty in Emotion Class Labels with
Utterance-Specific Dirichlet Priors [24.365876333182207]
We propose a novel training loss based on per-utterance Dirichlet prior distributions for verbal emotion recognition; a minimal sketch of such a loss is given after this list.
An additional metric evaluates performance by detecting test utterances with high labelling uncertainty.
Experiments with the widely used IEMOCAP dataset demonstrate that the two-branch structure achieves state-of-the-art classification results.
arXiv Detail & Related papers (2022-03-08T23:30:01Z)
- Interpretability for Multimodal Emotion Recognition using Concept
Activation Vectors [0.0]
We address the issue of interpretability for neural networks in the context of emotion recognition using Concept Activation Vectors (CAVs).
We define human-understandable concepts specific to Emotion AI and map them to the widely-used IEMOCAP multimodal database.
We then evaluate the influence of our proposed concepts at multiple layers of the Bi-directional Contextual LSTM (BC-LSTM) network.
arXiv Detail & Related papers (2022-02-02T15:02:42Z)
- A Circular-Structured Representation for Visual Emotion Distribution
Learning [82.89776298753661]
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
arXiv Detail & Related papers (2021-06-23T14:53:27Z)
- Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and achieves state-of-the-art results in classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal
Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
- Facial Expression Editing with Continuous Emotion Labels [76.36392210528105]
Deep generative models have achieved impressive results in the field of automated facial expression editing.
We propose a model that can be used to manipulate facial expressions in facial images according to continuous two-dimensional emotion labels.
arXiv Detail & Related papers (2020-06-22T13:03:02Z)
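As referenced in the Dirichlet-priors entry above, here is a minimal sketch of
a per-utterance Dirichlet training loss. It is not the paper's two-branch
model: the network predicts Dirichlet concentration parameters, the loss is
the negative log-likelihood of the observed annotator vote proportions under
that Dirichlet, and the total concentration acts as an inverse
label-uncertainty score. `DirichletHead`, `dirichlet_nll`, and all sizes are
hypothetical.

```python
# Sketch: per-utterance Dirichlet loss for emotion label uncertainty.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Dirichlet

class DirichletHead(nn.Module):
    def __init__(self, feat_dim: int = 128, n_classes: int = 4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        # softplus keeps concentrations positive; +1 keeps the density
        # finite at the corners of the simplex.
        return F.softplus(self.proj(x)) + 1.0

def dirichlet_nll(alpha, label_dist, eps=1e-4):
    """NLL of the annotator vote proportions under the predicted Dirichlet.
    Each row of label_dist sums to 1 (e.g. 3 of 5 annotators chose 'angry');
    eps-smoothing keeps the target strictly inside the simplex."""
    target = (label_dist + eps) / (label_dist + eps).sum(-1, keepdim=True)
    return -Dirichlet(alpha).log_prob(target).mean()

# Toy usage: 8 utterances, 4 emotion classes, 5 annotators per utterance.
head = DirichletHead()
feats = torch.randn(8, 128)
votes = torch.randint(0, 4, (8, 5))               # annotator class choices
label_dist = F.one_hot(votes, 4).float().mean(1)  # per-utterance vote shares
alpha = head(feats)
dirichlet_nll(alpha, label_dist).backward()
uncertainty = 1.0 / alpha.sum(-1)  # low total concentration = uncertain label
```

Utterances whose predicted total concentration is low would then be flagged as
highly uncertain, mirroring the detection metric described in that entry.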