Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition
- URL: http://arxiv.org/abs/2210.16642v1
- Date: Sat, 29 Oct 2022 16:12:31 GMT
- Title: Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition
- Authors: Roshan Sharma, Hira Dhamyal, Bhiksha Raj and Rita Singh
- Abstract summary: In paralinguistic analysis for emotion detection from speech, emotions have been identified with discrete or dimensional (continuous-valued) labels.
We propose a model to jointly predict continuous and discrete emotional attributes.
- Score: 28.881092401807894
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Traditionally, in paralinguistic analysis for emotion detection from speech,
emotions have been identified with discrete or dimensional (continuous-valued)
labels. Accordingly, models that have been proposed for emotion detection use
one or the other of these label types. However, psychologists like Russell and
Plutchik have proposed theories and models that unite these views, maintaining
that these representations have shared and complementary information. This
paper is an attempt to validate these viewpoints computationally. To this end,
we propose a model to jointly predict continuous and discrete emotional
attributes and show how the relationship between these can be utilized to
improve the robustness and performance of emotion recognition tasks. Our
approach comprises multi-task and hierarchical multi-task learning frameworks
that jointly model the relationships between continuous-valued and discrete
emotion labels. Experimental results on two widely used datasets (IEMOCAP and
MSP-Podcast) for speech-based emotion recognition show that our model results in
statistically significant improvements in performance over strong baselines
with non-unified approaches. We also demonstrate that using one type of label
(discrete or continuous-valued) for training improves recognition performance
in tasks that use the other type of label. Experimental results and reasoning
for this approach (called the mismatched training approach) are also presented.
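To make the unified setup concrete, here is a minimal sketch, assuming a GRU utterance encoder, four discrete classes, and three continuous attributes (valence, arousal, dominance); the layer sizes, class inventory, and loss weighting are illustrative assumptions, not the architecture from the paper. The multi-task variant predicts both label types from a shared encoder; the hierarchical variant additionally feeds the continuous predictions into the discrete head.

```python
# A minimal sketch of joint discrete/continuous emotion prediction, assuming:
# a GRU utterance encoder over acoustic features, a 4-class discrete head
# (e.g. angry/happy/neutral/sad), and a 3-dimensional continuous head
# (valence/arousal/dominance). Sizes, classes, and the 0.5 loss weight are
# illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class JointEmotionModel(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, n_classes=4, n_dims=3,
                 hierarchical=False):
        super().__init__()
        self.hierarchical = hierarchical
        # Shared encoder: a stand-in for any utterance-level speech encoder.
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.cont_head = nn.Linear(hidden, n_dims)  # continuous attributes
        # Hierarchical variant: the discrete head also sees the continuous
        # predictions, modelling the dependency between the two label types.
        self.disc_head = nn.Linear(hidden + (n_dims if hierarchical else 0),
                                   n_classes)

    def forward(self, x):
        _, h = self.encoder(x)            # h: (1, batch, hidden)
        h = h.squeeze(0)
        cont = self.cont_head(h)          # e.g. valence/arousal/dominance
        inp = torch.cat([h, cont], dim=-1) if self.hierarchical else h
        return self.disc_head(inp), cont  # class logits, continuous values

# Multi-task loss: cross-entropy on discrete labels plus a regression term
# (CCC losses are common for dimensional emotion; MSE is used for brevity).
model = JointEmotionModel(hierarchical=True)
feats = torch.randn(8, 100, 80)           # (batch, frames, feature_dim)
disc_y, cont_y = torch.randint(0, 4, (8,)), torch.rand(8, 3)
disc_hat, cont_hat = model(feats)
loss = nn.functional.cross_entropy(disc_hat, disc_y) \
       + 0.5 * nn.functional.mse_loss(cont_hat, cont_y)
loss.backward()
```

Read this way, the mismatched training approach described above would correspond to optimizing only one of the two loss terms during training while evaluating with the other label type.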
Related papers
- Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning [47.02027575768659]
We introduce continuous valence and arousal labels for an existing dataset of children's stories originally annotated with discrete emotion categories.
To predict the resulting emotionality signals, we fine-tune a DeBERTa model and improve upon this baseline via a weakly supervised learning approach.
A detailed analysis shows the extent to which the results vary depending on factors such as the author, the individual story, or the section within the story.
arXiv Detail & Related papers (2024-06-04T12:17:16Z)
- CAGE: Circumplex Affect Guided Expression Inference [9.108319009019912]
We present a comparative in-depth analysis of two common datasets (AffectNet and EMOTIC) equipped with the components of the circumplex model of affect.
We propose a model for the prediction of facial expressions tailored for lightweight applications.
arXiv Detail & Related papers (2024-04-23T12:30:17Z)
- Seeking Subjectivity in Visual Emotion Distribution Learning [93.96205258496697]
Visual Emotion Analysis (VEA) aims to predict people's emotions towards different visual stimuli.
Existing methods often predict visual emotion distribution in a unified network, neglecting the inherent subjectivity in its crowd voting process.
We propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution.
arXiv Detail & Related papers (2022-07-25T02:20:03Z)
- Estimating the Uncertainty in Emotion Class Labels with Utterance-Specific Dirichlet Priors [24.365876333182207]
We propose a novel training loss based on per-utterance Dirichlet prior distributions for verbal emotion recognition (a hedged sketch of this idea appears after this list).
An additional metric is used to evaluate the performance by detecting test utterances with high labelling uncertainty.
Experiments with the widely used IEMOCAP dataset demonstrate that the two-branch structure achieves state-of-the-art classification results.
arXiv Detail & Related papers (2022-03-08T23:30:01Z)
- Contrast and Generation Make BART a Good Dialogue Emotion Recognizer [38.18867570050835]
Long-range contextual emotional relationships with speaker dependency play a crucial part in dialogue emotion recognition.
We adopt supervised contrastive learning to make different emotions mutually exclusive to identify similar emotions better.
We utilize an auxiliary response generation task to enhance the model's ability of handling context information.
arXiv Detail & Related papers (2021-12-21T13:38:00Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- Label Distribution Amendment with Emotional Semantic Correlations for Facial Expression Recognition [69.18918567657757]
We propose a new method that amends the label distribution of each facial image by leveraging correlations among expressions in the semantic space.
By comparing semantic and task class-relation graphs of each image, the confidence of its label distribution is evaluated.
Experimental results demonstrate that the proposed method is more effective than the state-of-the-art methods it is compared against.
arXiv Detail & Related papers (2021-07-23T07:46:14Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings for low-resource multimodal emotion recognition.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
- EmoGraph: Capturing Emotion Correlations using Graph Networks [71.53159402053392]
We propose EmoGraph that captures the dependencies among different emotions through graph networks.
EmoGraph outperforms strong baselines, especially for macro-F1.
An experiment illustrates that the captured emotion correlations can also benefit a single-label classification task.
arXiv Detail & Related papers (2020-08-21T08:59:29Z)
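For the uncertainty-estimation entry above, here is the promised sketch of a Dirichlet-prior-style training loss. It assumes a classifier that outputs per-utterance Dirichlet concentration parameters over four emotion classes and is trained against soft annotator label distributions; the class count, network, and exact loss form are illustrative assumptions rather than that paper's formulation.

```python
# A minimal sketch, assuming per-utterance Dirichlet concentrations `alpha`
# over K = 4 emotion classes; the loss is the negative log-likelihood of the
# observed (soft) annotator label distribution under Dirichlet(alpha).
import torch

def dirichlet_nll(alpha, label_dist):
    """NLL of an annotator label distribution under Dirichlet(alpha)."""
    return -torch.distributions.Dirichlet(alpha).log_prob(label_dist).mean()

logits = torch.randn(8, 4, requires_grad=True)         # batch of 8 utterances
alpha = torch.nn.functional.softplus(logits) + 1.0     # positive concentrations
label_dist = torch.softmax(torch.randn(8, 4), dim=-1)  # soft annotator votes
loss = dirichlet_nll(alpha, label_dist)
loss.backward()

# The total concentration alpha_0 indexes confidence: a small alpha_0 means a
# flat, uncertain Dirichlet, so low-alpha_0 test utterances can be flagged as
# having high labelling uncertainty.
alpha0 = alpha.sum(dim=-1)
```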