Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition
- URL: http://arxiv.org/abs/2210.16642v1
- Date: Sat, 29 Oct 2022 16:12:31 GMT
- Title: Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition
- Authors: Roshan Sharma, Hira Dhamyal, Bhiksha Raj and Rita Singh
- Abstract summary: In paralinguistic analysis for emotion detection from speech, emotions have been identified with discrete or dimensional (continuous-valued) labels.
We propose a model to jointly predict continuous and discrete emotional attributes.
- Score: 28.881092401807894
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Traditionally, in paralinguistic analysis for emotion detection from speech,
emotions have been identified with discrete or dimensional (continuous-valued)
labels. Accordingly, models that have been proposed for emotion detection use
one or the other of these label types. However, psychologists like Russell and
Plutchik have proposed theories and models that unite these views, maintaining
that these representations have shared and complementary information. This
paper is an attempt to validate these viewpoints computationally. To this end,
we propose a model to jointly predict continuous and discrete emotional
attributes and show how the relationship between these can be utilized to
improve the robustness and performance of emotion recognition tasks. Our
approach comprises multi-task and hierarchical multi-task learning frameworks
that jointly model the relationships between continuous-valued and discrete
emotion labels. Experimental results on two widely used datasets (IEMOCAP and
MSP-Podcast) for speech-based emotion recognition show that our model results in
statistically significant improvements in performance over strong baselines
with non-unified approaches. We also demonstrate that using one type of label
(discrete or continuous-valued) for training improves recognition performance
in tasks that use the other type of label. Experimental results and reasoning
for this approach (called the mismatched training approach) are also presented.
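To make the unified setup concrete, here is a minimal sketch, assuming a GRU utterance encoder, four discrete classes, and three continuous attributes (valence, arousal, dominance); the layer sizes, class inventory, and loss weighting are illustrative assumptions, not the architecture from the paper. The multi-task variant predicts both label types from a shared encoder; the hierarchical variant additionally feeds the continuous predictions into the discrete head.

```python
# A minimal sketch of joint discrete/continuous emotion prediction, assuming:
# a GRU utterance encoder over acoustic features, a 4-class discrete head
# (e.g. angry/happy/neutral/sad), and a 3-dimensional continuous head
# (valence/arousal/dominance). Sizes, classes, and the 0.5 loss weight are
# illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class JointEmotionModel(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, n_classes=4, n_dims=3,
                 hierarchical=False):
        super().__init__()
        self.hierarchical = hierarchical
        # Shared encoder: a stand-in for any utterance-level speech encoder.
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.cont_head = nn.Linear(hidden, n_dims)  # continuous attributes
        # Hierarchical variant: the discrete head also sees the continuous
        # predictions, modelling the dependency between the two label types.
        self.disc_head = nn.Linear(hidden + (n_dims if hierarchical else 0),
                                   n_classes)

    def forward(self, x):
        _, h = self.encoder(x)            # h: (1, batch, hidden)
        h = h.squeeze(0)
        cont = self.cont_head(h)          # e.g. valence/arousal/dominance
        inp = torch.cat([h, cont], dim=-1) if self.hierarchical else h
        return self.disc_head(inp), cont  # class logits, continuous values

# Multi-task loss: cross-entropy on discrete labels plus a regression term
# (CCC losses are common for dimensional emotion; MSE is used for brevity).
model = JointEmotionModel(hierarchical=True)
feats = torch.randn(8, 100, 80)           # (batch, frames, feature_dim)
disc_y, cont_y = torch.randint(0, 4, (8,)), torch.rand(8, 3)
disc_hat, cont_hat = model(feats)
loss = nn.functional.cross_entropy(disc_hat, disc_y) \
       + 0.5 * nn.functional.mse_loss(cont_hat, cont_y)
loss.backward()
```

Read this way, the mismatched training approach described above would correspond to optimizing only one of the two loss terms during training while evaluating with the other label type.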
Related papers
- Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning [47.02027575768659]
We introduce continuous valence and arousal labels for an existing dataset of children's stories originally annotated with discrete emotion categories.
To predict the resulting emotionality signals, we fine-tune a DeBERTa model and improve upon this baseline via a weakly supervised learning approach.
A detailed analysis shows the extent to which the results vary depending on factors such as the author, the individual story, or the section within the story.
arXiv Detail & Related papers (2024-06-04T12:17:16Z)
- CAGE: Circumplex Affect Guided Expression Inference [9.108319009019912]
We present a comparative in-depth analysis of two common datasets (AffectNet and EMOTIC) equipped with the components of the circumplex model of affect.
We propose a model for the prediction of facial expressions tailored for lightweight applications.
arXiv Detail & Related papers (2024-04-23T12:30:17Z)
- Seeking Subjectivity in Visual Emotion Distribution Learning [93.96205258496697]
Visual Emotion Analysis (VEA) aims to predict people's emotions towards different visual stimuli.
Existing methods often predict visual emotion distribution in a unified network, neglecting the inherent subjectivity in its crowd voting process.
We propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution.
arXiv Detail & Related papers (2022-07-25T02:20:03Z)
- Estimating the Uncertainty in Emotion Class Labels with Utterance-Specific Dirichlet Priors [24.365876333182207]
We propose a novel training loss based on per-utterance Dirichlet prior distributions for verbal emotion recognition (a hedged sketch of this idea appears after this list).
An additional metric is used to evaluate the performance by detecting test utterances with high labelling uncertainty.
Experiments with the widely used IEMOCAP dataset demonstrate that the two-branch structure achieves state-of-the-art classification results.
arXiv Detail & Related papers (2022-03-08T23:30:01Z)
- Contrast and Generation Make BART a Good Dialogue Emotion Recognizer [38.18867570050835]
Long-range contextual emotional relationships with speaker dependency play a crucial part in dialogue emotion recognition.
We adopt supervised contrastive learning to make different emotions mutually exclusive to identify similar emotions better.
We utilize an auxiliary response generation task to enhance the model's ability of handling context information.
arXiv Detail & Related papers (2021-12-21T13:38:00Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- Label Distribution Amendment with Emotional Semantic Correlations for Facial Expression Recognition [69.18918567657757]
We propose a new method that amends the label distribution of each facial image by leveraging correlations among expressions in the semantic space.
By comparing semantic and task class-relation graphs of each image, the confidence of its label distribution is evaluated.
Experimental results demonstrate that the proposed method is more effective than the state-of-the-art methods it is compared against.
arXiv Detail & Related papers (2021-07-23T07:46:14Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings for low-resource multimodal emotion recognition.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
- EmoGraph: Capturing Emotion Correlations using Graph Networks [71.53159402053392]
We propose EmoGraph that captures the dependencies among different emotions through graph networks.
EmoGraph outperforms strong baselines, especially for macro-F1.
An experiment illustrates that the captured emotion correlations can also benefit a single-label classification task.
arXiv Detail & Related papers (2020-08-21T08:59:29Z)
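For the uncertainty-estimation entry above, here is the promised sketch of a Dirichlet-prior-style training loss. It assumes a classifier that outputs per-utterance Dirichlet concentration parameters over four emotion classes and is trained against soft annotator label distributions; the class count, network, and exact loss form are illustrative assumptions rather than that paper's formulation.

```python
# A minimal sketch, assuming per-utterance Dirichlet concentrations `alpha`
# over K = 4 emotion classes; the loss is the negative log-likelihood of the
# observed (soft) annotator label distribution under Dirichlet(alpha).
import torch

def dirichlet_nll(alpha, label_dist):
    """NLL of an annotator label distribution under Dirichlet(alpha)."""
    return -torch.distributions.Dirichlet(alpha).log_prob(label_dist).mean()

logits = torch.randn(8, 4, requires_grad=True)         # batch of 8 utterances
alpha = torch.nn.functional.softplus(logits) + 1.0     # positive concentrations
label_dist = torch.softmax(torch.randn(8, 4), dim=-1)  # soft annotator votes
loss = dirichlet_nll(alpha, label_dist)
loss.backward()

# The total concentration alpha_0 indexes confidence: a small alpha_0 means a
# flat, uncertain Dirichlet, so low-alpha_0 test utterances can be flagged as
# having high labelling uncertainty.
alpha0 = alpha.sum(dim=-1)
```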