The Emotion is Not One-hot Encoding: Learning with Grayscale Label for Emotion Recognition in Conversation
- URL: http://arxiv.org/abs/2206.07359v2
- Date: Thu, 16 Jun 2022 07:10:45 GMT
- Title: The Emotion is Not One-hot Encoding: Learning with Grayscale Label for Emotion Recognition in Conversation
- Authors: Joosung Lee
- Abstract summary: In emotion recognition in conversation (ERC), the emotion of the current utterance is predicted by considering the previous context.
We introduce several methods for constructing grayscale labels and confirm that each method improves the emotion recognition performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In emotion recognition in conversation (ERC), the emotion of the current
utterance is predicted by considering the previous context, which can be
utilized in many natural language processing tasks. Although multiple emotions can coexist in a single sentence, most previous approaches treat ERC as a classification task and predict only the single given label. However, annotating a sentence with confidence scores or multiple emotion labels is expensive and difficult. In this paper, we automatically construct a grayscale label
considering the correlation between emotions and use it for learning. That is,
instead of using a given label as a one-hot encoding, we construct a grayscale
label by measuring scores for different emotions. We introduce several methods
for constructing grayscale labels and confirm that each method improves the
emotion recognition performance. Our method is simple, effective, and universally applicable to previous systems. Experiments show that it significantly improves the performance of the baselines.
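The abstract does not spell out the construction methods, so the following is a minimal sketch of one plausible variant, not the authors' exact recipe: soften the one-hot label using cosine similarity between emotion-word embeddings, then train with soft-label cross-entropy. The emotion set, embedding source, temperature, and mixing weight are all assumptions.

```python
# A minimal sketch (not the authors' exact construction): build a grayscale
# label from cosine similarity between emotion-word vectors, then train with
# soft-label cross-entropy. Emotion set and toy vectors are assumptions.
import torch
import torch.nn.functional as F

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# Stand-in emotion-word embeddings; in practice these could come from
# GloVe/word2vec or a pretrained encoder.
emb = torch.randn(len(EMOTIONS), 300)

def grayscale_label(gold_idx: int, temperature: float = 0.5) -> torch.Tensor:
    """Soften a one-hot label using emotion-emotion cosine similarity."""
    sims = F.cosine_similarity(emb[gold_idx].unsqueeze(0), emb, dim=-1)
    soft = F.softmax(sims / temperature, dim=-1)
    # Mixing with the one-hot label keeps the gold emotion dominant while
    # still assigning probability mass to correlated emotions.
    one_hot = F.one_hot(torch.tensor(gold_idx), len(EMOTIONS)).float()
    return 0.5 * one_hot + 0.5 * soft

logits = torch.randn(1, len(EMOTIONS))             # model output, one utterance
target = grayscale_label(gold_idx=3).unsqueeze(0)  # gold label: happiness
loss = torch.sum(-target * F.log_softmax(logits, dim=-1))  # soft cross-entropy
print(target, loss.item())
```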
Related papers
- Multi-label Class Incremental Emotion Decoding with Augmented Emotional Semantics Learning [20.609772647273374]
We propose an augmented emotional semantics learning framework for incremental emotion decoding.
Specifically, we design an emotional relation graph module with label disambiguation to handle the past-missing partial label problem.
An emotional semantics learning module is constructed with a graph autoencoder to obtain emotion embeddings.
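As a rough illustration of the graph-autoencoder step described above (architecture and sizes assumed, not taken from the paper), one can encode an emotion co-occurrence graph into embeddings and train by reconstructing the adjacency matrix:

```python
# Toy graph autoencoder: a one-layer GCN encoder produces emotion embeddings;
# an inner-product decoder reconstructs the co-occurrence adjacency matrix.
import torch
import torch.nn as nn

n_emotions, feat_dim, emb_dim = 6, 16, 4
A = (torch.rand(n_emotions, n_emotions) > 0.5).float()
A = ((A + A.T) > 0).float()                    # symmetric co-occurrence graph
A_hat = A + torch.eye(n_emotions)              # add self-loops
D_inv_sqrt = torch.diag(A_hat.sum(1).pow(-0.5))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt       # normalized adjacency
X = torch.randn(n_emotions, feat_dim)          # initial emotion features

encoder = nn.Linear(feat_dim, emb_dim)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-2)
for _ in range(200):
    Z = torch.relu(A_norm @ encoder(X))        # one-layer GCN encoder
    rec = torch.sigmoid(Z @ Z.T)               # inner-product decoder
    loss = nn.functional.binary_cross_entropy(rec, A)
    opt.zero_grad(); loss.backward(); opt.step()
print("emotion embeddings:", Z.detach())
```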
arXiv Detail & Related papers (2024-05-31T03:16:54Z)
- Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification [37.823815777259036]
We introduce a method for categorizing emotions from text that accounts for both the similarities and the distinctions among emotions.
Our approach not only preserves high accuracy in emotion prediction but also significantly reduces the magnitude of errors in cases of misclassification.
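The summary does not give the formulation, so below is a minimal sketch of one standard ordinal-classification trick (cumulative binary targets), which may differ from the paper's method. With this encoding, predicting a nearby valence level incurs a smaller loss than a distant one.

```python
# Cumulative-target ordinal classification sketch (an assumption, not the
# paper's exact recipe): valence level k out of K is encoded as k leading ones.
import torch
import torch.nn.functional as F

K = 5                                   # ordinal valence levels

def ordinal_target(level: int) -> torch.Tensor:
    t = torch.zeros(K - 1)
    t[:level] = 1.0                     # level 3 -> [1, 1, 1, 0]
    return t

logits = torch.randn(1, K - 1)          # model output for one text
target = ordinal_target(3).unsqueeze(0)
loss = F.binary_cross_entropy_with_logits(logits, target)
pred_level = (torch.sigmoid(logits) > 0.5).sum().item()
print(loss.item(), pred_level)
```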
arXiv Detail & Related papers (2024-04-02T10:06:30Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Speech Synthesis with Mixed Emotions [77.05097999561298]
We propose a novel formulation that measures the relative difference between the speech samples of different emotions.
We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework.
At run-time, we control the model to produce the desired emotion mixture by manually defining an emotion attribute vector.
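A schematic of such run-time control, with all names and dimensions assumed: blend learned per-emotion style embeddings using a manually defined attribute vector.

```python
# Illustrative mixture control (not the paper's architecture): the attribute
# vector weights a convex combination of per-emotion style embeddings.
import torch

EMOTIONS = ["neutral", "happy", "sad", "angry", "surprise"]
style_table = torch.randn(len(EMOTIONS), 128)   # learned per-emotion embeddings

def mixed_style(attr: dict[str, float]) -> torch.Tensor:
    """attr maps emotion name -> desired weight, e.g. 60% happy + 40% surprise."""
    w = torch.tensor([attr.get(e, 0.0) for e in EMOTIONS])
    w = w / w.sum()                              # normalize to a convex mixture
    return w @ style_table                       # weighted sum of embeddings

style = mixed_style({"happy": 0.6, "surprise": 0.4})
print(style.shape)                               # conditioning vector for TTS
```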
arXiv Detail & Related papers (2022-08-11T15:45:58Z)
- MAFW: A Large-scale, Multi-modal, Compound Affective Database for Dynamic Facial Expression Recognition in the Wild [56.61912265155151]
We propose MAFW, a large-scale compound affective database with 10,045 video-audio clips in the wild.
Each clip is annotated with a compound emotional category and a couple of sentences that describe the subjects' affective behaviors in the clip.
For the compound emotion annotation, each clip is categorized into one or more of the 11 widely-used emotions, i.e., anger, disgust, fear, happiness, neutral, sadness, surprise, contempt, anxiety, helplessness, and disappointment.
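A compound annotation of this kind maps naturally onto a multi-hot vector over the 11 emotions listed above; the small helper below is illustrative, not MAFW tooling.

```python
# Encode a compound annotation as a multi-hot vector over MAFW's 11 emotions.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness",
            "surprise", "contempt", "anxiety", "helplessness", "disappointment"]

def multi_hot(labels: list[str]) -> list[int]:
    return [1 if e in labels else 0 for e in EMOTIONS]

# A clip annotated with a compound emotion:
print(multi_hot(["anxiety", "fear"]))  # -> [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
```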
arXiv Detail & Related papers (2022-08-01T13:34:33Z)
- Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
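Purely as an illustration of intensity control in an embedding space (the paper's actual modeling is more involved, and all names here are assumptions), one can interpolate from a neutral style prototype toward an emotion prototype:

```python
# Intensity as interpolation between style prototypes (illustrative only).
import torch

style_dim = 64
neutral_proto = torch.randn(style_dim)   # prototype of neutral style
angry_proto = torch.randn(style_dim)     # prototype of angry style

def styled(intensity: float) -> torch.Tensor:
    """intensity in [0, 1]: 0 = neutral, 1 = fully angry."""
    return neutral_proto + intensity * (angry_proto - neutral_proto)

mild, strong = styled(0.3), styled(0.9)
print(mild.shape, strong.shape)
```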
arXiv Detail & Related papers (2022-01-10T02:11:25Z)
- A Circular-Structured Representation for Visual Emotion Distribution Learning [82.89776298753661]
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
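A toy rendering of the idea, with the angle placements and attribute choices assumed rather than taken from the paper: an emotion vector on a circle whose angle fixes the emotion type and whose length encodes intensity.

```python
# Toy Emotion Circle: angle = emotion type, length = intensity (assumptions).
import math

# Eight Mikels-style emotions placed at fixed angles (an assumption).
ANGLES = {"amusement": 20, "awe": 65, "contentment": 110, "excitement": 155,
          "anger": 200, "disgust": 245, "fear": 290, "sadness": 335}

def emotion_vector(emotion: str, intensity: float) -> tuple[float, float]:
    """intensity in (0, 1] is the vector length; the angle fixes the type."""
    theta = math.radians(ANGLES[emotion])
    return (intensity * math.cos(theta), intensity * math.sin(theta))

print(emotion_vector("awe", 0.8))
```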
arXiv Detail & Related papers (2021-06-23T14:53:27Z)
- SpanEmo: Casting Multi-label Emotion Classification as Span-prediction [15.41237087996244]
We propose a new model "SpanEmo" casting multi-label emotion classification as span-prediction.
We introduce a loss function focused on modelling multiple co-existing emotions in the input sentence.
Experiments performed on the SemEval2018 multi-label emotion data over three language sets demonstrate our method's effectiveness.
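A sketch in the spirit of that loss (the weighting and the exact pairwise term are assumptions): combine binary cross-entropy with a label-correlation term that pushes the scores of present emotions above absent ones.

```python
# BCE plus a pairwise term penalizing absent emotions that outscore present
# ones; the mixing weight alpha and the exact form are assumptions.
import torch
import torch.nn.functional as F

def multi_emotion_loss(logits, target, alpha=0.2):
    bce = F.binary_cross_entropy_with_logits(logits, target)
    p = torch.sigmoid(logits)
    pos, neg = p[target > 0.5], p[target < 0.5]
    # Penalize every (absent, present) pair where the absent emotion
    # outscores the present one.
    corr = torch.exp(neg.unsqueeze(1) - pos.unsqueeze(0)).mean()
    return (1 - alpha) * bce + alpha * corr

logits = torch.randn(11)                      # one sentence, 11 emotions
target = torch.zeros(11); target[[1, 4]] = 1  # gold: two co-existing emotions
print(multi_emotion_loss(logits, target).item())
```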
arXiv Detail & Related papers (2021-01-25T12:11:04Z)
- Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition [9.856709988128515]
We propose a novel multi-classifier interactive learning (MCIL) method to address the ambiguous speech emotions.
MCIL mimics several individuals with inconsistent perceptions of ambiguous emotions and constructs new ambiguous labels.
Experiments show that MCIL not only improves each classifier's performance but also raises their recognition consistency from moderate to substantial.
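A rough sketch of the merging step, heavily simplified relative to the paper: several classifiers act as "individuals" whose inconsistent posteriors are averaged into a soft, ambiguous label for the next training round.

```python
# Merge inconsistent classifier posteriors into a soft, ambiguous label.
import numpy as np

rng = np.random.default_rng(0)
n_classifiers, n_emotions = 3, 4

# Stand-in posterior predictions from three classifiers for one utterance.
preds = rng.dirichlet(np.ones(n_emotions), size=n_classifiers)

ambiguous_label = preds.mean(axis=0)      # merged soft label
ambiguous_label /= ambiguous_label.sum()  # keep it a distribution
print(ambiguous_label)                    # mass spread over several emotions
```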
arXiv Detail & Related papers (2020-12-10T02:58:34Z)
- Learning Unseen Emotions from Gestures via Semantically-Conditioned Zero-Shot Perception with Adversarial Autoencoders [25.774235606472875]
We introduce an adversarial, autoencoder-based representation learning method that correlates 3D motion-captured gesture sequences with vectorized representations of natural-language perceived-emotion terms.
We train our method using a combination of gestures annotated with known emotion terms and gestures not annotated with any emotions.
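A compressed sketch with the adversarial discriminator omitted and all dimensions assumed: autoencode a gesture sequence and align its latent code with the word vector of the annotated emotion term, so unseen emotion words can be matched at test time.

```python
# Gesture autoencoder whose latent code is projected into word-embedding
# space and aligned with the annotated emotion term (discriminator omitted).
import torch
import torch.nn as nn
import torch.nn.functional as F

seq_len, joint_dim, latent_dim, word_dim = 30, 63, 32, 300
enc = nn.GRU(joint_dim, latent_dim, batch_first=True)
dec = nn.Linear(latent_dim, seq_len * joint_dim)
proj = nn.Linear(latent_dim, word_dim)         # latent -> word-embedding space

gesture = torch.randn(1, seq_len, joint_dim)   # one motion-capture sequence
emotion_word = torch.randn(1, word_dim)        # stand-in for word2vec("proud")

_, h = enc(gesture)                            # encode the sequence
z = h[-1]
recon = dec(z).view(1, seq_len, joint_dim)     # decode back to joints
loss = F.mse_loss(recon, gesture) + (1 - F.cosine_similarity(proj(z), emotion_word).mean())
print(loss.item())
```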
arXiv Detail & Related papers (2020-09-18T15:59:44Z)
- EmoGraph: Capturing Emotion Correlations using Graph Networks [71.53159402053392]
We propose EmoGraph that captures the dependencies among different emotions through graph networks.
EmoGraph outperforms strong baselines, especially for macro-F1.
An experiment illustrates that the captured emotion correlations can also benefit a single-label classification task.
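A minimal sketch of the general recipe (not EmoGraph's exact architecture): propagate emotion-node features over a co-occurrence graph and use the resulting node embeddings as per-emotion classifier weights.

```python
# Emotion-node features propagated over a co-occurrence graph; the node
# embeddings then score a sentence encoding, one logit per emotion.
import torch
import torch.nn as nn

n_emotions, node_dim, text_dim = 8, 64, 64
co = torch.rand(n_emotions, n_emotions)          # emotion co-occurrence counts
A = co / co.sum(dim=1, keepdim=True)             # row-normalized adjacency

node_feat = torch.randn(n_emotions, node_dim)    # initial emotion-node features
gcn = nn.Linear(node_dim, text_dim)

emo_weights = torch.relu(A @ gcn(node_feat))     # one propagation step
text_vec = torch.randn(1, text_dim)              # sentence encoding
logits = text_vec @ emo_weights.T                # one score per emotion
print(torch.sigmoid(logits))
```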
arXiv Detail & Related papers (2020-08-21T08:59:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.