Color-based Emotion Representation for Speech Emotion Recognition
- URL: http://arxiv.org/abs/2602.16256v1
- Date: Wed, 18 Feb 2026 08:11:49 GMT
- Title: Color-based Emotion Representation for Speech Emotion Recognition
- Authors: Ryotaro Nagase, Ryoichi Takashima, Yoichi Yamashita
- Abstract summary: We focus on color attributes, such as hue, saturation, and value, to represent emotions as continuous and interpretable scores. We annotated an emotional speech corpus with color attributes via crowdsourcing and analyzed them. We built regression models for color attributes in SER using machine learning and deep learning, and explored the multitask learning of color attribute regression and emotion classification.
- Score: 5.386017591282176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech emotion recognition (SER) has traditionally relied on categorical or dimensional labels. However, such labels are limited in representing both the diversity and interpretability of emotions. To overcome this limitation, we focus on color attributes, such as hue, saturation, and value, to represent emotions as continuous and interpretable scores. We annotated an emotional speech corpus with color attributes via crowdsourcing and analyzed them. Moreover, we built regression models for color attributes in SER using machine learning and deep learning, and explored the multitask learning of color attribute regression and emotion classification. As a result, we demonstrated the relationship between color attributes and emotions in speech, and successfully developed color attribute regression models for SER. We also showed that multitask learning improved the performance of each task.
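As a rough sketch of the multitask setup the abstract describes, the code below pairs a shared speech encoder with a color-regression head (hue, saturation, value) and an emotion-classification head, trained with a jointly weighted loss. The encoder type, dimensions, and loss weight are illustrative assumptions, not the paper's reported architecture.

```python
# Minimal multitask sketch: a shared speech encoder feeds two heads, one
# regressing HSV color attributes and one classifying emotion. All module
# choices, dimensions, and the loss weighting are illustrative assumptions.
import torch
import torch.nn as nn

class MultitaskSER(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, num_emotions=4):
        super().__init__()
        # Shared encoder over frame-level acoustic features (e.g., mel spectrogram)
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.color_head = nn.Linear(2 * hidden, 3)        # hue, saturation, value
        self.emotion_head = nn.Linear(2 * hidden, num_emotions)

    def forward(self, x):
        out, _ = self.encoder(x)         # (batch, time, 2*hidden)
        pooled = out.mean(dim=1)         # mean-pool over time
        color = torch.sigmoid(self.color_head(pooled))  # scores in [0, 1]
        logits = self.emotion_head(pooled)
        return color, logits

model = MultitaskSER()
x = torch.randn(8, 200, 80)              # batch of 8 utterances, 200 frames each
color, logits = model(x)

# Joint loss: MSE for color regression plus cross-entropy for classification,
# combined with an assumed weighting factor lambda_c.
target_color = torch.rand(8, 3)
target_emotion = torch.randint(0, 4, (8,))
lambda_c = 0.5
loss = lambda_c * nn.functional.mse_loss(color, target_color) \
       + nn.functional.cross_entropy(logits, target_emotion)
loss.backward()
```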
Related papers
- Semantic Differentiation in Speech Emotion Recognition: Insights from Descriptive and Expressive Speech Roles [4.516156697420418]
Speech Emotion Recognition (SER) is essential for improving human-computer interaction. We distinguish between descriptive semantics, which represents the contextual content of speech, and expressive semantics, which reflects the speaker's emotional state. Our findings inform SER applications in human-AI interaction and pave the way for more context-aware AI systems.
arXiv Detail & Related papers (2025-10-03T14:42:35Z)
- Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [63.94836524433559]
DICE-Talk is a framework that disentangles identity from emotion and cooperates emotions with similar characteristics. First, we develop a disentangled emotion embedder that jointly models audio-visual emotional cues through cross-modal attention. Second, we introduce a correlation-enhanced emotion conditioning module with learnable Emotion Banks. Third, we design an emotion discrimination objective that enforces affective consistency during the diffusion process.
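A minimal sketch of the cross-modal attention idea mentioned above: audio features attend over visual features to produce a pooled emotion embedding. The module names, dimensions, and mean-pooling are assumptions for illustration, not DICE-Talk's actual design.

```python
# Illustrative cross-modal attention block: audio queries attend over
# visual keys/values; the pooled output serves as an emotion embedding.
import torch
import torch.nn as nn

class CrossModalEmotionEmbedder(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, audio_feats, visual_feats):
        # Query with audio, attend over the visual sequence.
        fused, _ = self.attn(audio_feats, visual_feats, visual_feats)
        return self.proj(fused.mean(dim=1))  # (batch, dim) emotion embedding

embedder = CrossModalEmotionEmbedder()
emotion = embedder(torch.randn(2, 100, 256), torch.randn(2, 50, 256))
```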
arXiv Detail & Related papers (2025-04-25T05:28:21Z)
- Decoding Emotions in Abstract Art: Cognitive Plausibility of CLIP in Recognizing Color-Emotion Associations [1.0659364666674607]
This study investigates the cognitive plausibility of a pretrained multimodal model, CLIP, in recognizing emotions evoked by abstract visual art.
We employ a dataset comprising images with associated emotion labels and textual rationales of these labels provided by human annotators.
We perform linguistic analyses of rationales, zero-shot emotion classification of images and rationales, apply similarity-based prediction of emotion, and investigate color-emotion associations.
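The zero-shot step can be sketched with an off-the-shelf CLIP checkpoint: score an image against prompts naming candidate emotions and take the softmax over the similarities. The prompt template and emotion inventory below are assumptions, not the study's exact setup.

```python
# Hedged sketch of zero-shot emotion classification with CLIP.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

emotions = ["anger", "fear", "sadness", "contentment", "amusement"]
prompts = [f"an abstract painting that evokes {e}" for e in emotions]

image = Image.open("artwork.jpg")  # placeholder path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # (1, num_prompts)
probs = logits.softmax(dim=-1)[0]
print(dict(zip(emotions, probs.tolist())))
```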
arXiv Detail & Related papers (2024-05-10T08:45:23Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis [13.918119853846838]
Affect is an emotional characteristic encompassing valence, arousal, and intensity, and is a crucial attribute for enabling authentic conversations.
We propose AffectEcho, an emotion translation model that uses a Vector Quantized codebook to model emotions within a quantized space.
We demonstrate the effectiveness of our approach in controlling the emotions of generated speech while preserving identity, style, and emotional cadence unique to each speaker.
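A minimal sketch of the vector-quantization idea: map a continuous emotion feature to its nearest codebook entry, with a straight-through estimator for training. Codebook size and dimensions are illustrative assumptions.

```python
# Nearest-neighbor codebook lookup over emotion features (VQ sketch).
import torch
import torch.nn as nn

class EmotionVQ(nn.Module):
    def __init__(self, num_codes=64, dim=128):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):  # z: (batch, dim) continuous emotion feature
        dists = torch.cdist(z, self.codebook.weight)  # (batch, num_codes)
        idx = dists.argmin(dim=-1)                    # nearest code per sample
        z_q = self.codebook(idx)
        # Straight-through estimator: quantized forward, identity backward.
        return z + (z_q - z).detach(), idx

vq = EmotionVQ()
z_q, codes = vq(torch.randn(4, 128))
```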
arXiv Detail & Related papers (2023-08-16T06:28:29Z)
- Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning [70.30713251031052]
We propose StrengthNet, a data-driven deep learning model that improves the generalization of emotion strength assessment to seen and unseen speech.
Experiments show that the predicted emotion strength of the proposed StrengthNet is highly correlated with ground truth scores for both seen and unseen speech.
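The reported evaluation hinges on correlating predicted strength with ground-truth ratings; a sketch of that check, with dummy arrays standing in for model outputs and annotations:

```python
# Correlating predicted emotion strength with ground-truth scores.
import numpy as np
from scipy.stats import pearsonr

predicted = np.array([0.2, 0.5, 0.7, 0.9, 0.4])        # dummy model outputs
ground_truth = np.array([0.25, 0.45, 0.8, 0.85, 0.3])  # dummy annotations
r, p = pearsonr(predicted, ground_truth)
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
```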
arXiv Detail & Related papers (2022-06-15T01:25:32Z)
- Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of the emotion embedding.
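One simple way to picture continuous intensity control over a disentangled style embedding is to scale the emotion vector before recombining it with the linguistic content; the additive composition and scalar knob below are assumptions for illustration, not the paper's formulation.

```python
# Scaling a continuous emotion-style embedding to control intensity.
import torch

content = torch.randn(1, 256)        # disentangled linguistic/speaker content
emotion_style = torch.randn(1, 256)  # emotion style embedding (prototype)

def render(intensity: float) -> torch.Tensor:
    # intensity = 0.0 -> neutral; 1.0 -> full emotion prototype
    return content + intensity * emotion_style

mild, strong = render(0.3), render(1.0)
```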
arXiv Detail & Related papers (2022-01-10T02:11:25Z)
- Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
- Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and achieves state-of-the-art results for classifying 32 emotions.
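A hedged sketch of the encoder-plus-probing pattern: frozen contextualized embeddings feed several small linear probes whose outputs are averaged. The checkpoint, number of heads, and pooling are assumptions; only the 32-emotion label space comes from the summary.

```python
# Frozen contextual encoder with multiple linear probing heads.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.requires_grad_(False)  # probe on frozen representations

probes = nn.ModuleList([nn.Linear(768, 32) for _ in range(4)])  # 4 probe heads

batch = tokenizer(["I can't believe you did that for me!"],
                  return_tensors="pt", padding=True)
with torch.no_grad():
    h = encoder(**batch).last_hidden_state[:, 0]       # [CLS] embedding
logits = torch.stack([p(h) for p in probes]).mean(0)   # (1, 32) emotion logits
```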
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
- SpanEmo: Casting Multi-label Emotion Classification as Span-prediction [15.41237087996244]
We propose SpanEmo, a new model that casts multi-label emotion classification as span prediction.
We introduce a loss function focused on modelling multiple co-existing emotions in the input sentence.
Experiments performed on the SemEval2018 multi-label emotion data over three language sets demonstrate our method's effectiveness.
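The loss described above can be sketched as binary cross-entropy plus a pairwise ranking term that pushes absent emotions below present ones; the weighting and exact form below are assumptions based on the summary.

```python
# BCE plus a label-correlation-aware ranking term for multi-label emotions.
import torch
import torch.nn.functional as F

def multilabel_emotion_loss(logits, labels, alpha=0.2):
    # logits, labels: (batch, num_labels); labels are 0/1 indicators
    labels = labels.float()
    bce = F.binary_cross_entropy_with_logits(logits, labels)
    probs = torch.sigmoid(logits)
    # mask[b, i, j] = 1 where emotion i is present and emotion j is absent
    mask = labels.unsqueeze(2) * (1.0 - labels).unsqueeze(1)
    # diff[b, i, j] = probs[b, j] - probs[b, i]: penalize absent above present
    diff = probs.unsqueeze(1) - probs.unsqueeze(2)
    rank = (mask * diff.exp()).sum(dim=(1, 2)) / mask.sum(dim=(1, 2)).clamp(min=1)
    return (1 - alpha) * bce + alpha * rank.mean()

loss = multilabel_emotion_loss(torch.randn(4, 11), torch.randint(0, 2, (4, 11)))
```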
arXiv Detail & Related papers (2021-01-25T12:11:04Z)
- Emosaic: Visualizing Affective Content of Text at Varying Granularity [0.0]
Emosaic is a tool for visualizing the emotional tone of text documents.
We capitalize on an established three-dimensional model of human emotion.
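Tools of this kind typically map a three-dimensional emotion score onto a color; the sketch below assumes a valence-arousal-dominance (VAD) model and a particular VAD-to-HSV assignment, both illustrative rather than Emosaic's actual mapping.

```python
# Mapping a VAD emotion score to an RGB color via HSV.
import colorsys

def vad_to_rgb(valence, arousal, dominance):
    # All inputs assumed normalized to [0, 1].
    hue = valence * 0.33           # red (low valence) toward green (high valence)
    saturation = arousal           # higher arousal -> more vivid color
    value = 0.5 + 0.5 * dominance  # higher dominance -> brighter color
    return colorsys.hsv_to_rgb(hue, saturation, value)

print(vad_to_rgb(0.9, 0.7, 0.6))  # pleasant, fairly intense, moderately dominant
```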
arXiv Detail & Related papers (2020-02-24T07:25:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.