Detecting Emotion Primitives from Speech and their use in discerning
Categorical Emotions
- URL: http://arxiv.org/abs/2002.01323v1
- Date: Fri, 31 Jan 2020 03:11:24 GMT
- Title: Detecting Emotion Primitives from Speech and their use in discerning
Categorical Emotions
- Authors: Vasudha Kowtha, Vikramjit Mitra, Chris Bartels, Erik Marchi, Sue
Booker, William Caruso, Sachin Kajarekar, Devang Naik
- Abstract summary: Emotion plays an essential role in human-to-human communication, enabling us to convey feelings such as happiness, frustration, and sincerity.
This work investigated how emotion primitives can be used to detect categorical emotions such as happiness, disgust, contempt, anger, and surprise from neutral speech.
Results indicated that arousal, followed by dominance, was a better detector of such emotions.
- Score: 16.886826928295203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotion plays an essential role in human-to-human communication, enabling us
to convey feelings such as happiness, frustration, and sincerity. While modern
speech technologies rely heavily on speech recognition and natural language
understanding for speech content understanding, the investigation of vocal
expression is increasingly gaining attention. Key considerations for building
robust emotion models include characterizing and improving the extent to which
a model, given its training data distribution, is able to generalize to unseen
data conditions. This work investigated a long short-term memory (LSTM) network
and a time-convolution LSTM (TC-LSTM) to detect primitive emotion attributes
such as valence, arousal, and dominance, from speech. It was observed that
training with multiple datasets and using robust features improved the
concordance correlation coefficient (CCC) for valence by 30% with respect to
the baseline system. Additionally, this work investigated how emotion
primitives can be used to detect categorical emotions such as happiness,
disgust, contempt, anger, and surprise from neutral speech, and results
indicated that arousal, followed by dominance, was a better detector of such
emotions.
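The headline metric here is the concordance correlation coefficient (CCC), which combines Pearson correlation with a penalty for mean and scale mismatch between predicted and annotated emotion primitives. As a minimal sketch (not code from the paper), the snippet below computes CCC with NumPy and illustrates one hypothetical way predicted arousal and dominance scores could be thresholded to flag non-neutral speech; the function names, thresholds, and synthetic data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def concordance_cc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Concordance correlation coefficient between annotations and predictions.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    """
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mu_t - mu_p) ** 2)

def looks_non_neutral(arousal: float, dominance: float,
                      a_thresh: float = 0.6, d_thresh: float = 0.6) -> bool:
    """Hypothetical detector: flag an utterance as emotionally expressive when
    predicted arousal (the stronger cue per the abstract) or dominance exceeds
    a threshold. Thresholds are illustrative, not from the paper."""
    return arousal > a_thresh or dominance > d_thresh

# Example: per-utterance valence predictions vs. human ratings (synthetic data).
rng = np.random.default_rng(0)
ratings = rng.uniform(-1.0, 1.0, size=100)
predictions = 0.8 * ratings + rng.normal(scale=0.2, size=100)
print(f"valence CCC: {concordance_cc(ratings, predictions):.3f}")
print("non-neutral?", looks_non_neutral(arousal=0.72, dominance=0.41))
```

Under this metric, the reported 30% relative gain for valence would correspond to, for example, moving from a CCC of 0.30 to 0.39 against the same annotations.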
Related papers
- ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains [61.50113532215864]
Causal Emotion Entailment (CEE) aims to identify the causal utterances in a conversation that stimulate the emotions expressed in a target utterance.
Current works in CEE mainly focus on modeling semantic and emotional interactions in conversations.
We introduce a step-by-step reasoning method, Emotion-Cause Reasoning Chain (ECR-Chain), to infer the stimulus from the target emotional expressions in conversations.
arXiv Detail & Related papers (2024-05-17T15:45:08Z) - Exploiting Emotion-Semantic Correlations for Empathetic Response
Generation [18.284296904390143]
Empathetic response generation aims to generate empathetic responses by understanding the speaker's emotional feelings from the language of dialogue.
Recent methods capture emotional words in the language of communicators and construct them as static vectors to perceive nuanced emotions.
We propose a dynamical Emotion-Semantic Correlation Model (ESCM) for empathetic dialogue generation tasks.
arXiv Detail & Related papers (2024-02-27T11:50:05Z) - Emotion Rendering for Conversational Speech Synthesis with Heterogeneous
Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z) - Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z) - Learning Emotion Representations from Verbal and Nonverbal Communication [7.747924294389427]
We present EmotionCLIP, the first pre-training paradigm to extract visual emotion representations from verbal and nonverbal communication.
We guide EmotionCLIP to attend to nonverbal emotion cues through subject-aware context encoding and verbal emotion cues using sentiment-guided contrastive learning.
EmotionCLIP will address the prevailing issue of data scarcity in emotion understanding, thereby fostering progress in related domains.
arXiv Detail & Related papers (2023-05-22T21:36:55Z) - EmotionIC: emotional inertia and contagion-driven dependency modeling for emotion recognition in conversation [34.24557248359872]
We propose an emotional inertia and contagion-driven dependency modeling approach (EmotionIC) for ERC task.
Our EmotionIC consists of three main components: Identity Masked Multi-Head Attention (IMMHA), Dialogue-based Gated Recurrent Unit (DiaGRU), and Skip-chain Conditional Random Field (SkipCRF).
Experimental results show that our method can significantly outperform the state-of-the-art models on four benchmark datasets.
arXiv Detail & Related papers (2023-03-20T13:58:35Z) - Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on
Data-Driven Deep Learning [70.30713251031052]
We propose a data-driven deep learning model, i.e. StrengthNet, to improve the generalization of emotion strength assessment for seen and unseen speech.
Experiments show that the predicted emotion strength of the proposed StrengthNet is highly correlated with ground truth scores for both seen and unseen speech.
arXiv Detail & Related papers (2022-06-15T01:25:32Z) - Attention-based Region of Interest (ROI) Detection for Speech Emotion
Recognition [4.610756199751138]
We propose to use an attention mechanism in deep recurrent neural networks to detect the Regions-of-Interest (ROI) that are more emotionally salient in human emotional speech/video.
We compare the performance of the proposed attention networks with state-of-the-art LSTM models on a multi-class classification task of recognizing six basic human emotions.
arXiv Detail & Related papers (2022-03-03T22:01:48Z) - Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
arXiv Detail & Related papers (2022-01-10T02:11:25Z) - Reinforcement Learning for Emotional Text-to-Speech Synthesis with
Improved Emotion Discriminability [82.39099867188547]
Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years.
We propose a new interactive training paradigm for ETTS, denoted as i-ETTS.
We formulate an iterative training strategy with reinforcement learning to ensure the quality of i-ETTS optimization.
arXiv Detail & Related papers (2021-04-03T13:52:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.