"We care": Improving Code Mixed Speech Emotion Recognition in
Customer-Care Conversations
- URL: http://arxiv.org/abs/2308.03150v1
- Date: Sun, 6 Aug 2023 15:56:12 GMT
- Title: "We care": Improving Code Mixed Speech Emotion Recognition in
Customer-Care Conversations
- Authors: N V S Abhishek, Pushpak Bhattacharyya
- Abstract summary: Speech Emotion Recognition (SER) is the task of identifying the emotion expressed in a spoken utterance.
In this paper, we show that incorporating word-level VAD values improves SER performance by 2% for negative emotions.
Our study can be utilized to develop conversational agents that are more polite and empathetic in such situations.
- Score: 36.9886023078247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech Emotion Recognition (SER) is the task of identifying the emotion
expressed in a spoken utterance. Emotion recognition is essential in building
robust conversational agents in domains such as law, healthcare, education, and
customer support. Most of the studies published on SER use datasets created by
employing professional actors in a noise-free environment. In natural settings
such as a customer care conversation, the audio is often noisy with speakers
regularly switching between different languages as they see fit. We have worked
in collaboration with a leading unicorn in the Conversational AI sector to
develop the Natural Speech Emotion Dataset (NSED). NSED is a natural code-mixed
speech emotion dataset where each utterance in a conversation is annotated with
emotion, sentiment, valence, arousal, and dominance (VAD) values. In this
paper, we show that incorporating word-level VAD values improves SER
performance by 2% for negative emotions over the baseline on NSED. High
accuracy for negative emotion recognition is essential because customers
expressing negative opinions/views need to be pacified with urgency, lest
complaints and dissatisfaction snowball and get out of hand. Speedy escalation
of negative opinions is crucial for business interests. Our study can thus be
utilized to develop conversational agents that are more polite and empathetic
in such situations.
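As a rough illustration of the idea above, the following is a minimal,
hypothetical PyTorch sketch of the fusion: mean-pooled word-level VAD values
concatenated with an utterance-level acoustic embedding before emotion
classification. The module name, dimensions, and the mean-pooling fusion are
assumptions for illustration only; the paper does not publish this
implementation.

```python
# Hypothetical sketch (not the paper's code): fuse word-level VAD values
# with an utterance-level acoustic embedding for emotion classification.
import torch
import torch.nn as nn


class VADFusionSER(nn.Module):
    """Assumed fusion head: acoustic embedding + mean-pooled word VAD."""

    def __init__(self, acoustic_dim: int = 768, num_emotions: int = 6):
        super().__init__()
        # +3 inputs for the mean valence, arousal, and dominance over words
        self.classifier = nn.Sequential(
            nn.Linear(acoustic_dim + 3, 256),
            nn.ReLU(),
            nn.Linear(256, num_emotions),
        )

    def forward(self, acoustic_emb, word_vad):
        # acoustic_emb: (batch, acoustic_dim), e.g. from a pretrained encoder
        # word_vad: (batch, num_words, 3) per-word (V, A, D) values
        vad_summary = word_vad.mean(dim=1)  # simple mean pooling over words
        fused = torch.cat([acoustic_emb, vad_summary], dim=-1)
        return self.classifier(fused)


# Toy usage: one utterance of 5 words with random features.
model = VADFusionSER()
logits = model(torch.randn(1, 768), torch.rand(1, 5, 3))
print(logits.shape)  # torch.Size([1, 6])
```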
Related papers
- AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues [37.96886343501444]
We present AV-EmoDialog, a dialogue system designed to exploit verbal and non-verbal information from users' audio-visual inputs to generate more responsive and empathetic interactions.
AV-EmoDialog systematically exploits the emotional cues in audio-visual dialogues; extracting speech content and emotional tones from speech, analyzing fine-grained facial expressions from visuals, and integrating these cues to generate emotionally aware responses in an end-to-end manner.
arXiv Detail & Related papers (2024-12-23T05:24:26Z)
- Affective-NLI: Towards Accurate and Interpretable Personality Recognition in Conversation [30.820334868031537]
Personality Recognition in Conversation (PRC) aims to identify the personality traits of speakers through textual dialogue content.
We propose Affective Natural Language Inference (Affective-NLI) for accurate and interpretable PRC.
arXiv Detail & Related papers (2024-04-03T09:14:24Z)
- Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion [81.1492897350032]
Emotional Voice Conversion aims to manipulate speech according to a given emotion while preserving non-emotion components.
We propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion.
arXiv Detail & Related papers (2023-12-29T08:06:45Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks [3.570593982494095]
We look at speech emotion understanding as a perception task, which is a more realistic setting.
We leverage the rich ComParE dataset of multilingual speakers with a multi-label regression target of 'emotion share', i.e., the perception of that emotion.
Our results show that HuBERT-Large with a self-attention-based light-weight sequence model provides a 4.6% improvement over the reported baseline; a hedged sketch of such a head appears after this list.
arXiv Detail & Related papers (2023-08-28T07:11:27Z)
- EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in Hindi for Emotion Recognition in Dialogues [44.79509115642278]
We create a large conversational dataset in Hindi named EmoInHindi for multi-label emotion and intensity recognition in conversations.
We prepare our dataset in a Wizard-of-Oz manner for mental health and legal counselling of crime victims.
arXiv Detail & Related papers (2022-05-27T11:23:50Z)
- Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
arXiv Detail & Related papers (2022-01-10T02:11:25Z)
- Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes [50.569762345799354]
We argue that two issues must be tackled at the same time: (i) identifying which word is the cause for the other's emotion from his or her utterance and (ii) reflecting those specific words in the response generation.
Taking inspiration from social cognition, we leverage a generative estimator to infer emotion cause words from utterances with no word-level label.
arXiv Detail & Related papers (2021-09-18T04:22:49Z)
- AdCOFE: Advanced Contextual Feature Extraction in Conversations for emotion classification [0.29360071145551075]
The proposed model of Advanced Contextual Feature Extraction (AdCOFE) addresses these issues.
Experiments on the Emotion recognition in conversations dataset show that AdCOFE is beneficial in capturing emotions in conversations.
arXiv Detail & Related papers (2021-04-09T17:58:19Z) - Disambiguating Affective Stimulus Associations for Robot Perception and
Dialogue [67.89143112645556]
We provide a NICO robot with the ability to learn the associations between a perceived auditory stimulus and an emotional expression.
NICO is able to do this for both individual subjects and specific stimuli, with the aid of an emotion-driven dialogue system.
The robot is then able to use this information to determine a subject's enjoyment of perceived auditory stimuli in a real HRI scenario.
arXiv Detail & Related papers (2021-03-05T20:55:48Z)
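The "light-weight" self-attention sequence model mentioned in the Effect of
Attention entry above can be sketched as a single self-attention layer that
pools frame-level encoder outputs into a multi-label regression. The layer
sizes, the number of targets, and the mean pooling below are assumptions, not
the paper's exact model.

```python
# Hedged sketch: a light-weight self-attention head over frame embeddings
# (HuBERT-Large emits 1024-dim frames) for multi-label 'emotion share'
# regression. Sizes and pooling are assumptions, not the paper's model.
import torch
import torch.nn as nn


class AttentionPoolRegressor(nn.Module):
    def __init__(self, frame_dim: int = 1024, num_targets: int = 9):
        super().__init__()
        self.attn = nn.MultiheadAttention(frame_dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(frame_dim, num_targets)

    def forward(self, frames):
        # frames: (batch, time, frame_dim) from a frozen speech encoder
        attended, _ = self.attn(frames, frames, frames)  # self-attention over time
        pooled = attended.mean(dim=1)                    # average over frames
        return torch.sigmoid(self.head(pooled))          # shares in [0, 1]


frames = torch.randn(2, 50, 1024)  # stand-in for HuBERT-Large frame outputs
print(AttentionPoolRegressor()(frames).shape)  # torch.Size([2, 9])
```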
This list is automatically generated from the titles and abstracts of the papers on this site.