End-to-End Continuous Speech Emotion Recognition in Real-life Customer Service Call Center Conversations
- URL: http://arxiv.org/abs/2310.02281v1
- Date: Mon, 2 Oct 2023 11:53:48 GMT
- Title: End-to-End Continuous Speech Emotion Recognition in Real-life Customer Service Call Center Conversations
- Authors: Yajing Feng (CNRS-LISN), Laurence Devillers (CNRS-LISN, SU)
- Abstract summary: We present our approach to constructing a large-scale real-life dataset (CusEmo) for continuous SER in customer service call center conversations.
We adopted the dimensional emotion annotation approach to capture the subtlety, complexity, and continuity of emotions in real-life call center conversations.
The study also addresses the challenges encountered during the application of the End-to-End (E2E) SER system to the dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech Emotion Recognition (SER) in call center conversations has emerged as
a valuable tool for assessing the quality of interactions between clients and
agents. In contrast to controlled laboratory environments, real-life
conversations take place under uncontrolled conditions and are subject to
contextual factors that influence the expression of emotions. In this paper, we
present our approach to constructing a large-scale real-life dataset (CusEmo)
for continuous SER in customer service call center conversations. We adopted
the dimensional emotion annotation approach to capture the subtlety,
complexity, and continuity of emotions in real-life call center conversations,
while annotating contextual information. The study also addresses the
challenges encountered during the application of the End-to-End (E2E) SER
system to the dataset, including determining the appropriate label sampling
rate and input segment length, as well as integrating contextual information
(interlocutor's gender and empathy level) with different weights using
multitask learning. The results show that incorporating empathy level
information improved the model's performance.
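As a concrete illustration of the weighted multitask setup the abstract describes, the sketch below pairs a concordance-correlation (CCC) regression loss for the continuous emotion dimensions with weighted auxiliary classification losses for the interlocutor's gender and empathy level. This is a minimal reconstruction under assumed dimensions, heads, and loss weights, not the authors' implementation.

```python
# A minimal sketch (not the authors' released code) of the weighted multitask
# setup described in the abstract: a shared encoder predicts continuous emotion
# dimensions, while auxiliary heads for the interlocutor's gender and empathy
# level contribute weighted classification losses. All module names, feature
# dimensions, label set sizes, and loss weights are illustrative assumptions.
import torch
import torch.nn as nn


def ccc_loss(pred: torch.Tensor, gold: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """1 - Concordance Correlation Coefficient, a standard objective for
    continuous (dimensional) emotion regression."""
    pred_mean, gold_mean = pred.mean(), gold.mean()
    cov = ((pred - pred_mean) * (gold - gold_mean)).mean()
    ccc = 2 * cov / (pred.var() + gold.var() + (pred_mean - gold_mean) ** 2 + eps)
    return 1 - ccc


class MultitaskSER(nn.Module):
    def __init__(self, feat_dim: int = 80, hidden: int = 256):
        super().__init__()
        # Shared encoder over frame-level acoustic features (e.g. filterbanks);
        # an E2E system could swap in a wav2vec-style front end here.
        self.encoder = nn.GRU(feat_dim, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.emotion_head = nn.Linear(2 * hidden, 2)  # arousal, valence
        self.gender_head = nn.Linear(2 * hidden, 2)   # auxiliary task
        self.empathy_head = nn.Linear(2 * hidden, 3)  # auxiliary task (3 levels assumed)

    def forward(self, x):
        out, _ = self.encoder(x)         # (B, T, 2*hidden)
        pooled = out.mean(dim=1)         # segment-level summary for the aux heads
        return (self.emotion_head(out),  # frame-level continuous predictions
                self.gender_head(pooled),
                self.empathy_head(pooled))


def total_loss(emo_pred, emo_gold, g_pred, g_gold, e_pred, e_gold,
               w_gender=0.1, w_empathy=0.3):
    # Weighted multitask objective; the task weights are placeholders to tune.
    xent = nn.CrossEntropyLoss()
    l_emo = (ccc_loss(emo_pred[..., 0], emo_gold[..., 0])
             + ccc_loss(emo_pred[..., 1], emo_gold[..., 1]))
    return l_emo + w_gender * xent(g_pred, g_gold) + w_empathy * xent(e_pred, e_gold)


# Example forward pass: 4 segments, 300 frames of 80-dim features each.
model = MultitaskSER()
emo, gender, empathy = model(torch.randn(4, 300, 80))
print(emo.shape, gender.shape, empathy.shape)  # (4, 300, 2) (4, 2) (4, 3)
```

In this framing, the label sampling rate and input segment length discussed in the abstract determine how many frame-level targets (T) each segment contributes.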
Related papers
- Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer [78.35816158511523]
We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification.
We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
arXiv Detail & Related papers (2024-04-26T07:30:32Z)
- Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems [57.16442740983528]
Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems.
Previous studies suggest using only a portion of the dialogue context in the annotation process.
This study investigates the influence of dialogue context on annotation quality.
arXiv Detail & Related papers (2024-04-15T17:56:39Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- EmoTwiCS: A Corpus for Modelling Emotion Trajectories in Dutch Customer Service Dialogues on Twitter [9.2878798098526]
This paper introduces EmoTwiCS, a corpus of 9,489 Dutch customer service dialogues on Twitter that are annotated for emotion trajectories.
The term 'emotion trajectory' refers not only to the fine-grained emotions experienced by customers, but also to the event happening prior to the conversation and the responses made by the human operator.
arXiv Detail & Related papers (2023-10-10T11:31:11Z)
- Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z)
- Multiscale Contextual Learning for Speech Emotion Recognition in Emergency Call Center Conversations [4.297070083645049]
This paper presents a multi-scale conversational context learning approach for speech emotion recognition.
We investigated this approach on both speech transcriptions and acoustic segments.
According to our tests, the context derived from previous tokens has a more significant influence on accurate prediction than the following tokens.
arXiv Detail & Related papers (2023-08-28T20:31:45Z)
- Building Emotional Support Chatbots in the Era of LLMs [64.06811786616471]
We introduce an innovative methodology that synthesizes human insights with the computational prowess of Large Language Models (LLMs).
By utilizing the in-context learning potential of ChatGPT, we generate an ExTensible Emotional Support dialogue dataset, named ExTES.
Following this, we deploy advanced tuning techniques on the LLaMA model, examining the impact of diverse training strategies, ultimately yielding an LLM meticulously optimized for emotional support interactions.
arXiv Detail & Related papers (2023-08-17T10:49:18Z)
- Context-Dependent Embedding Utterance Representations for Emotion Recognition in Conversations [1.8126187844654875]
We approach Emotion Recognition in Conversations leveraging the conversational context.
We propose context-dependent embedding representations of each utterance.
The effectiveness of our approach is validated on the open-domain DailyDialog dataset and on the task-oriented EmoWOZ dataset.
arXiv Detail & Related papers (2023-04-17T12:37:57Z)
- Deep Learning of Segment-Level Feature Representation for Speech Emotion Recognition in Conversations [9.432208348863336]
We propose a conversational speech emotion recognition method that captures attentive contextual dependencies and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representations from individual utterances.
Second, an attentive bi-directional gated recurrent unit (GRU) models context-sensitive information and jointly explores intra- and inter-speaker dependencies (a minimal sketch of this segment-level pipeline follows this list).
arXiv Detail & Related papers (2023-02-05T16:15:46Z)
- AdCOFE: Advanced Contextual Feature Extraction in Conversations for Emotion Classification [0.29360071145551075]
The proposed Advanced Contextual Feature Extraction (AdCOFE) model addresses these issues.
Experiments on the Emotion recognition in conversations dataset show that AdCOFE is beneficial in capturing emotions in conversations.
arXiv Detail & Related papers (2021-04-09T17:58:19Z)
- Exploiting Unsupervised Data for Emotion Recognition in Conversations [76.01690906995286]
Emotion Recognition in Conversations (ERC) aims to predict the emotional state of speakers in conversations.
The available supervised data for the ERC task is limited.
We propose a novel approach to leverage unsupervised conversation data.
arXiv Detail & Related papers (2020-10-02T13:28:47Z)
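As referenced in the segment-level feature representation entry above, the sketch below feeds VGGish-style 128-dimensional segment embeddings (random placeholders here; a real pipeline would use a pretrained VGGish) to an attentive bidirectional GRU. The architecture details are assumptions for illustration, not the paper's exact configuration.

```python
# A minimal sketch of the segment-level pipeline summarized above: VGGish-style
# 128-dim segment embeddings (placeholders here) are fed to an attentive
# bidirectional GRU that aggregates conversational context before emotion
# classification. Hidden size, attention form, and the 4-class emotion set
# are assumptions, not the paper's configuration.
import torch
import torch.nn as nn


class AttentiveBiGRU(nn.Module):
    def __init__(self, in_dim: int = 128, hidden: int = 128, n_emotions: int = 4):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)      # additive attention scores
        self.classifier = nn.Linear(2 * hidden, n_emotions)

    def forward(self, segments):                  # (B, T, 128) segment embeddings
        ctx, _ = self.gru(segments)               # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(ctx), dim=1)  # attend over segments
        pooled = (weights * ctx).sum(dim=1)       # context-weighted summary
        return self.classifier(pooled)


# Placeholder input: 8 utterances, 10 VGGish segments each, 128-dim embeddings.
model = AttentiveBiGRU()
logits = model(torch.randn(8, 10, 128))
print(logits.shape)  # torch.Size([8, 4])
```

The attention pooling lets the classifier weight informative segments within the conversational context rather than averaging them uniformly.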
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.