A Unified Approach to Emotion Detection and Task-Oriented Dialogue Modeling
- URL: http://arxiv.org/abs/2401.13789v3
- Date: Fri, 28 Jun 2024 10:23:29 GMT
- Title: A Unified Approach to Emotion Detection and Task-Oriented Dialogue Modeling
- Authors: Armand Stricker, Patrick Paroubek
- Abstract summary: User emotion detection is often overlooked in text-based task-oriented dialogue systems.
We show that seamlessly unifying emotion detection (ED) and task-oriented dialogue (TOD) modeling brings mutual benefits.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In current text-based task-oriented dialogue (TOD) systems, user emotion detection (ED) is often overlooked or treated as a separate, independent task that requires additional training. In contrast, our work demonstrates that seamlessly unifying ED and TOD modeling brings mutual benefits and is therefore an alternative worth considering. Our method augments SimpleToD, an end-to-end TOD system, by extending belief state tracking to include ED, relying on a single language model. We evaluate our approach using GPT-2 and Llama-2 on the EmoWOZ benchmark, a version of MultiWOZ annotated with emotions. Our results show overall improvements in both ED and task performance. Our findings also indicate that user emotions provide useful contextual conditioning for system responses and can be leveraged to further refine responses in terms of empathy.
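As a rough illustration of the unification described above, the sketch below flattens one dialogue turn into a single training string in which the belief state also carries the user's emotion label, in the spirit of SimpleToD's linearized format. The delimiter tokens, slot names, and emotion labels are illustrative assumptions, not the authors' exact schema.

```python
# A minimal sketch, assuming a SimpleToD-style linearization in which the
# belief state is extended with an emotion slot. Token and field names are
# hypothetical; only the overall structure follows the abstract.

def build_training_sequence(history: list[str], belief: dict[str, str],
                            emotion: str, response: str) -> str:
    """Flatten one dialogue turn into a single language-model training string."""
    context = " ".join(history)
    # Belief state rendered as "slot value" pairs, with emotion as one more slot.
    belief_str = ", ".join(f"{slot} {value}" for slot, value in belief.items())
    return (
        f"<context> {context} "
        f"<belief> {belief_str}, user emotion {emotion} "
        f"<response> {response}"
    )

# Example turn in the style of EmoWOZ/MultiWOZ (all values made up):
print(build_training_sequence(
    history=["user: i need a cheap hotel in the north, and quickly please!"],
    belief={"hotel pricerange": "cheap", "hotel area": "north"},
    emotion="dissatisfied",
    response="system: sorry for the wait, there are 2 cheap hotels in the north.",
))
```

Because the single language model emits the emotion label before the response, the predicted emotion is available as conditioning context at response-generation time, which is where the abstract's "contextual conditioning" benefit comes from.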
Related papers
- Why We Feel What We Feel: Joint Detection of Emotions and Their Opinion Triggers in E-commerce [34.25698222058424]
We propose a novel task unifying emotion detection and opinion trigger extraction.
EOT-X is a human-annotated collection of 2,400 reviews with fine-grained emotions and opinion triggers.
We present EOT-DETECT, a structured prompting framework with systematic reasoning and self-reflection.
arXiv Detail & Related papers (2025-07-07T06:59:37Z) - Empaths at SemEval-2025 Task 11: Retrieval-Augmented Approach to Perceived Emotions Prediction [83.88591755871734]
EmoRAG is a system designed to detect perceived emotions in text for SemEval-2025 Task 11, Subtask A: Multi-label Emotion Detection.
We focus on predicting the perceived emotions of the speaker from a given text snippet, labeling it with emotions such as joy, sadness, fear, anger, surprise, and disgust.
arXiv Detail & Related papers (2025-06-04T19:41:24Z) - VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection [50.57849622045192]
We propose VAEmo, an efficient framework for emotion-centric joint visual-audio (VA) representation learning with external knowledge injection.
VAEmo achieves state-of-the-art performance with a compact design, highlighting the benefit of unified cross-modal encoding and emotion-aware semantic guidance.
arXiv Detail & Related papers (2025-05-05T03:00:51Z) - ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems [57.806797579986075]
We introduce an open-source, user-friendly toolkit to build unified web interfaces for various cascaded and E2E spoken dialogue systems.
Using the evaluation metrics, we compare various cascaded and E2E spoken dialogue systems with a human-human conversation dataset as a proxy.
Our analysis demonstrates that the toolkit allows researchers to effortlessly compare and contrast different technologies.
arXiv Detail & Related papers (2025-03-11T15:24:02Z) - Smile upon the Face but Sadness in the Eyes: Emotion Recognition based on Facial Expressions and Eye Behaviors [63.194053817609024]
We introduce eye behaviors as important emotional cues and use them to create a new Eye-behavior-aided Multimodal Emotion Recognition (EMER) dataset.
For the first time, we provide annotations for both Emotion Recognition (ER) and Facial Expression Recognition (FER) in the EMER dataset.
We specifically design a new EMERT architecture to concurrently enhance performance in both ER and FER.
arXiv Detail & Related papers (2024-11-08T04:53:55Z) - NUS-Emo at SemEval-2024 Task 3: Instruction-Tuning LLM for Multimodal Emotion-Cause Analysis in Conversations [31.528160823423082]
This paper describes the architecture of our system developed for Task 3 of SemEval-2024: Multimodal Emotion-Cause Analysis in Conversations.
Our project targets the challenges of subtask 2, dedicated to Multimodal Emotion-Cause Pair Extraction with Emotion Category (MECPE-Cat).
Our method enables us to adeptly navigate the complexities of MECPE-Cat, achieving a weighted average F1 score of 34.71% and securing 2nd place on the leaderboard.
arXiv Detail & Related papers (2024-08-22T08:34:39Z) - Think out Loud: Emotion Deducing Explanation in Dialogues [57.90554323226896]
We propose a new task, "Emotion Deducing Explanation in Dialogues" (EDEN).
EDEN recognizes emotions and their causes through an explicit reasoning process.
It can help Large Language Models (LLMs) achieve better recognition of emotions and causes.
arXiv Detail & Related papers (2024-06-07T08:58:29Z) - Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer [78.35816158511523]
We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification.
We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
arXiv Detail & Related papers (2024-04-26T07:30:32Z) - Dynamic Causal Disentanglement Model for Dialogue Emotion Detection [77.96255121683011]
We propose a Dynamic Causal Disentanglement Model based on hidden variable separation.
This model effectively decomposes the content of dialogues and investigates the temporal accumulation of emotions.
Specifically, we propose a dynamic temporal disentanglement model to infer the propagation of utterances and hidden variables.
arXiv Detail & Related papers (2023-09-13T12:58:09Z) - From Chatter to Matter: Addressing Critical Steps of Emotion Recognition Learning in Task-oriented Dialogue [6.918298428336528]
We propose a framework that turns a chit-chat ERC model into a task-oriented one.
We use dialogue states as auxiliary features to incorporate key information about the user's goal (a minimal sketch of this idea appears after this list).
Our framework yields significant improvements for a range of chit-chat ERC models on EmoWOZ.
arXiv Detail & Related papers (2023-08-24T08:46:30Z) - ECQED: Emotion-Cause Quadruple Extraction in Dialogs [37.66816413841564]
We present Emotion-Cause Quadruple Extraction in Dialogs (ECQED), which requires detecting emotion-cause utterance pairs together with their emotion and cause types.
We show that introducing the fine-grained emotion and cause features evidently helps better dialog generation.
arXiv Detail & Related papers (2023-06-06T19:04:30Z) - Disentangled Variational Autoencoder for Emotion Recognition in Conversations [14.92924920489251]
We propose a VAD-disentangled Variational Autoencoder (VAD-VAE) for Emotion Recognition in Conversations (ERC).
VAD-VAE disentangles three affect representations, Valence-Arousal-Dominance (VAD), from the latent space.
Experiments show that VAD-VAE outperforms the state-of-the-art model on two datasets.
arXiv Detail & Related papers (2023-05-23T13:50:06Z) - EmotionIC: emotional inertia and contagion-driven dependency modeling for emotion recognition in conversation [34.24557248359872]
We propose an emotional inertia and contagion-driven dependency modeling approach (EmotionIC) for the ERC task.
EmotionIC consists of three main components: Identity Masked Multi-Head Attention (IMMHA), Dialogue-based Gated Recurrent Unit (DiaGRU), and Skip-chain Conditional Random Field (SkipCRF).
Experimental results show that our method can significantly outperform the state-of-the-art models on four benchmark datasets.
arXiv Detail & Related papers (2023-03-20T13:58:35Z) - A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
arXiv Detail & Related papers (2023-03-14T16:08:45Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities (see the late-fusion sketch after this list).
We evaluate the effectiveness of our proposed multimodal approach on the IEMOCAP (Interactive Emotional Dyadic Motion Capture) dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - End-to-end Emotion-Cause Pair Extraction via Learning to Link [18.741585103275334]
Emotion-cause pair extraction (ECPE) aims at jointly investigating emotions and their underlying causes in documents.
Existing approaches to ECPE generally adopt a two-stage method, i.e., (1) emotion and cause detection, and then (2) pairing the detected emotions and causes.
We propose a multi-task learning model that can extract emotions, causes and emotion-cause pairs simultaneously in an end-to-end manner.
arXiv Detail & Related papers (2020-02-25T07:49:12Z)
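As referenced in the "From Chatter to Matter" entry above, the sketch below shows one plausible way to feed dialogue states into a chit-chat ERC model as auxiliary features: the belief state is featurized, projected, and concatenated with the utterance representation before emotion classification. The encoder choice, dimensions, and class count are assumptions for illustration, not the paper's exact architecture.

```python
# A minimal sketch, assuming the utterance is already encoded (e.g., a BERT
# [CLS] vector) and the dialogue/belief state is featurized into a flat vector
# (slot-filled indicators, domain one-hots, ...). All sizes are hypothetical.
import torch
import torch.nn as nn

class StateAugmentedERC(nn.Module):
    def __init__(self, utt_dim: int = 768, state_dim: int = 64, n_emotions: int = 7):
        super().__init__()
        self.state_proj = nn.Linear(state_dim, state_dim)  # encode belief-state features
        self.classifier = nn.Linear(utt_dim + state_dim, n_emotions)

    def forward(self, utt_emb: torch.Tensor, state_feats: torch.Tensor) -> torch.Tensor:
        state_emb = torch.relu(self.state_proj(state_feats))
        # Concatenate utterance and state representations, then classify emotion.
        return self.classifier(torch.cat([utt_emb, state_emb], dim=-1))

model = StateAugmentedERC()
logits = model(torch.randn(2, 768), torch.randn(2, 64))  # batch of 2 turns -> (2, 7)
```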
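Similarly, for the multimodal transfer-learning entry, the following is a minimal late-fusion sketch: per-modality emotion classifiers are trained separately and their output distributions are combined at decision time. The weighting scheme is an illustrative assumption; the paper's fusion details may differ.

```python
# A minimal late-fusion sketch: weighted average of per-modality emotion
# distributions. The 0.5 weight and 4-class setup are hypothetical.
import numpy as np

def late_fusion(speech_probs: np.ndarray, text_probs: np.ndarray,
                w_speech: float = 0.5) -> np.ndarray:
    """Combine per-modality class probabilities into a fused distribution."""
    return w_speech * speech_probs + (1.0 - w_speech) * text_probs

speech_probs = np.array([0.1, 0.6, 0.2, 0.1])  # from the speech-side model
text_probs = np.array([0.2, 0.3, 0.4, 0.1])    # from the BERT-based text model
fused = late_fusion(speech_probs, text_probs)
print(fused, "-> predicted class", fused.argmax())
```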