CoMPM: Context Modeling with Speaker's Pre-trained Memory Tracking for
Emotion Recognition in Conversation
- URL: http://arxiv.org/abs/2108.11626v1
- Date: Thu, 26 Aug 2021 07:45:09 GMT
- Title: CoMPM: Context Modeling with Speaker's Pre-trained Memory Tracking for
Emotion Recognition in Conversation
- Authors: Joosung Lee, Wooin Lee
- Abstract summary: We introduce a context embedding module (CoM) combined with a pre-trained memory module (PM)
We show that the pre-trained memory significantly improves the final accuracy of emotion recognition.
We experimented on both the multi-party datasets (MELD, EmoryNLP) and the dyadic-party datasets (IEMOCAP, DailyDialog)
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the use of interactive machines grow, the task of Emotion Recognition in
Conversation (ERC) became more important. If the machine generated sentences
reflect emotion, more human-like sympathetic conversations are possible. Since
emotion recognition in conversation is inaccurate if the previous utterances
are not taken into account, many studies reflect the dialogue context to
improve the performances. We introduce CoMPM, a context embedding module (CoM)
combined with a pre-trained memory module (PM) that tracks memory of the
speaker's previous utterances within the context, and show that the pre-trained
memory significantly improves the final accuracy of emotion recognition. We
experimented on both the multi-party datasets (MELD, EmoryNLP) and the
dyadic-party datasets (IEMOCAP, DailyDialog), showing that our approach achieve
competitive performance on all datasets.
Related papers
- Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs [2.8728982844941178]
Speech Emotion Recognition (SER) focuses on identifying emotional states from spoken language.
We propose a novel approach that first refines all available transcriptions to ensure data reliability.
We then segment each complete conversation into smaller dialogues and use these dialogues as context to predict the emotion of the target utterance within the dialogue.
arXiv Detail & Related papers (2024-10-27T04:23:34Z) - Emotional Listener Portrait: Realistic Listener Motion Simulation in
Conversation [50.35367785674921]
Listener head generation centers on generating non-verbal behaviors of a listener in reference to the information delivered by a speaker.
A significant challenge when generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation.
We propose the Emotional Listener Portrait (ELP), which treats each fine-grained facial motion as a composition of several discrete motion-codewords.
Our ELP model can not only automatically generate natural and diverse responses toward a given speaker via sampling from the learned distribution but also generate controllable responses with a predetermined attitude.
arXiv Detail & Related papers (2023-09-29T18:18:32Z) - Multiscale Contextual Learning for Speech Emotion Recognition in
Emergency Call Center Conversations [4.297070083645049]
This paper presents a multi-scale conversational context learning approach for speech emotion recognition.
We investigated this approach on both speech transcriptions and acoustic segments.
According to our tests, the context derived from previous tokens has a more significant influence on accurate prediction than the following tokens.
arXiv Detail & Related papers (2023-08-28T20:31:45Z) - Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z) - Context-Dependent Embedding Utterance Representations for Emotion
Recognition in Conversations [1.8126187844654875]
We approach Emotion Recognition in Conversations leveraging the conversational context.
We propose context-dependent embedding representations of each utterance.
The effectiveness of our approach is validated on the open-domain DailyDialog dataset and on the task-oriented EmoWOZ dataset.
arXiv Detail & Related papers (2023-04-17T12:37:57Z) - Channel-aware Decoupling Network for Multi-turn Dialogue Comprehension [81.47133615169203]
We propose compositional learning for holistic interaction across utterances beyond the sequential contextualization from PrLMs.
We employ domain-adaptive training strategies to help the model adapt to the dialogue domains.
Experimental results show that our method substantially boosts the strong PrLM baselines in four public benchmark datasets.
arXiv Detail & Related papers (2023-01-10T13:18:25Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker
Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - Beyond Isolated Utterances: Conversational Emotion Recognition [33.52961239281893]
Speech emotion recognition is the task of recognizing the speaker's emotional state given a recording of their utterance.
We propose several approaches for conversational emotion recognition (CER) by treating it as a sequence labeling task.
We investigated transformer architecture for CER and, compared it with ResNet-34 and BiLSTM architectures in both contextual and context-less scenarios.
arXiv Detail & Related papers (2021-09-13T16:40:35Z) - Discovering Emotion and Reasoning its Flip in Multi-Party Conversations
using Masked Memory Network and Transformer [16.224961520924115]
We introduce a novel task -- Emotion Flip Reasoning (EFR)
EFR aims to identify past utterances which have triggered one's emotion state to flip at a certain time.
We propose a masked memory network to address the former and a Transformer-based network for the latter task.
arXiv Detail & Related papers (2021-03-23T07:42:09Z) - Dialogue History Matters! Personalized Response Selectionin Multi-turn
Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching.
Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information.
We evaluate our model on two large datasets with user identification, i.e., personalized dialogue Corpus Ubuntu (P- Ubuntu) and personalized Weibo dataset (P-Weibo)
arXiv Detail & Related papers (2021-03-17T09:42:11Z) - Exploiting Unsupervised Data for Emotion Recognition in Conversations [76.01690906995286]
Emotion Recognition in Conversations (ERC) aims to predict the emotional state of speakers in conversations.
The available supervised data for the ERC task is limited.
We propose a novel approach to leverage unsupervised conversation data.
arXiv Detail & Related papers (2020-10-02T13:28:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.