Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in
Conversation
- URL: http://arxiv.org/abs/2206.03173v1
- Date: Tue, 7 Jun 2022 10:51:47 GMT
- Title: Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in
Conversation
- Authors: Yinan Bao, Qianwen Ma, Lingwei Wei, Wei Zhou, Songlin Hu
- Abstract summary: The emotion recognition in conversation (ERC) task aims to predict the emotion label of an utterance in a conversation.
We design a novel speaker modeling scheme that explores intra- and inter-speaker dependencies jointly in a dynamic manner.
We also propose a Speaker-Guided Encoder-Decoder (SGED) framework for ERC, which fully exploits speaker information for the decoding of emotion.
- Score: 23.93696773727978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emotion recognition in conversation (ERC) task aims to predict the
emotion label of an utterance in a conversation. Since the dependencies between
speakers are complex and dynamic, consisting of both intra- and inter-speaker
dependencies, modeling speaker-specific information plays a vital role in ERC.
Although prior work has proposed various methods of speaker interaction
modeling, these methods cannot explore dynamic intra- and inter-speaker
dependencies jointly, leading to insufficient comprehension of context and
further hindering emotion prediction. To this end, we design a novel speaker
modeling scheme that explores intra- and inter-speaker dependencies jointly in
a dynamic manner. In addition, we propose a Speaker-Guided Encoder-Decoder (SGED)
framework for ERC, which fully exploits speaker information for the decoding of
emotion. We use different existing methods as the conversational context
encoder of our framework, showing the high scalability and flexibility of the
proposed framework. Experimental results demonstrate the superiority and
effectiveness of SGED.
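The abstract stays at a high level, so here is a minimal PyTorch sketch of what joint intra- and inter-speaker dependency modeling can look like: each utterance attends separately over same-speaker and other-speaker history, and the two views are fused before emotion classification. The module names, mask construction, and fusion below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SpeakerGuidedLayer(nn.Module):
    """Hypothetical sketch, not the SGED code: each utterance attends
    over same-speaker history (intra) and other-speaker history (inter);
    the two views are fused and classified into emotion labels."""

    def __init__(self, dim: int, n_heads: int, n_emotions: int):
        super().__init__()
        self.intra_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.inter_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)
        self.classify = nn.Linear(dim, n_emotions)

    def forward(self, utts: torch.Tensor, speakers: torch.Tensor) -> torch.Tensor:
        # utts: (batch, turns, dim) utterance encodings from any context encoder
        # speakers: (batch, turns) integer speaker ids
        t = utts.size(1)
        same = speakers.unsqueeze(2) == speakers.unsqueeze(1)          # (b, t, t)
        causal = torch.ones(t, t, dtype=torch.bool, device=utts.device).tril()
        eye = torch.eye(t, dtype=torch.bool, device=utts.device)
        # keep the diagonal visible in both views so no row is fully masked
        intra_ok = (same | eye) & causal
        inter_ok = (~same | eye) & causal
        # nn.MultiheadAttention blocks positions where a bool mask is True
        # and expects one (t, t) mask per head, so invert and expand per head
        h = self.intra_attn.num_heads
        intra, _ = self.intra_attn(utts, utts, utts,
                                   attn_mask=(~intra_ok).repeat_interleave(h, dim=0))
        inter, _ = self.inter_attn(utts, utts, utts,
                                   attn_mask=(~inter_ok).repeat_interleave(h, dim=0))
        fused = torch.tanh(self.fuse(torch.cat([intra, inter], dim=-1)))
        return self.classify(fused)                                    # emotion logits
```

Because `utts` can come from any sentence-level encoder, a layer like this can sit on top of different conversational context encoders, which is the scalability the abstract claims.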
Related papers
- Improving Speaker Diarization using Semantic Information: Joint Pairwise
Constraints Propagation [53.01238689626378]
We propose a novel approach to leverage semantic information in speaker diarization systems.
We introduce spoken language understanding modules to extract speaker-related semantic information.
We present a novel framework to integrate these constraints into the speaker diarization pipeline.
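The propagation mechanism is not spelled out in this summary. As a loose sketch of the general idea, pairwise semantic constraints can be folded into a segment-affinity matrix before clustering; the paper propagates constraints jointly rather than simply clamping entries, and every name below is a hypothetical stand-in.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def constrained_diarization(affinity: np.ndarray,
                            must_link: list,
                            cannot_link: list,
                            n_speakers: int) -> np.ndarray:
    """Toy sketch: clamp a segment-affinity matrix with semantic
    must-link / cannot-link pairs, then cluster. Not the paper's
    joint constraints-propagation algorithm."""
    a = affinity.copy()
    for i, j in must_link:      # e.g. semantics imply the same speaker
        a[i, j] = a[j, i] = 1.0
    for i, j in cannot_link:    # e.g. a question/answer pair implies a change
        a[i, j] = a[j, i] = 0.0
    return SpectralClustering(n_clusters=n_speakers,
                              affinity="precomputed").fit_predict(a)
```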
arXiv Detail & Related papers (2023-09-19T09:13:30Z) - Revisiting Conversation Discourse for Dialogue Disentanglement [88.3386821205896]
We propose enhancing dialogue disentanglement by taking full advantage of the dialogue discourse characteristics.
We develop a structure-aware framework to integrate the rich structural features for better modeling the conversational semantic context.
Our work has great potential to facilitate broader multi-party multi-thread dialogue applications.
arXiv Detail & Related papers (2023-06-06T19:17:47Z) - deep learning of segment-level feature representation for speech emotion
recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method that captures attentive contextual dependencies and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representation in individual utterances.
Second, an attentive bi-directional gated recurrent unit (GRU) models context-sensitive information and explores intra- and inter-speaker dependencies jointly.
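As a rough sketch of these two steps, assuming the standard 128-dimensional VGGish output per ~0.96 s audio segment, the segment-level pooling could look like the following; the paper's model additionally covers cross-utterance context and speaker dependencies, which this sketch omits.

```python
import torch
import torch.nn as nn

class AttentiveSegmentPooling(nn.Module):
    """Illustrative only: VGGish segment embeddings are contextualized
    by a bi-directional GRU and pooled with additive attention into a
    single utterance vector for emotion classification."""

    def __init__(self, vggish_dim: int = 128, hidden: int = 128, n_emotions: int = 4):
        super().__init__()
        self.gru = nn.GRU(vggish_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.classify = nn.Linear(2 * hidden, n_emotions)

    def forward(self, segments: torch.Tensor) -> torch.Tensor:
        # segments: (batch, n_segments, 128) VGGish embeddings of one utterance
        states, _ = self.gru(segments)                     # (batch, n_segments, 2*hidden)
        weights = torch.softmax(self.attn(states), dim=1)  # attention over segments
        utterance = (weights * states).sum(dim=1)          # weighted pooling
        return self.classify(utterance)                    # emotion logits
```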
arXiv Detail & Related papers (2023-02-05T16:15:46Z) - Discourse-Aware Emotion Cause Extraction in Conversations [21.05202596080196]
Emotion Cause Extraction in Conversations (ECEC) aims to extract the utterances that contain the emotional cause in a conversation.
We propose a discourse-aware model (DAM) for this task.
Results on the benchmark corpus show that DAM outperforms the state-of-the-art (SOTA) systems in the literature.
arXiv Detail & Related papers (2022-10-26T02:11:01Z) - Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations.
We autoregressively output multiple possibilities of corresponding listener motion.
Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker
Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
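A minimal late-fusion head in this spirit is sketched below, assuming the transfer-learned speech encoder and the BERT-based text encoder each emit a fixed-size utterance embedding; the dimensions and layer sizes are guesses, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Illustrative late-fusion head: concatenate speech and text
    utterance embeddings and classify them jointly."""

    def __init__(self, speech_dim: int = 512, text_dim: int = 768, n_emotions: int = 4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(speech_dim + text_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, n_emotions),
        )

    def forward(self, speech_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # each embedding comes from its own separately trained encoder
        return self.head(torch.cat([speech_emb, text_emb], dim=-1))
```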
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - A Speaker-aware Parallel Hierarchical Attentive Encoder-Decoder Model
for Multi-turn Dialogue Generation [13.820298189734686]
This paper presents a novel open-domain dialogue generation model emphasizing the differentiation of speakers in multi-turn conversations.
Our empirical results show that the proposed model, PHAED, outperforms the state-of-the-art in both automatic and human evaluations.
arXiv Detail & Related papers (2021-10-13T16:08:29Z) - A Hierarchical Transformer with Speaker Modeling for Emotion Recognition
in Conversation [12.065178204539693]
Emotion Recognition in Conversation (ERC) is a personalized and interactive emotion recognition task.
Current methods model speakers' interactions by building a relation between every two speakers.
We simplify the complicated modeling to a binary version: Intra-Speaker and Inter-Speaker dependencies.
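This binary split is the same intra/inter view sketched after the abstract above; in isolation, deriving the two views from speaker ids reduces to a pair of boolean masks (an illustrative helper, not the paper's code):

```python
import torch

def binary_speaker_masks(speakers: torch.Tensor):
    """speakers: (turns,) integer speaker ids. Returns two boolean
    (turns, turns) masks where True marks an attendable turn."""
    same = speakers.unsqueeze(1) == speakers.unsqueeze(0)
    return same, ~same  # Intra-Speaker view, Inter-Speaker view

# e.g. binary_speaker_masks(torch.tensor([0, 1, 0, 1, 1])) for a dyad
```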
arXiv Detail & Related papers (2020-12-29T14:47:35Z) - Structured Attention for Unsupervised Dialogue Structure Induction [110.12561786644122]
We propose to incorporate structured attention layers into a Variational Recurrent Neural Network (VRNN) model with discrete latent states to learn dialogue structure in an unsupervised fashion.
Compared to a vanilla VRNN, structured attention enables a model to focus on different parts of the source sentence embeddings while enforcing a structural inductive bias.
arXiv Detail & Related papers (2020-09-17T23:07:03Z) - Filling the Gap of Utterance-aware and Speaker-aware Representation for
Multi-turn Dialogue [76.88174667929665]
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles.
In existing retrieval-based multi-turn dialogue modeling, pre-trained language models (PrLMs) used as the encoder represent the dialogues coarsely.
We propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history.
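The paper's exact mechanism is not reproduced in this summary; as a generic sketch of the idea of speaker- and utterance-aware inputs, one common recipe adds learned speaker-role and turn-index embeddings to the PrLM token embeddings. Treat this only as the flavor of the approach.

```python
import torch
import torch.nn as nn

class SpeakerTurnEmbeddings(nn.Module):
    """Illustrative recipe: enrich PrLM token embeddings with learned
    speaker-role and turn-index embeddings before encoding."""

    def __init__(self, hidden: int, n_roles: int = 2, max_turns: int = 64):
        super().__init__()
        self.role = nn.Embedding(n_roles, hidden)
        self.turn = nn.Embedding(max_turns, hidden)

    def forward(self, token_emb, role_ids, turn_ids):
        # token_emb: (batch, seq, hidden) from the PrLM embedding layer
        # role_ids / turn_ids: (batch, seq) per-token speaker role / turn index
        return token_emb + self.role(role_ids) + self.turn(turn_ids)
```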
arXiv Detail & Related papers (2020-09-14T15:07:19Z) - A Machine of Few Words -- Interactive Speaker Recognition with
Reinforcement Learning [35.36769027019856]
We present a new paradigm for automatic speaker recognition that we call Interactive Speaker Recognition (ISR).
In this paradigm, the recognition system aims to incrementally build a representation of the speakers by requesting personalized utterances.
We show that our method achieves excellent performance while requiring only small amounts of speech.
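As a toy illustration of the incremental loop described here (not the paper's reinforcement-learning formulation), the recognizer can keep requesting utterances and refining a running speaker representation until one enrolled guest clearly dominates; `request_utterance` and `score_guests` are hypothetical stand-ins.

```python
import numpy as np

def interactive_speaker_recognition(request_utterance, score_guests,
                                    max_requests: int = 5,
                                    confidence: float = 0.9) -> int:
    """Toy ISR loop: request utterances one at a time, fold each new
    embedding into a running-mean representation, and stop early once
    one enrolled guest's score is confident enough."""
    representation, probs = None, None
    for k in range(1, max_requests + 1):
        emb = request_utterance()             # ask the speaker for one more utterance
        if representation is None:
            representation = emb
        else:                                 # incremental running mean
            representation = representation + (emb - representation) / k
        probs = score_guests(representation)  # similarity to each enrolled guest
        if probs.max() >= confidence:         # confident enough: stop asking
            break
    return int(np.argmax(probs))
```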
arXiv Detail & Related papers (2020-08-07T12:44:08Z)