S+PAGE: A Speaker and Position-Aware Graph Neural Network Model for
Emotion Recognition in Conversation
- URL: http://arxiv.org/abs/2112.12389v1
- Date: Thu, 23 Dec 2021 07:25:02 GMT
- Title: S+PAGE: A Speaker and Position-Aware Graph Neural Network Model for
Emotion Recognition in Conversation
- Authors: Chen Liang, Chong Yang, Jing Xu, Juyang Huang, Yongliang Wang, Yang
Dong
- Abstract summary: Emotion recognition in conversation (ERC) has attracted much attention in recent years for its necessity in widespread applications.
Existing ERC methods mostly model the self and inter-speaker context separately, which limits the interaction between them.
We propose a novel Speaker and Position-Aware Graph neural network model for ERC (S+PAGE), which contains three stages to combine the benefits of both the Transformer and the relational graph convolutional network.
- Score: 12.379143886125926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotion recognition in conversation (ERC) has attracted much attention in
recent years for its necessity in widespread applications. Existing ERC methods
mostly model the self and inter-speaker context separately, which limits the
interaction between them. In this paper, we propose a
novel Speaker and Position-Aware Graph neural network model for ERC (S+PAGE),
which contains three stages to combine the benefits of both Transformer and
relational graph convolution network (R-GCN) for better contextual modeling.
Firstly, a two-stream conversational Transformer is presented to extract the
coarse self and inter-speaker contextual features for each utterance. Then, a
speaker and position-aware conversation graph is constructed, and we propose an
enhanced R-GCN model, called PAG, to refine the coarse features guided by a
relative positional encoding. Finally, both of the features from the former two
stages are input into a conditional random field layer to model the emotion
transfer.
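The three stages above can be sketched end to end. This is a minimal illustrative toy in numpy, not the authors' implementation: the attention, the PAG/R-GCN refinement weights, and the CRF transition matrix are all stand-in assumptions that only mirror the pipeline's shape.

```python
# Toy sketch of the S+PAGE pipeline: two-stream Transformer-style context,
# speaker/position-aware graph refinement, then CRF decoding.
# All shapes, weights, and attention details are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, mask):
    # Scaled dot-product attention restricted by a boolean mask.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ x

def two_stream_transformer(utt, speakers):
    # Stage 1: coarse self- and inter-speaker context features per utterance.
    same = speakers[:, None] == speakers[None, :]
    self_ctx = self_attention(utt, same)                              # self stream
    inter_ctx = self_attention(utt, ~same | np.eye(len(utt), dtype=bool))
    return np.concatenate([self_ctx, inter_ctx], axis=-1)

def pag_refine(feats, speakers, window=2):
    # Stage 2: one message-passing step on a speaker- and position-aware
    # conversation graph (a crude stand-in for the PAG / enhanced R-GCN).
    n = len(feats)
    out = feats.copy()
    for i in range(n):
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if i == j:
                continue
            w = 0.5 if speakers[i] == speakers[j] else 0.25  # relation type
            w *= 1.0 / (1 + abs(i - j))                      # positional decay
            out[i] += w * feats[j]
    return out

def crf_decode(emissions, transitions):
    # Stage 3: Viterbi decoding; the transition matrix models emotion
    # transfer between consecutive utterances.
    n, k = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy conversation: 5 utterances, 2 speakers, 4 emotion classes.
utt = rng.normal(size=(5, 8))
speakers = np.array([0, 1, 0, 1, 0])
feats = two_stream_transformer(utt, speakers)        # (5, 16)
refined = pag_refine(feats, speakers)
emissions = refined @ rng.normal(size=(16, 4))       # project to class scores
labels = crf_decode(emissions, rng.normal(size=(4, 4)))
print(len(labels), all(0 <= l < 4 for l in labels))  # → 5 True
```

The point of the sketch is the data flow: coarse contextual features from stage 1 are refined by relation- and position-weighted neighbors in stage 2, and only stage 3 couples the final per-utterance scores sequentially.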
Related papers
- Predicting Evoked Emotions in Conversations [6.0866477571088895]
We introduce the novel problem of Predicting Emotions in Conversations (PEC) for the next turn (n+1).
We systematically approach the problem by modeling three dimensions inherently connected to evoked emotions in dialogues.
We perform a comprehensive empirical evaluation of the various proposed models for addressing the PEC problem.
arXiv Detail & Related papers (2023-12-31T03:30:42Z)
- LineConGraphs: Line Conversation Graphs for Effective Emotion Recognition using Graph Neural Networks [10.446376560905863]
We propose novel line conversation graph convolutional network (LineConGCN) and graph attention (LineConGAT) models for Emotion Recognition in Conversations (ERC) analysis.
These models are speaker-independent and built using a graph construction strategy for conversations -- line conversation graphs (LineConGraphs).
We evaluate the performance of our proposed models on two benchmark datasets, IEMOCAP and MELD, and show that our LineConGAT model outperforms state-of-the-art methods with F1-scores of 64.58% on IEMOCAP and 76.50% on MELD.
arXiv Detail & Related papers (2023-12-04T19:36:58Z)
- HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition [41.837538440839815]
We propose a hierarchical cross-attention model (HCAM) approach to multi-modal emotion recognition.
The input to the model consists of two modalities: i) audio data, processed through a learnable wav2vec approach, and ii) text data, represented using a bidirectional encoder representations from transformers (BERT) model.
In order to incorporate contextual knowledge and the information across the two modalities, the audio and text embeddings are combined using a co-attention layer.
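The cross-modal combination can be sketched as a symmetric co-attention step: text tokens attend over audio frames and vice versa. This is an illustrative assumption about the layer's form, not HCAM's actual architecture; all dimensions and names are toy values.

```python
# Minimal co-attention sketch: each modality is summarized from the
# other modality's perspective via a shared affinity matrix.
# Shapes and the symmetric formulation are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(text, audio):
    # Affinity between every text token and every audio frame.
    affinity = text @ audio.T / np.sqrt(text.shape[-1])  # (T_text, T_audio)
    text_ctx = softmax(affinity, axis=1) @ audio         # audio per text token
    audio_ctx = softmax(affinity.T, axis=1) @ text       # text per audio frame
    return text_ctx, audio_ctx

text = rng.normal(size=(6, 16))    # toy stand-in for BERT token embeddings
audio = rng.normal(size=(10, 16))  # toy stand-in for wav2vec frame embeddings
t_ctx, a_ctx = co_attention(text, audio)
fused = np.concatenate([text, t_ctx], axis=-1)  # per-token multimodal feature
print(t_ctx.shape, a_ctx.shape, fused.shape)    # → (6, 16) (10, 16) (6, 32)
```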
arXiv Detail & Related papers (2023-04-14T03:25:00Z)
- ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting [121.11880210592497]
We argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language model with noise input.
We propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting.
arXiv Detail & Related papers (2022-11-19T03:50:33Z)
- Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations.
We autoregressively output multiple possibilities of corresponding listener motion.
Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
- An Adaptive Learning based Generative Adversarial Network for One-To-One Voice Conversion [9.703390665821463]
We propose an adaptive learning-based GAN model called ALGAN-VC for efficient one-to-one VC of speakers.
The model is tested on Voice Conversion Challenge (VCC) 2016, 2018, and 2020 datasets as well as on our self-prepared speech dataset.
A subjective and objective evaluation of the generated speech samples indicated that the proposed model elegantly performed the voice conversion task.
arXiv Detail & Related papers (2021-04-25T13:44:32Z)
- A Hierarchical Transformer with Speaker Modeling for Emotion Recognition in Conversation [12.065178204539693]
Emotion Recognition in Conversation (ERC) is a personalized and interactive emotion recognition task.
Current methods model speakers' interactions by building a relation between every pair of speakers.
We simplify the complicated modeling to a binary version: Intra-Speaker and Inter-Speaker dependencies.
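The binary simplification can be sketched as two boolean masks over utterance pairs. The function name and shapes below are illustrative, not taken from the paper.

```python
# Sketch of the binary dependency split: instead of one relation per
# speaker pair, each utterance pair is either an Intra-Speaker or an
# Inter-Speaker dependency. Names and shapes are illustrative.
import numpy as np

def speaker_masks(speakers):
    s = np.asarray(speakers)
    same = s[:, None] == s[None, :]
    intra = same & ~np.eye(len(s), dtype=bool)  # same speaker, other turns
    inter = ~same                               # different speakers
    return intra, inter

# Toy conversation with three speakers.
intra, inter = speaker_masks([0, 1, 0, 2, 1])
print(intra[0, 2], inter[0, 1], intra[0, 1])  # → True True False
```

With this split, the number of relation types no longer grows with the number of speakers, which is the complexity reduction the blurb describes.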
arXiv Detail & Related papers (2020-12-29T14:47:35Z)
- Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)
- Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue [76.88174667929665]
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles.
In the existing retrieval-based multi-turn dialogue modeling, the pre-trained language models (PrLMs) as encoder represent the dialogues coarsely.
We propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history.
arXiv Detail & Related papers (2020-09-14T15:07:19Z)
- Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling [61.351967629600594]
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq), non-parallel voice conversion approach.
In this approach, we combine a bottle-neck feature extractor (BNE) with a seq2seq synthesis module.
Objective and subjective evaluations show that the proposed any-to-many approach has superior voice conversion performance in terms of both naturalness and speaker similarity.
arXiv Detail & Related papers (2020-09-06T13:01:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.