A Contextualized Real-Time Multimodal Emotion Recognition for
Conversational Agents using Graph Convolutional Networks in Reinforcement
Learning
- URL: http://arxiv.org/abs/2310.18363v1
- Date: Tue, 24 Oct 2023 14:31:17 GMT
- Title: A Contextualized Real-Time Multimodal Emotion Recognition for
Conversational Agents using Graph Convolutional Networks in Reinforcement
Learning
- Authors: Fathima Abdul Rahman, Guang Lu
- Abstract summary: We present a novel paradigm for contextualized Emotion Recognition using Graph Convolutional Network with Reinforcement Learning (conER-GRL)
Conversations are partitioned into smaller groups of utterances for effective extraction of contextual information.
The system uses Gated Recurrent Units (GRU) to extract multimodal features from these groups of utterances.
- Score: 0.800062359410795
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Owing to the recent developments in Generative Artificial Intelligence
(GenAI) and Large Language Models (LLM), conversational agents are becoming
increasingly popular and accepted. They provide a human touch by interacting in
ways familiar to us and by providing support as virtual companions. Therefore,
it is important to understand the user's emotions in order to respond
considerately. Compared to the standard problem of emotion recognition,
conversational agents face an additional constraint in that recognition must be
real-time. Studies on model architectures using audio, visual, and textual
modalities have mainly focused on emotion classification using full video
sequences that do not provide online features. In this work, we present a novel
paradigm for contextualized Emotion Recognition using Graph Convolutional
Network with Reinforcement Learning (conER-GRL). Conversations are partitioned
into smaller groups of utterances for effective extraction of contextual
information. The system uses Gated Recurrent Units (GRU) to extract multimodal
features from these groups of utterances. More importantly, Graph Convolutional
Networks (GCN) and Reinforcement Learning (RL) agents are cascade trained to
capture the complex dependencies of emotion features in interactive scenarios.
Comparing the results of the conER-GRL model with other state-of-the-art models
on the benchmark dataset IEMOCAP demonstrates the advantageous capabilities of
the conER-GRL architecture in recognizing emotions in real-time from multimodal
conversational signals.
Related papers
- Hierarchical Banzhaf Interaction for General Video-Language Representation Learning [60.44337740854767]
Multimodal representation learning plays an important role in the artificial intelligence domain.
We introduce a new approach that models video-text as game players using multivariate cooperative game theory.
We extend our original structure into a flexible encoder-decoder framework, enabling the model to adapt to various downstream tasks.
arXiv Detail & Related papers (2024-12-30T14:09:15Z) - Effective Context Modeling Framework for Emotion Recognition in Conversations [2.7175580940471913]
Emotion Recognition in Conversations (ERC) facilitates a deeper understanding of the emotions conveyed by speakers in each utterance within a conversation.
Recent Graph Neural Networks (GNNs) have demonstrated their strengths in capturing data relationships.
We propose ConxGNN, a novel GNN-based framework designed to capture contextual information in conversations.
arXiv Detail & Related papers (2024-12-21T02:22:06Z) - Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining [67.87810796668981]
Information-Sensitive Cropping (ISC) and Self-Refining Dual Learning (SRDL)
Iris achieves state-of-the-art performance across multiple benchmarks with only 850K GUI annotations.
These improvements translate to significant gains in both web and OS agent downstream tasks.
arXiv Detail & Related papers (2024-12-13T18:40:10Z) - Capturing Spectral and Long-term Contextual Information for Speech
Emotion Recognition Using Deep Learning Techniques [0.0]
This research proposes an ensemble model that combines Graph Convolutional Networks (GCN) for processing textual data and the HuBERT transformer for analyzing audio signals.
By combining GCN and HuBERT, our ensemble model can leverage the strengths of both approaches.
Results indicate that the combined model can overcome the limitations of traditional methods, leading to enhanced accuracy in recognizing emotions from speech.
arXiv Detail & Related papers (2023-08-04T06:20:42Z) - EMERSK -- Explainable Multimodal Emotion Recognition with Situational
Knowledge [0.0]
We present Explainable Multimodal Emotion Recognition with Situational Knowledge (EMERSK)
EMERSK is a general system for human emotion recognition and explanation using visual information.
Our system can handle multiple modalities, including facial expressions, posture, and gait in a flexible and modular manner.
arXiv Detail & Related papers (2023-06-14T17:52:37Z) - EmotionIC: emotional inertia and contagion-driven dependency modeling for emotion recognition in conversation [34.24557248359872]
We propose an emotional inertia and contagion-driven dependency modeling approach (EmotionIC) for ERC task.
Our EmotionIC consists of three main components, i.e., Identity Masked Multi-Head Attention (IMMHA), Dialogue-based Gated Recurrent Unit (DiaGRU) and Skip-chain Conditional Random Field (SkipCRF)
Experimental results show that our method can significantly outperform the state-of-the-art models on four benchmark datasets.
arXiv Detail & Related papers (2023-03-20T13:58:35Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker
Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - VIRT: Improving Representation-based Models for Text Matching through
Virtual Interaction [50.986371459817256]
We propose a novel textitVirtual InteRacTion mechanism, termed as VIRT, to enable full and deep interaction modeling in representation-based models.
VIRT asks representation-based encoders to conduct virtual interactions to mimic the behaviors as interaction-based models do.
arXiv Detail & Related papers (2021-12-08T09:49:28Z) - Improved Speech Emotion Recognition using Transfer Learning and
Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.