Multilogue-Net: A Context Aware RNN for Multi-modal Emotion Detection
and Sentiment Analysis in Conversation
- URL: http://arxiv.org/abs/2002.08267v3
- Date: Mon, 22 Jun 2020 20:03:42 GMT
- Title: Multilogue-Net: A Context Aware RNN for Multi-modal Emotion Detection
and Sentiment Analysis in Conversation
- Authors: Aman Shenoy and Ashish Sardana
- Abstract summary: Multi-modal Emotion Detection and Sentiment Analysis can be particularly useful.
Current systems dealing with Multi-modal functionality fail to leverage and capture the context of the conversation.
We propose an end-to-end RNN architecture that attempts to address all of the mentioned drawbacks.
- Score: 2.588973722689844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sentiment Analysis and Emotion Detection in conversation is key in several
real-world applications, with an increase in modalities available aiding a
better understanding of the underlying emotions. Multi-modal Emotion Detection
and Sentiment Analysis can be particularly useful, as applications will be able
to use specific subsets of available modalities, as per the available data.
Current systems dealing with Multi-modal functionality fail to leverage and
capture the context of the conversation through all modalities, the
dependency between the listener(s) and speaker emotional states, and the
relevance and relationship between the available modalities. In this paper, we
propose an end-to-end RNN architecture that attempts to address all of these
drawbacks. Our proposed model, at the time of writing, outperforms the state
of the art on a benchmark dataset on a variety of accuracy and regression
metrics.
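As a rough, hedged illustration of the kind of architecture the abstract describes (per-speaker recurrent states updated at every utterance, with attention used to fuse whichever modalities are available), here is a minimal PyTorch sketch. The module names, feature dimensions, and the simple learned attention are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ContextAwareFusionCell(nn.Module):
    """Illustrative sketch: one GRU state per modality, updated per utterance,
    plus a learned attention step that fuses the modality states."""

    def __init__(self, dims, hidden=128, n_classes=2):
        super().__init__()
        self.cells = nn.ModuleDict({m: nn.GRUCell(d, hidden) for m, d in dims.items()})
        self.attn = nn.Linear(hidden, 1)             # scores each modality state
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, features, states):
        # features: {modality: (batch, feat_dim)} for the current utterance
        # states:   {modality: (batch, hidden)} carried over from earlier turns
        new_states = {m: self.cells[m](x, states[m]) for m, x in features.items()}
        stacked = torch.stack(list(new_states.values()), dim=1)  # (B, M, H)
        weights = torch.softmax(self.attn(stacked), dim=1)       # (B, M, 1)
        fused = (weights * stacked).sum(dim=1)                   # (B, H)
        return self.classifier(fused), new_states

# Toy usage with random tensors standing in for real utterance features.
dims = {"text": 100, "audio": 73, "visual": 512}
cell = ContextAwareFusionCell(dims)
states = {m: torch.zeros(4, 128) for m in dims}
feats = {m: torch.randn(4, d) for m, d in dims.items()}
logits, states = cell(feats, states)
print(logits.shape)  # torch.Size([4, 2])
```

In a full system the recurrent states would additionally be kept per conversation party, which is what allows conditioning on the listener(s) and speaker emotional states that the abstract emphasises.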
Related papers
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing a multimodal conversational Aspect-based Sentiment Analysis (ABSA) setting.
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z) - Efficient Long-distance Latent Relation-aware Graph Neural Network for Multi-modal Emotion Recognition in Conversations [8.107561045241445]
We propose an Efficient Long-distance Latent Relation-aware Graph Neural Network (ELR-GNN) for multi-modal emotion recognition in conversations.
ELR-GNN achieves state-of-the-art performance on the benchmark IEMOCAP and MELD, with running times reduced by 52% and 35%, respectively.
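The ELR-GNN entry above rests on graph-based message passing over conversation utterances. As a loose sketch only (the efficient long-distance latent relation modelling itself is not reproduced), a single symmetrically normalised graph-convolution step over an utterance graph could look like the following; the dimensions and chain-shaped toy graph are assumptions.

```python
import torch
import torch.nn as nn

class UtteranceGraphConv(nn.Module):
    """One GCN-style propagation step over an utterance graph.
    Purely illustrative; ELR-GNN's latent relation modelling is richer."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x:   (num_utterances, in_dim) node features
        # adj: (num_utterances, num_utterances) 0/1 adjacency, e.g. temporal
        #      or speaker links between utterances
        a = adj + torch.eye(adj.size(0))              # add self-loops
        d_inv_sqrt = a.sum(dim=1).pow(-0.5)
        a_norm = d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)
        return torch.relu(a_norm @ self.lin(x))

# Toy graph: 5 utterances, chain-connected.
x = torch.randn(5, 64)
adj = torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
print(UtteranceGraphConv(64, 32)(x, adj).shape)  # torch.Size([5, 32])
```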
arXiv Detail & Related papers (2024-06-27T15:54:12Z) - Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey [66.166184609616]
ChatGPT has opened up immense potential for applying large language models (LLMs) to text-centric multimodal tasks.
It is still unclear how existing LLMs can adapt better to text-centric multimodal sentiment analysis tasks.
arXiv Detail & Related papers (2024-06-12T10:36:27Z) - AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations [57.99479708224221]
We propose a novel framework called AIMDiT to solve the problem of multimodal fusion of deep features.
Experiments conducted using our AIMDiT framework on the public benchmark dataset MELD reveal 2.34% and 2.87% improvements in terms of the Acc-7 and w-F1 metrics.
arXiv Detail & Related papers (2024-04-12T11:31:18Z) - Long Dialog Summarization: An Analysis [28.223798877781054]
This work emphasizes the significance of creating coherent and contextually rich summaries for effective communication in various applications.
We explore current state-of-the-art approaches for long dialog summarization in different domains; benchmark-metric-based evaluations show that no single model performs well across different domains and summarization tasks.
arXiv Detail & Related papers (2024-02-26T19:35:45Z) - Capturing Spectral and Long-term Contextual Information for Speech Emotion Recognition Using Deep Learning Techniques [0.0]
This research proposes an ensemble model that combines Graph Convolutional Networks (GCN) for processing textual data and the HuBERT transformer for analyzing audio signals.
By combining GCN and HuBERT, our ensemble model can leverage the strengths of both approaches.
Results indicate that the combined model can overcome the limitations of traditional methods, leading to enhanced accuracy in recognizing emotions from speech.
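To make the ensemble idea above slightly more concrete, the sketch below fuses a text branch (standing in for the GCN) with pooled HuBERT-style audio embeddings and classifies the result. The 768-dimensional audio size matches hubert-base, but the random tensors, dimensions, and single-layer branches are assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

class TextAudioEnsemble(nn.Module):
    """Illustrative fusion of a graph-convolution text branch with pooled
    HuBERT audio embeddings, followed by a joint emotion classifier."""

    def __init__(self, text_dim=300, audio_dim=768, hidden=128, n_emotions=6):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)    # stands in for a GCN branch
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.classifier = nn.Linear(2 * hidden, n_emotions)

    def forward(self, text_feats, audio_feats):
        fused = torch.cat([torch.relu(self.text_proj(text_feats)),
                           torch.relu(self.audio_proj(audio_feats))], dim=-1)
        return self.classifier(fused)

# In practice audio_feats would be mean-pooled HuBERT hidden states (e.g. from
# a pretrained HuBERT encoder); random tensors keep the sketch self-contained.
model = TextAudioEnsemble()
logits = model(torch.randn(8, 300), torch.randn(8, 768))
print(logits.shape)  # torch.Size([8, 6])
```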
arXiv Detail & Related papers (2023-08-04T06:20:42Z) - 'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges [65.03196674816772]
Referential ambiguities arise in dialogue when a referring expression does not uniquely identify the intended referent for the addressee.
Addressees usually detect such ambiguities immediately and work with the speaker to repair them using meta-communicative Clarification Exchanges (CEs): a Clarification Request (CR) and a response.
Here, we argue that the ability to generate and respond to CRs imposes specific constraints on the architecture and objective functions of multi-modal, visually grounded dialogue models.
arXiv Detail & Related papers (2023-07-28T13:44:33Z) - Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones and cameras, respectively.
arXiv Detail & Related papers (2022-07-20T13:37:57Z) - M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation [1.3864478040954673]
We propose a Multi-modal Fusion Network (M2FNet) that extracts emotion-relevant features from visual, audio, and text modality.
It employs a multi-head attention-based fusion mechanism to combine emotion-rich latent representations of the input data.
The proposed feature extractor is trained with a novel adaptive margin-based triplet loss function to learn emotion-relevant features from the audio and visual data.
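The two mechanisms mentioned for M2FNet, multi-head attention fusion and an adaptive margin-based triplet loss, can be sketched roughly as follows; the choice of the text embedding as the attention query and the particular margin-adaptation rule are invented for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    """Fuse per-modality utterance embeddings with multi-head attention,
    using the text embedding as the query (an illustrative choice)."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text, audio, visual):              # each: (B, dim)
        kv = torch.stack([text, audio, visual], dim=1)   # (B, 3, dim)
        fused, _ = self.attn(text.unsqueeze(1), kv, kv)  # (B, 1, dim)
        return fused.squeeze(1)

def adaptive_margin_triplet(anchor, positive, negative, base=0.2, scale=0.5):
    """Triplet loss whose margin grows with anchor-negative similarity;
    the exact adaptation rule here is illustrative, not M2FNet's."""
    margin = base + scale * F.cosine_similarity(anchor, negative).clamp(min=0)
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

fusion = AttentionFusion()
t, a, v = (torch.randn(8, 256) for _ in range(3))
print(fusion(t, a, v).shape)             # torch.Size([8, 256])
# Shape demo only; real anchors/positives/negatives would be drawn from the
# same and different emotion classes during feature-extractor training.
print(adaptive_margin_triplet(t, a, v))  # scalar loss
```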
arXiv Detail & Related papers (2022-06-05T14:18:58Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
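A hedged sketch of the decision-level (late) fusion described above is shown below; learnable weighted averaging of the two models' logits is one simple option, not necessarily the fusion the authors use.

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Combine utterance-level emotion logits from a speech model (e.g. a
    transfer-learned speaker-recognition backbone) and a text model (e.g. a
    fine-tuned BERT). Both backbones are abstracted away in this sketch."""

    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.tensor([0.5, 0.5]))   # learnable fusion weights

    def forward(self, speech_logits, text_logits):        # each: (B, n_emotions)
        weights = torch.softmax(self.w, dim=0)
        return weights[0] * speech_logits + weights[1] * text_logits

fusion = LateFusion()
print(fusion(torch.randn(8, 4), torch.randn(8, 4)).shape)  # torch.Size([8, 4])
```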
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - Sequential Late Fusion Technique for Multi-modal Sentiment Analysis [0.0]
We use text, audio and visual modalities from MOSI dataset.
We propose a novel fusion technique using a multi-head attention LSTM network.
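As a final hedged sketch, the sequential fusion described above (per-modality LSTMs over the utterance sequence, combined with multi-head attention) might look roughly like the following; the toy feature dimensions and the use of the text stream as the attention query are assumptions.

```python
import torch
import torch.nn as nn

class SequentialLateFusion(nn.Module):
    """Run one LSTM per modality over the utterance sequence, then fuse the
    per-step hidden states with multi-head attention. Dimensions are toy values."""

    def __init__(self, dims=(300, 74, 35), hidden=64, heads=4, n_classes=2):
        super().__init__()
        self.lstms = nn.ModuleList([nn.LSTM(d, hidden, batch_first=True) for d in dims])
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, text, audio, visual):        # each: (B, T, dim)
        hs = [lstm(x)[0] for lstm, x in zip(self.lstms, (text, audio, visual))]
        stacked = torch.cat(hs, dim=1)             # (B, 3T, hidden)
        query = hs[0]                              # text stream as query (assumed)
        fused, _ = self.attn(query, stacked, stacked)
        return self.out(fused)                     # per-utterance predictions

model = SequentialLateFusion()
t, a, v = torch.randn(2, 10, 300), torch.randn(2, 10, 74), torch.randn(2, 10, 35)
print(model(t, a, v).shape)  # torch.Size([2, 10, 2])
```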
arXiv Detail & Related papers (2021-06-22T01:32:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.