LyS at SemEval-2024 Task 3: An Early Prototype for End-to-End Multimodal Emotion Linking as Graph-Based Parsing
- URL: http://arxiv.org/abs/2405.06483v1
- Date: Fri, 10 May 2024 14:03:37 GMT
- Title: LyS at SemEval-2024 Task 3: An Early Prototype for End-to-End Multimodal Emotion Linking as Graph-Based Parsing
- Authors: Ana Ezquerro, David Vilares
- Abstract summary: This paper describes our participation in SemEval 2024 Task 3, which focused on Multimodal Emotion Cause Analysis in Conversations.
We developed an early prototype for an end-to-end system that uses graph-based methods to identify causal emotion relations in multi-party conversations.
- Score: 7.466159270333272
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes our participation in SemEval 2024 Task 3, which focused on Multimodal Emotion Cause Analysis in Conversations. We developed an early prototype for an end-to-end system that uses graph-based methods from dependency parsing to identify causal emotion relations in multi-party conversations. Our model comprises a neural transformer-based encoder for contextualizing multimodal conversation data and a graph-based decoder for generating the adjacency matrix scores of the causal graph. We ranked 7th out of 15 valid and official submissions for Subtask 1, using textual inputs only. We also discuss our participation in Subtask 2 during post-evaluation using multi-modal inputs.
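The abstract describes a transformer-based encoder followed by a graph-based decoder that scores the adjacency matrix of the causal graph. As a rough illustration only (a minimal sketch in the style of biaffine graph-based dependency parsers, not the authors' released system), the snippet below shows how such a decoder could score every (emotion utterance, cause utterance) pair from contextualized utterance vectors. The class name, dimensions, and thresholding step are illustrative assumptions.
```python
# Hypothetical sketch of a graph-based emotion-cause decoder (not the authors' code).
# Assumes one contextualized vector per utterance from any transformer encoder.
import torch
import torch.nn as nn

class BiaffineCauseScorer(nn.Module):
    """Scores every (emotion utterance, cause utterance) pair in a conversation."""
    def __init__(self, enc_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.head_mlp = nn.Sequential(nn.Linear(enc_dim, hidden), nn.ReLU())  # emotion side
        self.dep_mlp = nn.Sequential(nn.Linear(enc_dim, hidden), nn.ReLU())   # cause side
        self.bilinear = nn.Parameter(torch.empty(hidden + 1, hidden + 1))
        nn.init.xavier_uniform_(self.bilinear)

    def forward(self, utt_enc: torch.Tensor) -> torch.Tensor:
        # utt_enc: (num_utterances, enc_dim) contextualized utterance vectors
        h = self.head_mlp(utt_enc)                # (n, hidden)
        d = self.dep_mlp(utt_enc)                 # (n, hidden)
        ones = torch.ones(h.size(0), 1)
        h = torch.cat([h, ones], dim=-1)          # append bias term
        d = torch.cat([d, ones], dim=-1)
        # (n, n) matrix: entry [i, j] scores "utterance j causes the emotion in utterance i"
        return h @ self.bilinear @ d.t()

# Usage: threshold the score matrix to recover candidate (emotion, cause) arcs.
scores = BiaffineCauseScorer()(torch.randn(12, 768))
pairs = (scores.sigmoid() > 0.5).nonzero()
```
In this style of decoder each entry of the resulting n-by-n matrix is an independent arc score, so the causal graph is recovered by thresholding (or a per-row argmax) rather than by sequence labelling.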
Related papers
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing multimodal conversational Aspect-based Sentiment Analysis (ABSA).
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z) - Towards More Accurate Prediction of Human Empathy and Emotion in Text and Multi-turn Conversations by Combining Advanced NLP, Transformers-based Networks, and Linguistic Methodologies [0.0]
We predict the level of empathic concern and personal distress displayed in essays.
Based on the WASSA 2022 Shared Task on Empathy Detection and Emotion Classification, we implement a Feed-Forward Neural Network.
As part of the final stage, these approaches have been adapted to the WASSA 2023 Shared Task on Empathy Emotion and Personality Detection in Interactions.
arXiv Detail & Related papers (2024-07-26T04:01:27Z) - SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations [53.60993109543582]
SemEval-2024 Task 3, named Multimodal Emotion Cause Analysis in Conversations, aims at extracting all pairs of emotions and their corresponding causes from conversations.
Under different modality settings, it consists of two subtasks: Textual Emotion-Cause Pair Extraction in Conversations (TECPE) and Multimodal Emotion-Cause Pair Extraction in Conversations (MECPE).
In this paper, we introduce the task, dataset and evaluation settings, summarize the systems of the top teams, and discuss the findings of the participants.
arXiv Detail & Related papers (2024-05-19T09:59:00Z) - LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task [3.489826905722736]
SemEval 2024 introduces the task of Multimodal Emotion Cause Analysis in Conversations.
This paper proposes models that tackle this task as an utterance labeling and a sequence labeling problem.
On the official leaderboard for the task, our architecture ranked 8th with an F1-score of 0.1759.
arXiv Detail & Related papers (2024-04-02T16:32:49Z) - MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models [13.137392771279742]
This paper presents our winning submission to Subtask 2 of SemEval 2024 Task 3 on multimodal emotion cause analysis in conversations.
We propose a novel Multimodal Emotion Recognition and Multimodal Emotion Cause Extraction framework that integrates text, audio, and visual modalities.
arXiv Detail & Related papers (2024-03-31T01:16:02Z) - JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models [0.9736758288065405]
This paper presents our system development for SemEval-2024 Task 3: "The Competition of Multimodal Emotion Cause Analysis in Conversations".
Effectively capturing emotions in human conversations requires integrating multiple modalities such as text, audio, and video.
Our proposed approach addresses these challenges with a two-step framework.
arXiv Detail & Related papers (2024-03-05T12:07:18Z) - Multimodal Subtask Graph Generation from Instructional Videos [51.96856868195961]
Real-world tasks consist of multiple inter-dependent subtasks.
In this work, we aim to model the causal dependencies between such subtasks from instructional videos describing the task.
We present Multimodal Subtask Graph Generation (MSG2), an approach that constructs a Subtask Graph defining the dependencies between a task's subtasks from noisy web videos.
arXiv Detail & Related papers (2023-02-17T03:41:38Z) - Fusion with Hierarchical Graphs for Multimodal Emotion Recognition [7.147235324895931]
This paper proposes a novel hierarchical graph network (HFGCN) model that learns more informative multimodal representations.
Specifically, the proposed model fuses multimodality inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation.
Experiments showed the effectiveness of our proposed model for more accurate AER, which yielded state-of-the-art results on two public datasets.
arXiv Detail & Related papers (2021-09-15T08:21:01Z) - A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation [131.33610549540043]
We propose a novel graph-based multi-modal fusion encoder for NMT.
We first represent the input sentence and image using a unified multi-modal graph.
We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations.
arXiv Detail & Related papers (2020-07-17T04:06:09Z) - Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers [89.00926092864368]
We present a semantics-controlled multi-modal shuffled Transformer reasoning framework for the audio-visual scene aware dialog task.
We also present a novel dynamic scene graph representation learning pipeline that consists of an intra-frame reasoning layer producing semantic graph representations for every frame.
Our results demonstrate state-of-the-art performances on all evaluation metrics.
arXiv Detail & Related papers (2020-07-08T02:00:22Z) - EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege's Principle [71.47160118286226]
We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images.
Motivated by Frege's Context Principle from psychology, our approach combines three interpretations of context for emotion recognition.
We report an Average Precision (AP) score of 35.48 across 26 classes, an improvement of 7-8 AP points over prior methods.
arXiv Detail & Related papers (2020-03-14T19:55:21Z)