Related papers: LyS at SemEval-2024 Task 3: An Early Prototype for End-to-End Multimodal Emotion Linking as Graph-Based Parsing

Related papers

Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts [3.8776851334100644]
This paper presents a multimodal approach to tackle these challenges on a well-known dataset. We propose a system that integrates four key modalities/channels using pre-trained models: RoBERTa for text, Wav2Vec2 for speech, a proposed FacialNet for facial expressions, and a CNN+Transformer architecture trained from scratch for video analysis.
arXiv Detail & Related papers (2025-03-09T23:14:19Z)
AIMA at SemEval-2024 Task 3: Simple Yet Powerful Emotion Cause Pair Analysis [0.6465251961564605]
The SemEval-2024 Task 3 presents two subtasks focusing on emotion-cause pair extraction within conversational contexts. Subtask 1 revolves around the extraction of textual emotion-cause pairs, where causes are defined and annotated as textual spans within the conversation. Subtask 2 extends the analysis to encompass multimodal cues, including language, audio, and vision, acknowledging instances where causes may not be exclusively represented in the textual data.
arXiv Detail & Related papers (2025-01-19T21:16:31Z)
Multi-Task Semantic Communication With Graph Attention-Based Feature Correlation Extraction [69.24689059980035]
This paper presents a new graph attention inter-block (GAI) module to the encoder/transmitter of a multi-task semantic communication system. We interpret the outputs of the intermediate feature extraction blocks of the encoder as the nodes of a graph to capture the correlations of the intermediate features. Experiments demonstrate that the proposed model surpasses the most competitive and publicly available models by 11.4% on the CityScapes 2Task dataset.
arXiv Detail & Related papers (2025-01-02T04:38:01Z)
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing a multimodal conversational Sentiment Analysis (ABSA) To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements. To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z)
Towards More Accurate Prediction of Human Empathy and Emotion in Text and Multi-turn Conversations by Combining Advanced NLP, Transformers-based Networks, and Linguistic Methodologies [0.0]
We predict the level of empathic concern and personal distress displayed in essays. Based on the WASSA 2022 Shared Task on Empathy Detection and Emotion Classification, we implement a Feed-Forward Neural Network. As part of the final stage, these approaches have been adapted to the WASSA 2023 Shared Task on Empathy Emotion and Personality Detection in Interactions.
arXiv Detail & Related papers (2024-07-26T04:01:27Z)
SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations [53.60993109543582]
SemEval-2024 Task 3, named Multimodal Emotion Cause Analysis in Conversations, aims at extracting all pairs of emotions and their corresponding causes from conversations. Under different modality settings, it consists of two subtasks: Textual Emotion-Cause Pair Extraction in Conversations (TECPE) and Multimodal Emotion-Cause Pair Extraction in Conversations (MECPE) In this paper, we introduce the task, dataset and evaluation settings, summarize the systems of the top teams, and discuss the findings of the participants.
arXiv Detail & Related papers (2024-05-19T09:59:00Z)
LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task [3.489826905722736]
SemEval 2024 introduces the task of Multimodal Emotion Cause Analysis in Conversations. This paper proposes models that tackle this task as an utterance labeling and a sequence labeling problem. In the official leaderboard for the task, our architecture was ranked 8th, achieving an F1-score of 0.1759 on the leaderboard.
arXiv Detail & Related papers (2024-04-02T16:32:49Z)
MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models [13.137392771279742]
This paper presents our winning submission to Subtask 2 of SemEval 2024 Task 3 on multimodal emotion cause analysis in conversations. We propose a novel Multimodal Emotion Recognition and Multimodal Emotion Cause Extraction framework that integrates text, audio, and visual modalities.
arXiv Detail & Related papers (2024-03-31T01:16:02Z)
JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models [0.9736758288065405]
This paper presents our system development for SemEval-2024 Task 3: "The Competition of Multimodal Emotion Cause Analysis in Conversations" Effectively capturing emotions in human conversations requires integrating multiple modalities such as text, audio, and video. Our proposed approach addresses these challenges by a two-step framework.
arXiv Detail & Related papers (2024-03-05T12:07:18Z)
Multimodal Subtask Graph Generation from Instructional Videos [51.96856868195961]
Real-world tasks consist of multiple inter-dependent subtasks. In this work, we aim to model the causal dependencies between such subtasks from instructional videos describing the task. We present Multimodal Subtask Graph Generation (MSG2), an approach that constructs a Subtask Graph defining the dependency between a task's subtasks relevant to a task from noisy web videos.
arXiv Detail & Related papers (2023-02-17T03:41:38Z)
Fusion with Hierarchical Graphs for Mulitmodal Emotion Recognition [7.147235324895931]
This paper proposes a novel hierarchical graph network (HFGCN) model that learns more informative multimodal representations. Specifically, the proposed model fuses multimodality inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation. Experiments showed the effectiveness of our proposed model for more accurate AER, which yielded state-of-the-art results on two public datasets.
arXiv Detail & Related papers (2021-09-15T08:21:01Z)
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation [131.33610549540043]
We propose a novel graph-based multi-modal fusion encoder for NMT. We first represent the input sentence and image using a unified multi-modal graph. We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations.
arXiv Detail & Related papers (2020-07-17T04:06:09Z)
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers [89.00926092864368]
We present a semantics-controlled multi-modal shuffled Transformer reasoning framework for the audio-visual scene aware dialog task. We also present a novel dynamic scene graph representation learning pipeline that consists of an intra-frame reasoning layer producing-semantic graph representations for every frame. Our results demonstrate state-of-the-art performances on all evaluation metrics.
arXiv Detail & Related papers (2020-07-08T02:00:22Z)
EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege's Principle [71.47160118286226]
We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images. Motivated by Frege's Context Principle from psychology, our approach combines three interpretations of context for emotion recognition. We report an Average Precision (AP) score of 35.48 across 26 classes, which is an improvement of 7-8 over prior methods.
arXiv Detail & Related papers (2020-03-14T19:55:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.