MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models
- URL: http://arxiv.org/abs/2404.00511v3
- Date: Thu, 11 Apr 2024 05:14:35 GMT
- Title: MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models
- Authors: Zebang Cheng, Fuqiang Niu, Yuxiang Lin, Zhi-Qi Cheng, Bowen Zhang, Xiaojiang Peng
- Abstract summary: This paper presents our winning submission to Subtask 2 of SemEval 2024 Task 3 on multimodal emotion cause analysis in conversations.
We propose a novel Multimodal Emotion Recognition and Multimodal Emotion Cause Extraction framework that integrates text, audio, and visual modalities.
- Score: 13.137392771279742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents our winning submission to Subtask 2 of SemEval 2024 Task 3 on multimodal emotion cause analysis in conversations. We propose a novel Multimodal Emotion Recognition and Multimodal Emotion Cause Extraction (MER-MCE) framework that integrates text, audio, and visual modalities using specialized emotion encoders. Our approach sets itself apart from other top-performing teams by leveraging modality-specific features for enhanced emotion understanding and causality inference. Experimental evaluation demonstrates the advantages of our multimodal approach: our submission achieved a competitive weighted F1 score of 0.3435, ranking third, only 0.0339 behind the first-place team and 0.0025 behind the second-place team. Project: https://github.com/MIPS-COLT/MER-MCE.git
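To give a concrete picture of the pipeline, here is a minimal Python sketch of the two-stage structure the abstract describes (Stage 1: multimodal emotion recognition; Stage 2: emotion cause extraction). The feature shapes, fusion step, and classifier logic are illustrative placeholders, not the authors' actual components; see the linked repository for the real implementation.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    text: str
    audio_feat: list[float]   # pooled features from an audio emotion encoder (assumed)
    visual_feat: list[float]  # pooled features from a visual emotion encoder (assumed)

def recognize_emotion(utt: Utterance) -> str:
    """Stage 1 (MER): fuse modality-specific features into an emotion label."""
    fused = utt.audio_feat + utt.visual_feat       # placeholder fusion (concatenation)
    return "joy" if sum(fused) > 0 else "neutral"  # stand-in for a trained classifier

def extract_causes(conv: list[Utterance], target_idx: int) -> list[int]:
    """Stage 2 (MCE): propose cause utterances for an emotional utterance.
    Here every turn up to and including the target is a candidate."""
    return list(range(target_idx + 1))

conv = [
    Utterance("You got the job!", [0.4], [0.2]),
    Utterance("Really? That's amazing!", [0.9], [0.7]),
]
emotion = recognize_emotion(conv[1])
pairs = [(1, emotion, c) for c in extract_causes(conv, 1)]
print(pairs)  # [(1, 'joy', 0), (1, 'joy', 1)] -- (emotion utt, label, cause utt)
```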
Related papers
- Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset [74.74686464187474]
Emotion and Intent Joint Understanding in Multimodal Conversation (MC-EIU) aims to decode the semantic information manifested in a multimodal conversational history.
MC-EIU is an enabling technology for many human-computer interfaces.
We propose the MC-EIU dataset, which features 7 emotion categories, 9 intent categories, 3 modalities (textual, acoustic, and visual), and 2 languages (English and Mandarin).
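As a rough illustration only, a single sample in such a dataset might be organized as below; every field name here is an assumption inferred from the summary above, not the dataset's actual schema.

```python
sample = {
    "dialogue_id": "en_0001",
    "language": "English",  # or "Mandarin"
    "utterances": [
        {
            "text": "I can't believe we won!",  # textual modality
            "audio": "utt_0001.wav",            # acoustic modality
            "video": "utt_0001.mp4",            # visual modality
            "emotion": "joy",        # one of the 7 emotion categories
            "intent": "expressing",  # one of the 9 intent categories (assumed label)
        },
    ],
}
```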
arXiv Detail & Related papers (2024-07-03T01:56:00Z) - Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning [55.127202990679976]
We introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories.
This dataset enables models to learn from varied scenarios and generalize to real-world applications.
We propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders.
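The encoder-to-LLM wiring this summary hints at can be sketched as follows; the feature dimensions, the 4096-d embedding size, and the linear projectors are assumptions for illustration, not Emotion-LLaMA's actual implementation.

```python
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    """Map one modality's emotion-encoder features into the LLM token space."""
    def __init__(self, feat_dim: int, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(feat_dim, llm_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats)

audio_proj = ModalityProjector(feat_dim=768)    # assumed audio feature size
visual_proj = ModalityProjector(feat_dim=1024)  # assumed visual feature size
audio_tok = audio_proj(torch.randn(1, 768))     # one pseudo-token per modality
visual_tok = visual_proj(torch.randn(1, 1024))
prefix = torch.cat([audio_tok, visual_tok], dim=0)  # prepended to text embeddings
print(prefix.shape)  # torch.Size([2, 4096])
```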
arXiv Detail & Related papers (2024-06-17T03:01:22Z) - SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations [53.60993109543582]
SemEval-2024 Task 3, named Multimodal Emotion Cause Analysis in Conversations, aims at extracting all pairs of emotions and their corresponding causes from conversations.
Under different modality settings, it consists of two subtasks: Textual Emotion-Cause Pair Extraction in Conversations (TECPE) and Multimodal Emotion-Cause Pair Extraction in Conversations (MECPE).
In this paper, we introduce the task, dataset and evaluation settings, summarize the systems of the top teams, and discuss the findings of the participants.
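To make the task output concrete, here is an illustrative emotion-cause pair in Python; the index and label conventions are assumed for exposition and are not the official submission format.

```python
conversation = [
    "I got promoted today.",   # utterance 0
    "That's wonderful news!",  # utterance 1, expresses joy
]

# Each extracted pair links an emotional utterance to a causing utterance;
# MECPE additionally grounds both in the audio/visual stream of the clip.
emotion_cause_pairs = [
    {"emotion_utt": 1, "emotion": "joy", "cause_utt": 0},
]
```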
arXiv Detail & Related papers (2024-05-19T09:59:00Z) - LyS at SemEval-2024 Task 3: An Early Prototype for End-to-End Multimodal Emotion Linking as Graph-Based Parsing [7.466159270333272]
This paper describes our participation in SemEval 2024 Task 3, which focused on Multimodal Emotion Cause Analysis in Conversations.
We developed an early prototype for an end-to-end system that uses graph-based methods to identify causal emotion relations in multi-party conversations.
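A toy version of the graph-based framing might look like the following, where utterances are nodes and kept arcs are cause-to-emotion links; the distance-based scorer is a placeholder, not the LyS parser.

```python
from itertools import product

utterances = ["We lost the match.", "I'm gutted.", "Let's grab some food."]

def score_edge(cause: int, emo: int) -> float:
    """Placeholder arc scorer: prefer nearby preceding turns."""
    return 1.0 / (1 + emo - cause) if cause <= emo else 0.0

# Keep arcs above a threshold; each kept arc is a (cause -> emotion) link.
edges = [(c, e, score_edge(c, e))
         for c, e in product(range(len(utterances)), repeat=2)
         if c != e and score_edge(c, e) > 0.4]
print(edges)  # [(0, 1, 0.5), (1, 2, 0.5)]
```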
arXiv Detail & Related papers (2024-05-10T14:03:37Z) - PetKaz at SemEval-2024 Task 3: Advancing Emotion Classification with an LLM for Emotion-Cause Pair Extraction in Conversations [4.463184061618504]
We present our submission to SemEval-2024 Task 3, "The Competition of Multimodal Emotion Cause Analysis in Conversations".
Our approach relies on combining fine-tuned GPT-3.5 for emotion classification and a BiLSTM-based neural network to detect causes.
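The two-component pipeline could be sketched as below; the BiLSTM is a generic stand-in rather than their trained model, and classify_emotion_with_llm is a hypothetical helper standing in for the fine-tuned GPT-3.5 call.

```python
import torch
import torch.nn as nn

def classify_emotion_with_llm(utterance: str) -> str:
    """Hypothetical helper standing in for a fine-tuned GPT-3.5 call."""
    return "anger"

class CauseTagger(nn.Module):
    """Generic BiLSTM that marks each utterance as cause / not-cause."""
    def __init__(self, in_dim: int = 256, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)

    def forward(self, utt_embeddings: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(utt_embeddings)
        return self.head(out)

emotion = classify_emotion_with_llm("I can't believe they did that!")  # step 1
embeds = torch.randn(1, 5, 256)  # 5 utterances with assumed 256-d embeddings
causes = CauseTagger()(embeds).argmax(dim=-1)  # step 2: per-utterance decision
```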
arXiv Detail & Related papers (2024-04-08T13:25:03Z) - LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task [3.489826905722736]
SemEval 2024 introduces the task of Multimodal Emotion Cause Analysis in Conversations.
This paper proposes models that tackle this task as an utterance labeling and a sequence labeling problem.
On the official leaderboard for the task, our architecture ranked 8th with an F1-score of 0.1759.
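A minimal illustration of the sequence-labelling framing follows, using an assumed BIO-style tag set that is not necessarily the one the authors used.

```python
conversation = [
    "The flight got cancelled.",  # tagged "B-CAUSE"
    "And they won't refund us.",  # tagged "I-CAUSE"
    "This is infuriating!",       # tagged "O" (the emotion utterance itself)
]
gold_tags = ["B-CAUSE", "I-CAUSE", "O"]

# A tagger predicts one label per utterance; contiguous B/I spans are then
# paired with the emotion utterance to yield emotion-cause pairs.
cause_span = [i for i, t in enumerate(gold_tags) if t != "O"]
print([(2, "anger", c) for c in cause_span])  # [(2, 'anger', 0), (2, 'anger', 1)]
```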
arXiv Detail & Related papers (2024-04-02T16:32:49Z) - JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models [0.9736758288065405]
This paper presents our system development for SemEval-2024 Task 3: "The Competition of Multimodal Emotion Cause Analysis in Conversations".
Effectively capturing emotions in human conversations requires integrating multiple modalities such as text, audio, and video.
Our proposed approach addresses these challenges with a two-step framework.
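A hypothetical prompt skeleton for such a two-step approach is shown below; the prompt wording and the llm callable are assumptions for illustration, not the team's actual prompts.

```python
def llm(prompt: str) -> str:
    """Stand-in for a call to GPT or an instruction-tuned Llama model."""
    return "sadness"

history = ["A: My cat is really sick.", "B: Oh no, I'm so sorry to hear that."]
joined = "\n".join(history)

# Step 1: recognize the emotion of the last utterance.
emotion = llm(f"Conversation:\n{joined}\nWhat emotion does the last utterance express?")

# Step 2: extract the utterance that causes that emotion.
cause = llm(
    f"Conversation:\n{joined}\nThe last utterance expresses {emotion}. "
    "Which utterance causes this emotion?"
)
```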
arXiv Detail & Related papers (2024-03-05T12:07:18Z) - SemEval 2024 -- Task 10: Emotion Discovery and Reasoning its Flip in Conversation (EDiReF) [61.49972925493912]
SemEval-2024 Task 10 is a shared task centred on identifying emotions in code-mixed dialogues.
This task comprises three distinct subtasks - emotion recognition in conversation for code-mixed dialogues, emotion flip reasoning for code-mixed dialogues, and emotion flip reasoning for English dialogues.
A total of 84 participants engaged in this task, with the most adept systems attaining F1-scores of 0.70, 0.79, and 0.76 for the respective subtasks.
arXiv Detail & Related papers (2024-02-29T08:20:06Z) - MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning [90.17500229142755]
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia.
This paper introduces the motivation behind the challenge, describes the benchmark dataset, and provides some statistics about the participants.
We believe this high-quality dataset can become a new benchmark in multimodal emotion recognition, especially for the Chinese research community.
arXiv Detail & Related papers (2023-04-18T13:23:42Z) - MSCTD: A Multimodal Sentiment Chat Translation Dataset [66.81525961469494]
We introduce a new task named Multimodal Chat Translation (MCT).
MCT aims to generate more accurate translations with the help of the associated dialogue history and visual context.
Our work can facilitate research on both multimodal chat translation and multimodal dialogue sentiment analysis.
arXiv Detail & Related papers (2022-02-28T09:40:46Z) - UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a Multi-Task Learning Architecture for Memotion Analysis [1.2233362977312945]
We describe the system developed by our team for SemEval-2020 Task 8: Memotion Analysis.
We introduce a novel system to analyze such meme posts: a multimodal multi-task learning architecture that combines ALBERT for text encoding with VGG-16 for image representation.
Our approach achieves good performance on each of the three subtasks of the current competition, ranking 11th for Subtask A (0.3453 macro F1-score), 1st for Subtask B (0.5183 macro F1-score), and 3rd for Subtask C (0.3171 macro F1-score).
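Schematically, the multi-task head structure described above might look like this; the feature dimensions, fusion by concatenation, and per-subtask class counts are assumptions, not the authors' exact model.

```python
import torch
import torch.nn as nn

class MemotionMultiTask(nn.Module):
    """One shared fused representation, one output head per subtask."""
    def __init__(self, text_dim: int = 768, img_dim: int = 4096):
        super().__init__()
        fused = text_dim + img_dim
        self.head_a = nn.Linear(fused, 3)  # Subtask A: sentiment (assumed 3-way)
        self.head_b = nn.Linear(fused, 4)  # Subtask B: humour types (assumed 4)
        self.head_c = nn.Linear(fused, 4)  # Subtask C: intensities (assumed 4)

    def forward(self, text_feat: torch.Tensor, img_feat: torch.Tensor):
        x = torch.cat([text_feat, img_feat], dim=-1)  # simple concatenation fusion
        return self.head_a(x), self.head_b(x), self.head_c(x)

model = MemotionMultiTask()
a, b, c = model(torch.randn(1, 768), torch.randn(1, 4096))
print(a.shape, b.shape, c.shape)  # per-task logits, e.g. torch.Size([1, 3])
```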
arXiv Detail & Related papers (2020-09-06T17:17:41Z)