EmoMeta: A Multimodal Dataset for Fine-grained Emotion Classification in Chinese Metaphors
- URL: http://arxiv.org/abs/2505.13483v1
- Date: Mon, 12 May 2025 10:23:39 GMT
- Title: EmoMeta: A Multimodal Dataset for Fine-grained Emotion Classification in Chinese Metaphors
- Authors: Xingyuan Lu, Yuxi Liu, Dongyu Zhang, Zhiyao Wu, Jing Ren, Feng Xia
- Abstract summary: A multimodal Chinese dataset comprising 5,000 text-image pairs of metaphorical advertisements. Each entry is meticulously annotated for metaphor occurrence, domain relations, and fine-grained emotion classification.
- Score: 16.510791655003096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metaphors play a pivotal role in expressing emotions, making them crucial for emotional intelligence. The advent of multimodal data and widespread communication has led to a proliferation of multimodal metaphors, amplifying the complexity of emotion classification compared to single-mode scenarios. However, the scarcity of research on constructing multimodal metaphorical fine-grained emotion datasets hampers progress in this domain. Moreover, existing studies predominantly focus on English, overlooking potential variations in emotional nuances across languages. To address these gaps, we introduce a multimodal dataset in Chinese comprising 5,000 text-image pairs of metaphorical advertisements. Each entry is meticulously annotated for metaphor occurrence, domain relations and fine-grained emotion classification encompassing joy, love, trust, fear, sadness, disgust, anger, surprise, anticipation, and neutral. Our dataset is publicly accessible (https://github.com/DUTIR-YSQ/EmoMeta), facilitating further advancements in this burgeoning field.
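Since each entry pairs an advertisement image with text and three annotation layers, a loader can be sketched directly from the abstract. The sketch below is purely illustrative: the file name `annotations.json` and every field name are assumptions, not the repository's documented schema; only the ten emotion labels are taken from the abstract itself (see https://github.com/DUTIR-YSQ/EmoMeta for the actual layout).

```python
# Minimal sketch of loading EmoMeta-style annotations.
# File name and field names are assumptions for illustration only.
import json
from pathlib import Path

# The ten fine-grained emotion classes listed in the abstract.
EMOTIONS = ["joy", "love", "trust", "fear", "sadness",
            "disgust", "anger", "surprise", "anticipation", "neutral"]
EMOTION_TO_ID = {label: idx for idx, label in enumerate(EMOTIONS)}

def load_annotations(path="annotations.json"):
    """Yield (image_path, text, metaphor_flag, domain_relation, emotion_id) tuples."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    for rec in records:                        # assumed schema: one dict per text-image pair
        yield (
            rec["image"],                      # path to the advertisement image
            rec["text"],                       # accompanying Chinese text
            bool(rec["is_metaphor"]),          # metaphor occurrence
            rec.get("domain_relation"),        # source/target domain relation
            EMOTION_TO_ID[rec["emotion"]],     # fine-grained emotion label
        )
```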
Related papers
- Cultural Bias Matters: A Cross-Cultural Benchmark Dataset and Sentiment-Enriched Model for Understanding Multimodal Metaphors [26.473849906627677]
We introduce MultiMM, a dataset designed for cross-cultural studies of metaphor in Chinese and English. We propose Sentiment-Enriched Metaphor Detection (SEMD), a baseline model that integrates sentiment embeddings to enhance metaphor comprehension across cultural backgrounds.
arXiv Detail & Related papers (2025-06-08T04:02:50Z) - Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content [56.62027582702816]
Multimodal Sentiment Analysis seeks to unravel human emotions by amalgamating text, audio, and visual data. Yet, discerning subtle emotional nuances within audio and video expressions poses a formidable challenge. We introduce DEVA, a progressive fusion framework founded on textual sentiment descriptions.
arXiv Detail & Related papers (2024-12-12T11:30:41Z) - When Words Smile: Generating Diverse Emotional Facial Expressions from Text [72.19705878257204]
We introduce an end-to-end text-to-expression model that explicitly focuses on emotional dynamics. Our model learns expressive facial variations in a continuous latent space and generates expressions that are diverse, fluid, and emotionally coherent.
arXiv Detail & Related papers (2024-12-03T15:39:05Z) - Generative Emotion Cause Explanation in Multimodal Conversations [23.39751445330256]
We propose a new task, Multimodal Conversation Emotion Cause Explanation (MCECE).
It aims to generate a detailed explanation of the emotional cause of the target utterance within a multimodal conversation scenario.
A novel approach, FAME-Net, is proposed that harnesses the power of Large Language Models (LLMs) to analyze visual data and accurately interpret the emotions conveyed through facial expressions in videos.
arXiv Detail & Related papers (2024-11-01T09:16:30Z) - EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
However, their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding that incorporates two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z) - MELD-ST: An Emotion-aware Speech Translation Dataset [29.650945917540316]
We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs.
Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset.
Baseline experiments using the SeamlessM4T model on the dataset indicate that fine-tuning with emotion labels can enhance translation performance in some settings.
arXiv Detail & Related papers (2024-05-21T22:40:38Z) - M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database [139.08528216461502]
We propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED.
M3ED contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances.
To the best of our knowledge, M3ED is the first multimodal emotional dialogue dataset in Chinese.
arXiv Detail & Related papers (2022-05-09T06:52:51Z) - On the Complementarity of Images and Text for the Expression of Emotions in Social Media [12.616197765581864]
We develop models to automatically detect the image-text relation, the emotion stimulus category, and the emotion class.
We evaluate whether these tasks require both modalities and find that, for the image-text relations, text alone is sufficient for most categories.
The emotions of anger and sadness are best predicted with a multimodal model, while text alone is sufficient for disgust, joy, and surprise.
arXiv Detail & Related papers (2022-02-11T12:33:53Z) - Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z) - Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation [75.82110684355979]
We introduce a two-stream model that translates images in input space using an object-aware transformer.
We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model.
We achieve state-of-the-art performance on two multimodal Twitter datasets. (A hedged sketch of the auxiliary-sentence idea follows this list.)
arXiv Detail & Related papers (2021-08-03T18:02:38Z)
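To make the auxiliary-sentence idea from the last entry concrete, here is a minimal sketch. It assumes the image has already been reduced to a list of detected object labels; the paper's two-stream object-aware transformer is not reproduced, and the object list, target text, and 3-way sentiment label set are illustrative assumptions.

```python
# Hedged sketch: verbalize image content as an auxiliary sentence and feed it
# to a text-only language model as the second segment of a sentence pair.
# Object list, target handling, and label set are assumptions, not the paper's setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # negative / neutral / positive
)

tweet = "Had a great time at the stadium today!"
detected_objects = ["stadium", "crowd", "scoreboard"]  # hypothetical detector output

# Auxiliary sentence: a textual rendering of the visual modality.
aux_sentence = "The image shows " + ", ".join(detected_objects) + "."

inputs = tokenizer(tweet, aux_sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # untrained head: output is meaningless until fine-tuned
print(["negative", "neutral", "positive"][int(logits.argmax(dim=-1))])
```

The design point is that once visual content is expressed as text, any pretrained language model can consume both modalities without architectural changes; the quality of the object-to-text step then dominates.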
This list is automatically generated from the titles and abstracts of the papers in this site.