On the Complementarity of Images and Text for the Expression of Emotions
in Social Media
- URL: http://arxiv.org/abs/2202.07427v1
- Date: Fri, 11 Feb 2022 12:33:53 GMT
- Title: On the Complementarity of Images and Text for the Expression of Emotions
in Social Media
- Authors: Anna Khlyzova and Carina Silberer and Roman Klinger
- Abstract summary: We develop models to automatically detect the relation between image and text, the emotion stimulus category, and the emotion class.
We evaluate whether these tasks require both modalities and find that, for the image-text relations, text alone is sufficient for most categories.
The emotions of anger and sadness are best predicted with a multimodal model, while text alone is sufficient for disgust, joy, and surprise.
- Score: 12.616197765581864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Authors of posts in social media communicate their emotions and what causes
them with text and images. While there is work on emotion and stimulus
detection for each modality separately, it is yet unknown if the modalities
contain complementary emotion information in social media. We aim at filling
this research gap and contribute a novel, annotated corpus of English
multimodal Reddit posts. On this resource, we develop models to automatically
detect the relation between image and text, the emotion stimulus category, and
the emotion class. We evaluate whether these tasks require both modalities and
find that, for the image-text relations, text alone is sufficient for most
categories (complementary, illustrative, opposing): the information in the text
allows us to predict whether an image is required for emotion understanding. The
emotions of
anger and sadness are best predicted with a multimodal model, while text alone
is sufficient for disgust, joy, and surprise. Stimuli depicted by objects,
animals, food, or a person are best predicted by image-only models, while
multimodal models are most effective on art, events, memes, places, or
screenshots.
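To make the modality questions concrete, here is a minimal late-fusion sketch for multimodal emotion classification over pre-extracted text and image features. The encoders, dimensions, and label subset are illustrative assumptions; the listing does not describe the authors' actual architecture.

```python
# Minimal late-fusion sketch: concatenate per-modality features and
# classify jointly. Encoders, dimensions, and labels are illustrative.
import torch
import torch.nn as nn

EMOTIONS = ["anger", "sadness", "disgust", "joy", "surprise"]  # subset named above

class LateFusionClassifier(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, hidden=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, len(EMOTIONS)),
        )

    def forward(self, text_feat, image_feat):
        # A text-only or image-only ablation would zero out one branch here.
        return self.fuse(torch.cat([text_feat, image_feat], dim=-1))

model = LateFusionClassifier()
text_feat = torch.randn(4, 768)    # e.g., [CLS] embeddings from a text encoder
image_feat = torch.randn(4, 2048)  # e.g., pooled CNN features from the images
logits = model(text_feat, image_feat)
print(logits.shape)  # torch.Size([4, 5])
```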
Related papers
- EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model [23.26111054485357]
We introduce the new task of continuous emotional image content generation (C-EICG).
We present EmotiCrafter, an emotional image generation model that generates images based on text prompts and Valence-Arousal values.
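How the Valence-Arousal values enter the generator is not described in this listing, so the following is only a hypothetical sketch: a small network projects a (valence, arousal) pair into the prompt-embedding space and shifts the text conditioning a diffusion model would consume. All names and dimensions are assumptions.

```python
# Hypothetical sketch: project (valence, arousal) into the prompt-embedding
# space and shift the text conditioning. Not EmotiCrafter's published design.
import torch
import torch.nn as nn

class VAConditioner(nn.Module):
    def __init__(self, embed_dim=768):
        super().__init__()
        # Map a 2-d (valence, arousal) pair into the embedding space.
        self.proj = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, embed_dim))

    def forward(self, prompt_embed, va):
        # Shift every prompt token embedding by the projected V-A signal.
        return prompt_embed + self.proj(va).unsqueeze(1)

cond = VAConditioner()
prompt_embed = torch.randn(1, 77, 768)  # e.g., a CLIP-style text embedding
va = torch.tensor([[0.8, -0.3]])        # pleasant, calm
conditioned = cond(prompt_embed, va)    # would condition a diffusion model
print(conditioned.shape)                # torch.Size([1, 77, 768])
```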
arXiv Detail & Related papers (2025-01-10T04:41:37Z)
- EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models [11.901294654242376]
We introduce Emotional Image Content Generation (EICG), a new task to generate semantically clear and emotion-faithful images given emotion categories.
Specifically, we propose an emotion space and construct a mapping network to align it with the powerful Contrastive Language-Image Pre-training (CLIP) space.
Our method outperforms the state-of-the-art text-to-image approaches both quantitatively and qualitatively.
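As a rough illustration of the mapping-network idea, this sketch projects a learned emotion space into a CLIP-sized space and pulls each mapped emotion toward a target CLIP feature with a cosine loss; the loss, dimensions, and random stand-in targets are assumptions, not EmoGen's published design.

```python
# Sketch: map a learned emotion space into a CLIP-sized space and pull each
# mapped emotion toward a target CLIP feature. Targets here are random
# stand-ins for real CLIP text features of emotion words.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_EMOTIONS, EMO_DIM, CLIP_DIM = 8, 128, 512

emotion_space = nn.Embedding(N_EMOTIONS, EMO_DIM)  # learned emotion space
mapper = nn.Sequential(
    nn.Linear(EMO_DIM, CLIP_DIM), nn.ReLU(), nn.Linear(CLIP_DIM, CLIP_DIM)
)

emotion_ids = torch.arange(N_EMOTIONS)
clip_targets = torch.randn(N_EMOTIONS, CLIP_DIM)   # stand-in CLIP anchors

mapped = mapper(emotion_space(emotion_ids))
# Alignment loss: maximize cosine similarity to each emotion's CLIP anchor.
loss = 1 - F.cosine_similarity(mapped, clip_targets, dim=-1).mean()
loss.backward()
print(float(loss))
```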
arXiv Detail & Related papers (2024-01-09T15:23:21Z)
- EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes [53.95428298229396]
We introduce EmoSet, the first large-scale visual emotion dataset annotated with rich attributes.
EmoSet comprises 3.3 million images in total, with 118,102 of these images carefully labeled by human annotators.
Motivated by psychological studies, in addition to emotion category, each image is also annotated with a set of describable emotion attributes.
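A record in such a dataset might look like the sketch below; the field names and attribute keys are guesses based on the summary, not EmoSet's actual schema.

```python
# Illustrative record type for an EmoSet-style annotation; field names and
# attribute keys are guesses from the summary, not the dataset's schema.
from dataclasses import dataclass, field

@dataclass
class EmoSetRecord:
    image_path: str
    emotion: str                  # emotion category label
    attributes: dict = field(default_factory=dict)  # describable attributes

example = EmoSetRecord(
    image_path="images/000001.jpg",
    emotion="amusement",
    attributes={"scene": "outdoor", "brightness": "high"},  # hypothetical keys
)
print(example)
```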
arXiv Detail & Related papers (2023-07-16T06:42:46Z)
- High-Level Context Representation for Emotion Recognition in Images [4.987022981158291]
We propose an approach for high-level context representation extraction from images.
The model relies on a single cue and a single encoding stream to correlate this representation with emotions.
Our approach is more efficient than previous models and can be easily deployed to address real-world problems related to emotion recognition.
arXiv Detail & Related papers (2023-05-05T13:20:41Z)
- Affection: Learning Affective Explanations for Real-World Visual Data [50.28825017427716]
We introduce and share with the research community a large-scale dataset that contains emotional reactions and free-form textual explanations for 85,007 publicly available images.
We show that annotators share significant common ground: potentially plausible emotional responses have large support in the subject population.
Our work paves the way for richer, more human-centric, and emotionally-aware image analysis systems.
arXiv Detail & Related papers (2022-10-04T22:44:17Z)
- Speech Synthesis with Mixed Emotions [77.05097999561298]
We propose a novel formulation that measures the relative difference between the speech samples of different emotions.
We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework.
At run-time, we control the model to produce the desired emotion mixture by manually defining an emotion attribute vector.
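The emotion attribute vector can be pictured as a normalized weight per emotion category, as in this sketch; the category names and the normalization are assumptions rather than the paper's exact formulation.

```python
# Sketch: an emotion attribute vector as normalized per-category weights.
# Category names and the normalization are assumptions.
import torch

EMOTIONS = ["neutral", "happy", "sad", "angry", "surprise"]

def emotion_attribute_vector(weights: dict) -> torch.Tensor:
    """Turn {emotion: weight} into a normalized mixture vector."""
    v = torch.tensor([weights.get(e, 0.0) for e in EMOTIONS])
    return v / v.sum()

# A 70/30 blend of happy and surprise, e.g. to render excited speech.
mix = emotion_attribute_vector({"happy": 0.7, "surprise": 0.3})
print(mix)  # tensor([0.0000, 0.7000, 0.0000, 0.0000, 0.3000])
```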
arXiv Detail & Related papers (2022-08-11T15:45:58Z)
- ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer [59.05857591535986]
We propose a model called ViNTER to generate image narratives that focus on time series representing varying emotions as "emotion arcs".
We present experimental results of both manual and automatic evaluations.
arXiv Detail & Related papers (2022-02-15T10:53:08Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
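The scene-based attention idea can be sketched as a single attention step in which the scene feature queries the object features; SOLVER's actual module is more elaborate, and all dimensions here are illustrative.

```python
# Sketch of scene-guided attention: the scene feature is the query, object
# features are keys and values. Single-head and simplified on purpose.
import torch
import torch.nn.functional as F

D = 256
scene = torch.randn(1, D)        # global scene feature
objects = torch.randn(1, 12, D)  # features of 12 detected objects

# Score each object against the scene query, then pool objects accordingly.
attn = F.softmax(objects @ scene.unsqueeze(-1) / D**0.5, dim=1)  # (1, 12, 1)
object_summary = (attn * objects).sum(dim=1)                     # (1, D)

fused = torch.cat([scene, object_summary], dim=-1)  # scene-object fusion
print(fused.shape)  # torch.Size([1, 512])
```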
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
- Understanding of Emotion Perception from Art [39.47632069314582]
We consider the problem of understanding emotions evoked in viewers by artwork using both text and visual modalities.
Our results show that single-stream multimodal transformer-based models like MMBT and VisualBERT perform better compared to image-only models.
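The single-stream design that MMBT and VisualBERT share can be sketched as concatenating image and text token embeddings into one sequence for a single transformer encoder; the sizes and the crude pooling below are illustrative, not either model's exact configuration.

```python
# Sketch of a single-stream multimodal encoder: image and text token
# embeddings share one sequence and one transformer. Sizes are illustrative.
import torch
import torch.nn as nn

D = 256
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True),
    num_layers=2,
)

image_tokens = torch.randn(1, 9, D)   # embedded image regions/patches
text_tokens = torch.randn(1, 16, D)   # embedded text tokens

sequence = torch.cat([image_tokens, text_tokens], dim=1)  # one joint stream
hidden = encoder(sequence)
pooled = hidden[:, 0]  # crude pooling; real models prepend a [CLS] token
print(hidden.shape)    # torch.Size([1, 25, 256])
```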
arXiv Detail & Related papers (2021-10-13T04:14:49Z)
- Infusing Multi-Source Knowledge with Heterogeneous Graph Neural Network for Emotional Conversation Generation [25.808037796936766]
In a real-world conversation, we instinctively perceive emotions from multi-source information.
We propose a heterogeneous graph-based model for emotional conversation generation.
Experimental results show that our model can effectively perceive emotions from multi-source knowledge.
arXiv Detail & Related papers (2020-12-09T06:09:31Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle low-resource multimodal emotion recognition.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
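The zero-shot behavior suggested here can be sketched as nearest-neighbor matching in a shared emotion-embedding space: an unseen emotion only needs an embedding (e.g., a word vector), not training examples. The vectors below are random stand-ins.

```python
# Sketch of zero-shot recognition with emotion embeddings: pick the emotion
# whose embedding is nearest to the model's predicted embedding, so unseen
# emotions only need an embedding. Vectors are random stand-ins.
import torch
import torch.nn.functional as F

D = 300
emotions = ["joy", "anger", "fear", "awe"]  # "awe" unseen during training
emotion_embeds = F.normalize(torch.randn(len(emotions), D), dim=-1)

predicted = F.normalize(torch.randn(1, D), dim=-1)  # model output, same space
scores = predicted @ emotion_embeds.T               # cosine similarities
print(emotions[scores.argmax().item()])
```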
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
- Emosaic: Visualizing Affective Content of Text at Varying Granularity [0.0]
Emosaic is a tool for visualizing the emotional tone of text documents.
We capitalize on an established three-dimensional model of human emotion.
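The three-dimensional model referenced here is presumably valence-arousal-dominance (VAD); a toy mapping from a VAD triple to an RGB color illustrates the visualization idea. The palette is an assumption, not Emosaic's actual color scheme.

```python
# Toy mapping from a valence-arousal-dominance triple in [-1, 1]^3 to an
# RGB color; the palette is an assumption, not Emosaic's actual scheme.
def vad_to_rgb(valence: float, arousal: float, dominance: float) -> tuple:
    """Linearly rescale each VAD channel to an 8-bit color channel."""
    def to_byte(x: float) -> int:
        return round((x + 1) / 2 * 255)
    return (to_byte(valence), to_byte(arousal), to_byte(dominance))

print(vad_to_rgb(0.8, -0.2, 0.1))  # (230, 102, 140)
```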
arXiv Detail & Related papers (2020-02-24T07:25:01Z)