Understanding of Emotion Perception from Art
- URL: http://arxiv.org/abs/2110.06486v1
- Date: Wed, 13 Oct 2021 04:14:49 GMT
- Title: Understanding of Emotion Perception from Art
- Authors: Digbalay Bose, Krishna Somandepalli, Souvik Kundu, Rimita Lahiri,
Jonathan Gratch and Shrikanth Narayanan
- Abstract summary: We consider the problem of understanding emotions evoked in viewers by artwork using both text and visual modalities.
Our results show that single-stream multimodal transformer-based models like MMBT and VisualBERT outperform image-only models.
- Score: 39.47632069314582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computational modeling of the emotions evoked by art in humans is a
challenging problem because of the subjective and nuanced nature of art and
affective signals. In this paper, we consider the above-mentioned problem of
understanding emotions evoked in viewers by artwork using both text and visual
modalities. Specifically, we treat the images and the accompanying text
captions in which viewers express their emotions as inputs to a multimodal
classification task. Our results show that single-stream multimodal
transformer-based models like MMBT and VisualBERT outperform both image-only
models and dual-stream multimodal models that use separate pathways for the
text and image modalities. We also observe improved performance on the extreme
positive and negative emotion classes when a single-stream model like MMBT is
compared with a text-only transformer model like BERT.
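To make the single-stream setup concrete, the sketch below (an illustration, not the authors' released code) shows the core idea behind MMBT/VisualBERT-style fusion: image features are projected into BERT's token-embedding space and prepended to the caption tokens, so one transformer encoder attends over both modalities. The class count (9), visual feature dimension (2048), and module names are placeholder assumptions.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class SingleStreamEmotionClassifier(nn.Module):
    """Minimal single-stream fusion sketch: visual features become extra
    'tokens' in front of the caption, and a single BERT encoder processes
    the joint sequence."""
    def __init__(self, num_emotions: int = 9, visual_dim: int = 2048):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        self.visual_proj = nn.Linear(visual_dim, hidden)   # map image features into BERT's space
        self.classifier = nn.Linear(hidden, num_emotions)  # emotion logits

    def forward(self, input_ids, attention_mask, visual_feats):
        text_embeds = self.bert.embeddings.word_embeddings(input_ids)  # (B, T, hidden)
        vis_embeds = self.visual_proj(visual_feats)                    # (B, V, hidden)
        inputs_embeds = torch.cat([vis_embeds, text_embeds], dim=1)
        vis_mask = torch.ones(vis_embeds.shape[:2], dtype=attention_mask.dtype,
                              device=attention_mask.device)
        mask = torch.cat([vis_mask, attention_mask], dim=1)
        out = self.bert(inputs_embeds=inputs_embeds, attention_mask=mask)
        return self.classifier(out.last_hidden_state[:, 0])  # first position acts like [CLS]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
enc = tokenizer("This painting fills me with awe.", return_tensors="pt")
visual_feats = torch.randn(1, 1, 2048)  # stand-in for pooled CNN image features
model = SingleStreamEmotionClassifier()
logits = model(enc["input_ids"], enc["attention_mask"], visual_feats)  # shape (1, 9)
```

The defining design choice is that the image does not get its own encoder stream: both modalities share the same self-attention layers, which is what distinguishes single-stream models like MMBT and VisualBERT from the dual-stream architectures compared in the paper.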
Related papers
- EmoSEM: Segment and Explain Emotion Stimuli in Visual Art [25.539022846134543]
This paper focuses on a key challenge in visual art understanding: given an art image, pinpoint the pixel regions that trigger a specific human emotion.
Despite recent advances in art understanding, pixel-level emotion understanding still faces a dual challenge.
This paper proposes the Emotion stimuli and Explanation Model (EmoSEM) to endow the segmentation model SAM with emotion comprehension capability.
arXiv Detail & Related papers (2025-04-20T15:40:00Z)
- Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content [56.62027582702816]
Multimodal Sentiment Analysis seeks to unravel human emotions by amalgamating text, audio, and visual data.
Yet, discerning subtle emotional nuances within audio and video expressions poses a formidable challenge.
We introduce DEVA, a progressive fusion framework founded on textual sentiment descriptions.
arXiv Detail & Related papers (2024-12-12T11:30:41Z)
- MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z)
- Emotional Images: Assessing Emotions in Images and Potential Biases in Generative Models [0.0]
This paper examines potential biases and inconsistencies in emotional evocation of images produced by generative artificial intelligence (AI) models.
We compare the emotions evoked by AI-produced images to the emotions evoked by the prompts used to create them.
Findings indicate that AI-generated images frequently lean toward negative emotional content, regardless of the original prompt.
arXiv Detail & Related papers (2024-11-08T21:42:50Z)
- Training A Small Emotional Vision Language Model for Visual Art Comprehension [35.273057947865176]
This paper develops small vision language models to understand visual art.
It builds a small emotional vision language model (SEVLM) by emotion modeling and input-output feature alignment.
It not only outperforms state-of-the-art small models but is also competitive with LLaVA 7B after fine-tuning and with GPT-4(V).
arXiv Detail & Related papers (2024-03-17T09:01:02Z)
- High-Level Context Representation for Emotion Recognition in Images [4.987022981158291]
We propose an approach for extracting high-level context representations from images.
The model relies on a single cue and a single encoding stream to correlate this representation with emotions.
Our approach is more efficient than previous models and can be easily deployed to address real-world problems related to emotion recognition.
arXiv Detail & Related papers (2023-05-05T13:20:41Z)
- On the Complementarity of Images and Text for the Expression of Emotions in Social Media [12.616197765581864]
We develop models to automatically detect the image-text relation, the emotion stimulus category, and the emotion class.
We evaluate whether these tasks require both modalities and find that, for the image-text relations, text alone is sufficient for most categories.
The emotions of anger and sadness are best predicted with a multimodal model, while text alone is sufficient for disgust, joy, and surprise.
arXiv Detail & Related papers (2022-02-11T12:33:53Z)
- Caption Enriched Samples for Improving Hateful Memes Detection [78.5136090997431]
The hateful meme challenge demonstrates the difficulty of determining whether a meme is hateful or not.
Neither unimodal language models nor multimodal vision-language models reach human-level performance.
arXiv Detail & Related papers (2021-09-22T10:57:51Z)
- Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
- Enhancing Cognitive Models of Emotions with Representation Learning [58.2386408470585]
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions.
Our framework integrates a contextualized embedding encoder with a multi-head probing model.
Our model is evaluated on the Empathetic Dialogue dataset and achieves state-of-the-art results for classifying 32 emotions.
arXiv Detail & Related papers (2021-04-20T16:55:15Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to address low-resource multimodal emotion recognition.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)