Affection: Learning Affective Explanations for Real-World Visual Data
- URL: http://arxiv.org/abs/2210.01946v1
- Date: Tue, 4 Oct 2022 22:44:17 GMT
- Title: Affection: Learning Affective Explanations for Real-World Visual Data
- Authors: Panos Achlioptas, Maks Ovsjanikov, Leonidas Guibas and Sergey Tulyakov
- Abstract summary: We introduce and share with the research community a large-scale dataset that contains emotional reactions and free-form textual explanations for 85,007 publicly available images.
We show that there is significant common ground to capture potentially plausible emotional responses with broad support in the subject population.
Our work paves the way for richer, more human-centric, and emotionally-aware image analysis systems.
- Score: 50.28825017427716
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we explore the emotional reactions that real-world images tend
to induce by using natural language as the medium to express the rationale
behind an affective response to a given visual stimulus. To embark on this
journey, we introduce and share with the research community a large-scale
dataset that contains emotional reactions and free-form textual explanations
for 85,007 publicly available images, analyzed by 6,283 annotators who were
asked to indicate and explain how and why they felt in a particular way when
observing a specific image, producing a total of 526,749 responses. Even though
emotional reactions are subjective and sensitive to context (personal mood,
social status, past experiences), we show that there is significant common
ground to capture potentially plausible emotional responses with broad
support in the subject population. In light of this crucial observation, we ask
the following questions: i) Can we develop multi-modal neural networks that
provide reasonable affective responses to real-world visual data, explained
with language? ii) Can we steer such methods towards producing explanations
with varying degrees of pragmatic language or justifying different emotional
reactions while adapting to the underlying visual stimulus? Finally, iii) How
can we evaluate the performance of such methods for this novel task? With this
work, we take the first steps in addressing all of these questions, thus paving
the way for richer, more human-centric, and emotionally-aware image analysis
systems. Our introduced dataset and all developed methods are available on
https://affective-explanations.org
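To make the dataset's structure concrete, the sketch below shows one way per-image emotion agreement (the "common ground" the abstract refers to) could be computed from Affection-style annotations. The record fields, file layout, and toy data are illustrative assumptions, not the dataset's actual schema.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record layout; the actual Affection release may use
# different field names and file formats.
@dataclass
class AffectiveResponse:
    image_id: str      # one of the 85,007 publicly available images
    annotator_id: str  # one of the 6,283 annotators
    emotion: str       # e.g. "awe", "sadness", "contentment"
    explanation: str   # free-form textual rationale for the emotion

def emotion_agreement(responses: list[AffectiveResponse]) -> dict[str, float]:
    """Fraction of annotators reporting the most common emotion per image.

    High values reflect the kind of "common ground" the paper reports,
    despite the subjectivity of emotional reactions.
    """
    by_image: dict[str, Counter] = {}
    for r in responses:
        by_image.setdefault(r.image_id, Counter())[r.emotion] += 1
    return {
        image_id: counts.most_common(1)[0][1] / sum(counts.values())
        for image_id, counts in by_image.items()
    }

# Toy usage with made-up responses:
toy = [
    AffectiveResponse("img_001", "a1", "awe", "The vast mountain range feels overwhelming."),
    AffectiveResponse("img_001", "a2", "awe", "The scale of the landscape is breathtaking."),
    AffectiveResponse("img_001", "a3", "contentment", "Calm colours, a peaceful scene."),
]
print(emotion_agreement(toy))  # {'img_001': 0.666...}
```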
Related papers
- Contextual Emotion Recognition using Large Vision Language Models [0.6749750044497732]
Achieving human-level recognition of the apparent emotion of a person in real-world situations remains an unsolved task in computer vision.
In this paper, we examine two major approaches enabled by recent large vision language models.
We demonstrate that a vision language model, fine-tuned even on a small dataset, can significantly outperform traditional baselines.
arXiv Detail & Related papers (2024-05-14T23:24:12Z)
- Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning [0.6749750044497732]
We propose methods to incorporate emotional reasoning capabilities by constructing "narrative captions" relevant to emotion perception.
We propose two distinct ways to construct these captions: using zero-shot classifiers (CLIP) and fine-tuning vision-language models (LLaVA) over human-generated descriptors.
Our experiments showed that combining the "Fast" narrative descriptors with the "Slow" reasoning of language models is a promising way to achieve emotional theory of mind.
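As a rough illustration of the zero-shot route mentioned above (an assumed reading of the summary, not the authors' code), the sketch below scores an image against a handful of emotion-related descriptors with an off-the-shelf CLIP checkpoint via Hugging Face transformers; the descriptor phrases and image path are placeholders.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Checkpoint and descriptor phrases are illustrative assumptions, not the
# descriptors used in the paper.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

emotion_descriptors = [
    "a photo of a joyful scene",
    "a photo of a frightening scene",
    "a photo of a sad, gloomy scene",
    "a photo of a peaceful, calming scene",
]

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(text=emotion_descriptors, images=image,
                   return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]

# The top-scoring descriptors could then be stitched into a "narrative
# caption" for a language model to reason over.
for phrase, p in sorted(zip(emotion_descriptors, probs.tolist()),
                        key=lambda x: -x[1]):
    print(f"{p:.2f}  {phrase}")
```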
arXiv Detail & Related papers (2023-10-30T20:26:12Z)
- High-Level Context Representation for Emotion Recognition in Images [4.987022981158291]
We propose an approach for high-level context representation extraction from images.
The model relies on a single cue and a single encoding stream to correlate this representation with emotions.
Our approach is more efficient than previous models and can be easily deployed to address real-world problems related to emotion recognition.
arXiv Detail & Related papers (2023-05-05T13:20:41Z)
- How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
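One plausible way to learn from such relative-pleasantness pairs is a margin ranking loss over predicted valence scores. The sketch below is an illustrative assumption about the training objective, not the dataset authors' implementation; the feature dimension and model are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical valence regressor over precomputed video features.
class ValenceHead(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.mlp(feats).squeeze(-1)  # one scalar valence score per video

model = ValenceHead()
rank_loss = nn.MarginRankingLoss(margin=0.1)

# feats_a / feats_b: features of video pairs where A was judged more pleasant.
feats_a, feats_b = torch.randn(8, 512), torch.randn(8, 512)
target = torch.ones(8)  # +1 means "score(A) should exceed score(B)"
loss = rank_loss(model(feats_a), model(feats_b), target)
loss.backward()
```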
arXiv Detail & Related papers (2022-10-18T17:58:25Z)
- SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network [83.27291945217424]
We propose a novel Scene-Object interreLated Visual Emotion Reasoning network (SOLVER) to predict emotions from images.
To mine the emotional relationships between distinct objects, we first build up an Emotion Graph based on semantic concepts and visual features.
We also design a Scene-Object Fusion Module to integrate scenes and objects, which exploits scene features to guide the fusion process of object features with the proposed scene-based attention mechanism.
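As a loose sketch of what scene-guided attention over object features could look like (dimensions, structure, and naming are assumptions; the actual SOLVER module differs in detail):

```python
import torch
import torch.nn as nn

class SceneGuidedFusion(nn.Module):
    """Toy scene-based attention: the scene feature queries object features."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)

    def forward(self, scene: torch.Tensor, objects: torch.Tensor) -> torch.Tensor:
        # scene: (B, dim); objects: (B, N, dim)
        q = self.query(scene).unsqueeze(1)                                   # (B, 1, dim)
        k = self.key(objects)                                                # (B, N, dim)
        attn = torch.softmax((q * k).sum(-1) / k.shape[-1] ** 0.5, dim=-1)   # (B, N)
        fused_objects = (attn.unsqueeze(-1) * objects).sum(1)                # (B, dim)
        return torch.cat([scene, fused_objects], dim=-1)                     # (B, 2*dim)

fusion = SceneGuidedFusion()
out = fusion(torch.randn(2, 256), torch.randn(2, 5, 256))
print(out.shape)  # torch.Size([2, 512])
```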
arXiv Detail & Related papers (2021-10-24T02:41:41Z)
- Stimuli-Aware Visual Emotion Analysis [75.68305830514007]
We propose a stimuli-aware visual emotion analysis (VEA) method consisting of three stages, namely stimuli selection, feature extraction and emotion prediction.
To the best of our knowledge, this is the first time a stimuli selection process has been introduced into VEA in an end-to-end network.
Experiments demonstrate that the proposed method consistently outperforms the state-of-the-art approaches on four public visual emotion datasets.
arXiv Detail & Related papers (2021-09-04T08:14:52Z)
- Affective Image Content Analysis: Two Decades Review and New Perspectives [132.889649256384]
We will comprehensively review the development of affective image content analysis (AICA) over the recent two decades.
We will focus on the state-of-the-art methods with respect to three main challenges -- the affective gap, perception subjectivity, and label noise and absence.
We discuss some challenges and promising research directions in the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction.
arXiv Detail & Related papers (2021-06-30T15:20:56Z)
- Computational Emotion Analysis From Images: Recent Advances and Future Directions [79.05003998727103]
In this chapter, we aim to introduce image emotion analysis (IEA) from a computational perspective.
We begin with commonly used emotion representation models from psychology.
We then define the key computational problems that the researchers have been trying to solve.
arXiv Detail & Related papers (2021-03-19T13:33:34Z)
- ArtEmis: Affective Language for Visual Art [46.643106054408285]
We focus on the affective experience triggered by visual artworks.
We ask the annotators to indicate the dominant emotion they feel for a given image.
This leads to a rich set of signals for both the objective content and the affective impact of an image.
arXiv Detail & Related papers (2021-01-19T01:03:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.