Impressions: Understanding Visual Semiotics and Aesthetic Impact
- URL: http://arxiv.org/abs/2310.17887v1
- Date: Fri, 27 Oct 2023 04:30:18 GMT
- Title: Impressions: Understanding Visual Semiotics and Aesthetic Impact
- Authors: Julia Kruk, Caleb Ziems, Diyi Yang
- Abstract summary: We present Impressions, a novel dataset through which to investigate the semiotics of images.
We show that existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images.
However, this dataset significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.
- Score: 66.40617566253404
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Is aesthetic impact different from beauty? Is visual salience a reflection of
an image's capacity for effective communication? We present Impressions, a novel
dataset through which to investigate the semiotics of images, and how specific
visual features and design choices can elicit specific emotions, thoughts and
beliefs. We posit that the impactfulness of an image extends beyond formal
definitions of aesthetics, to its success as a communicative act, where style
contributes as much to meaning formation as the subject matter. However, prior
image captioning datasets are not designed to empower state-of-the-art
architectures to model potential human impressions or interpretations of
images. To fill this gap, we design an annotation task heavily inspired by
image analysis techniques in the Visual Arts to collect 1,440 image-caption
pairs and 4,320 unique annotations exploring impact, pragmatic image
description, impressions, and aesthetic design choices. We show that existing
multimodal image captioning and conditional generation models struggle to
simulate plausible human responses to images. However, this dataset
significantly improves their ability to model impressions and aesthetic
evaluations of images through fine-tuning and few-shot adaptation.
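The abstract's empirical point is that off-the-shelf multimodal captioning models can be adapted to produce impression-style captions through fine-tuning or few-shot adaptation. The sketch below shows what such a fine-tuning loop could look like with a generic captioning backbone; the BLIP checkpoint, file paths, example caption, and hyperparameters are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch: fine-tuning a generic captioning model on image-caption
# pairs in the style of Impressions. Checkpoint, paths, and hyperparameters
# are placeholders, not the paper's configuration.
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from transformers import BlipProcessor, BlipForConditionalGeneration

class ImpressionPairs(Dataset):
    """Pairs of (image path, free-text impression caption)."""
    def __init__(self, pairs, processor):
        self.pairs = pairs            # list of (path, caption) tuples
        self.processor = processor

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        path, caption = self.pairs[idx]
        image = Image.open(path).convert("RGB")
        enc = self.processor(images=image, text=caption,
                             padding="max_length", truncation=True,
                             max_length=64, return_tensors="pt")
        return {k: v.squeeze(0) for k, v in enc.items()}

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Hypothetical example pair; real training would iterate over the full dataset.
pairs = [("img_001.jpg", "A quiet, melancholic stillness; the empty chair reads as absence.")]
loader = DataLoader(ImpressionPairs(pairs, processor), batch_size=8, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        # BLIP returns a language-modeling loss when labels are provided.
        out = model(pixel_values=batch["pixel_values"],
                    input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

At inference, `model.generate(pixel_values=...)` would then decode an impression-style caption for a new image.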
Related papers
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Make Me Happier: Evoking Emotions Through Image Diffusion Models [36.40067582639123]
We present a novel challenge of emotion-evoked image generation, aiming to synthesize images that evoke target emotions while retaining the semantics and structures of the original scenes.
Due to the lack of emotion editing datasets, we provide a unique dataset consisting of 340,000 pairs of images and their emotion annotations.
arXiv Detail & Related papers (2024-03-13T05:13:17Z)
- StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning [69.06749934902464]
We propose a style-guided high-order attention network for image emotion distribution learning termed StyleEDL.
StyleEDL interactively learns stylistic-aware representations of images by exploring the hierarchical stylistic information of visual contents.
In addition, we introduce a stylistic graph convolutional network to dynamically generate the content-dependent emotion representations.
arXiv Detail & Related papers (2023-08-06T03:22:46Z)
- VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining [53.470662123170555]
We propose learning image aesthetics from user comments, and exploring vision-language pretraining methods to learn multimodal aesthetic representations.
Specifically, we pretrain an image-text encoder-decoder model with image-comment pairs, using contrastive and generative objectives to learn rich and generic aesthetic semantics without human labels (a minimal sketch of such a joint objective appears after this list).
Our results show that our pretrained aesthetic vision-language model outperforms prior works on image aesthetic captioning over the AVA-Captions dataset.
arXiv Detail & Related papers (2023-03-24T23:57:28Z)
- Contextually-rich human affect perception using multimodal scene information [36.042369831043686]
We leverage pretrained vision-language models (VLMs) to extract descriptions of foreground context from images.
We propose a multimodal context fusion (MCF) module to combine foreground cues with the visual scene and person-based contextual information for emotion prediction.
We show the effectiveness of our proposed modular design on two datasets associated with natural scenes and TV shows.
arXiv Detail & Related papers (2023-03-13T07:46:41Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
arXiv Detail & Related papers (2022-10-19T14:20:05Z)
- AffectGAN: Affect-Based Generative Art Driven by Semantics [2.323282558557423]
This paper introduces a novel method for generating artistic images that express particular affective states.
Our AffectGAN model is able to generate images based on specific or broad semantic prompts and intended affective outcomes.
A small dataset of 32 images generated by AffectGAN is annotated by 50 participants in terms of the particular emotion they elicit, as well as their quality and novelty.
arXiv Detail & Related papers (2021-09-30T04:53:25Z)
- ArtEmis: Affective Language for Visual Art [46.643106054408285]
We focus on the affective experience triggered by visual artworks.
We ask the annotators to indicate the dominant emotion they feel for a given image.
This leads to a rich set of signals for both the objective content and the affective impact of an image.
arXiv Detail & Related papers (2021-01-19T01:03:40Z)
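The VILA entry above describes pretraining an image-text encoder-decoder on image-comment pairs with combined contrastive and generative objectives. Below is a minimal sketch of such a joint loss; the function name, tensor shapes, temperature, and loss weighting are placeholder assumptions and do not reproduce the paper's implementation.

```python
# Schematic joint contrastive + generative objective over image-comment pairs,
# in the spirit of the VILA entry above. Names and weights are placeholders.
import torch
import torch.nn.functional as F

def joint_pretraining_loss(image_emb, text_emb, decoder_logits, comment_ids,
                           temperature=0.07, gen_weight=1.0):
    """image_emb, text_emb: (B, D) pooled embeddings from the two encoders.
    decoder_logits: (B, T, V) logits from a text decoder conditioned on the image.
    comment_ids: (B, T) token ids of the raw user comment (generative target)."""
    # Contrastive term: align each image with its own comment (InfoNCE, both directions).
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    contrastive = 0.5 * (F.cross_entropy(logits, targets) +
                         F.cross_entropy(logits.t(), targets))

    # Generative term: reconstruct the comment token by token.
    generative = F.cross_entropy(decoder_logits.reshape(-1, decoder_logits.size(-1)),
                                 comment_ids.reshape(-1))

    return contrastive + gen_weight * generative
```

Weighting the two terms equally is only one choice; the relative weight trades off retrieval-style alignment against captioning quality.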