Understanding Subjectivity through the Lens of Motivational Context in Model-Generated Image Satisfaction
- URL: http://arxiv.org/abs/2403.05576v1
- Date: Tue, 27 Feb 2024 01:16:55 GMT
- Title: Understanding Subjectivity through the Lens of Motivational Context in Model-Generated Image Satisfaction
- Authors: Senjuti Dutta, Sherol Chen, Sunny Mak, Amnah Ahmad, Katherine Collins, Alena Butryna, Deepak Ramachandran, Krishnamurthy Dvijotham, Ellie Pavlick, Ravi Rajakumar
- Abstract summary: Image generation models are poised to become ubiquitous in a range of applications.
These models are often fine-tuned and evaluated using human quality judgments that assume a universal standard.
To investigate how to quantify subjectivity, and the scale of its impact, we measure how assessments differ among human annotators across different use cases.
- Score: 21.00784031928471
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image generation models are poised to become ubiquitous in a range of applications. These models are often fine-tuned and evaluated using human quality judgments that assume a universal standard, failing to consider the subjectivity of such tasks. To investigate how to quantify subjectivity, and the scale of its impact, we measure how assessments differ among human annotators across different use cases. Simulating the effects of ordinarily latent elements of annotators' subjectivity, we contrive a set of motivations (t-shirt graphics, presentation visuals, and phone background images) to contextualize a set of crowdsourcing tasks. Our results show that human evaluations of images vary within individual contexts and across combinations of contexts. Three key factors affecting this subjectivity are image appearance, image alignment with text, and representation of objects mentioned in the text. Our study highlights the importance of taking individual users and contexts into account, both when building and evaluating generative models.
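The core measurement here, how satisfaction judgments vary within and across motivational contexts, can be illustrated with a small disagreement computation. Below is a minimal sketch in Python, assuming a hypothetical ratings table with columns `image_id`, `context`, `annotator_id`, and `rating`; it is illustrative, not the authors' exact analysis.

```python
import pandas as pd

# Hypothetical ratings: one row per (annotator, image, context) judgment.
ratings = pd.DataFrame({
    "image_id":     [1, 1, 1, 1, 2, 2, 2, 2],
    "context":      ["t-shirt", "t-shirt", "phone bg", "phone bg"] * 2,
    "annotator_id": ["a", "b", "a", "b", "a", "b", "a", "b"],
    "rating":       [5, 2, 4, 4, 3, 3, 1, 5],  # 1-5 satisfaction scores
})

# Within-context subjectivity: variance among annotators rating the
# same image under the same motivational context.
within = ratings.groupby(["image_id", "context"])["rating"].var().mean()

# Across-context subjectivity: how much an image's mean rating shifts
# when the motivational context changes.
per_context_mean = ratings.groupby(["image_id", "context"])["rating"].mean()
across = per_context_mean.groupby(level="image_id").var().mean()

print(f"mean within-context variance: {within:.2f}")
print(f"mean across-context variance: {across:.2f}")
```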
Related papers
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
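The blurb does not state the alignment objective; one common way to inject human perceptual judgments into a representation is a triplet loss over human-labeled similarity triplets. A minimal, hypothetical sketch:

```python
import torch
import torch.nn.functional as F

def perceptual_triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull the human-judged-more-similar pair together and push the
    less-similar pair apart in feature space (illustrative objective,
    not necessarily the paper's exact loss)."""
    d_pos = 1 - F.cosine_similarity(anchor, positive)
    d_neg = 1 - F.cosine_similarity(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Stand-in backbone features for a batch of human-labeled triplets.
anchor, positive, negative = (torch.randn(16, 768) for _ in range(3))
loss = perceptual_triplet_loss(anchor, positive, negative)
```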
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that take a single image of an individual as input and ground the generation process in text describing the desired visual context.
We introduce a standardized dataset (Stellar) of personalized prompts coupled with images of individuals; it is an order of magnitude larger than existing relevant datasets and comes with rich semantic ground-truth annotations.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and that sets a new SoTA both quantitatively and in human trials.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- Impressions: Understanding Visual Semiotics and Aesthetic Impact [66.40617566253404]
We present Impressions, a novel dataset through which to investigate the semiotics of images.
We show that existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images.
This dataset significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.
arXiv Detail & Related papers (2023-10-27T04:30:18Z)
- Composition and Deformance: Measuring Imageability with a Text-to-Image Model [8.008504325316327]
We propose methods that use generated images to measure the imageability of single English words and connected text.
We find high correlation between the proposed computational measures of imageability and human judgments of individual words.
We discuss possible effects of model training and implications for the study of compositionality in text-to-image models.
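The blurb leaves open how generated images yield an imageability score; one plausible reading is consistency of the images generated for a word, e.g. mean pairwise similarity of their embeddings. A hedged sketch, with random arrays standing in for real image embeddings:

```python
import numpy as np

def imageability_score(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity of the embeddings of images
    generated for one word; higher consistency suggests the word is
    more imageable. Illustrative proxy, not the paper's exact measure."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(unit)
    # average the off-diagonal entries only (exclude self-similarity)
    return (sims.sum() - n) / (n * (n - 1))

# Placeholder: 8 generated images per word, 512-d embeddings (e.g. CLIP).
rng = np.random.default_rng(0)
concrete = imageability_score(rng.normal(0.5, 0.1, (8, 512)))  # stands in for "apple"
abstract = imageability_score(rng.normal(0.0, 1.0, (8, 512)))  # stands in for "justice"
print(concrete, abstract)
```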
arXiv Detail & Related papers (2023-06-05T18:22:23Z)
- Affect-Conditioned Image Generation [0.9668407688201357]
We introduce a method for generating images conditioned on desired affect, quantified using a psychometrically validated three-component approach.
We first train a neural network for estimating the affect content of text and images from semantic embeddings, and then demonstrate how this can be used to exert control over a variety of generative models.
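A minimal sketch of the first step described above, assuming CLIP-style semantic embeddings and a three-component affect target (e.g. valence, arousal, dominance); the architecture and data here are placeholders, not the authors' exact setup:

```python
import torch
import torch.nn as nn

class AffectEstimator(nn.Module):
    """Regress a three-component affect vector from a semantic embedding."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(),
            nn.Linear(256, 3),  # one output per affect component
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.net(emb)

model = AffectEstimator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder batch: embeddings paired with normalized affect ratings.
emb = torch.randn(32, 512)
target = torch.rand(32, 3)

loss = loss_fn(model(emb), target)
opt.zero_grad()
loss.backward()
opt.step()
```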
arXiv Detail & Related papers (2023-02-20T03:44:04Z)
- Quantitative analysis of visual representation of sign elements in COVID-19 context [2.9409535911474967]
We propose a computational, quantitative analysis of the elements used in the visual creations produced in reference to the epidemic.
We analyze the images compiled in The Covid Art Museum's Instagram account to identify the elements used to represent subjective experiences of a global event.
This research reveals the elements that are repeated in images to create narratives and the relations of association that are established in the sample.
arXiv Detail & Related papers (2021-12-15T15:54:53Z)
- Automatic Main Character Recognition for Photographic Studies [78.88882860340797]
Main characters in images are the most important humans in the scene, those who catch the viewer's attention at first glance.
Identifying the main character in images plays an important role in traditional photographic studies and media analysis.
We propose a method for identifying the main characters using machine learning based human pose estimation.
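The blurb does not specify how pose estimates are turned into a main-character decision; a simple heuristic stand-in is to score each detected person by apparent size and centrality. A hypothetical sketch, taking 2D keypoints from any off-the-shelf pose estimator:

```python
import numpy as np

def main_character(people_keypoints, img_w, img_h):
    """Given per-person 2D pose keypoints (each an array of shape
    (num_joints, 2)), score each person by body extent and centrality
    and return the index of the likely main character. Heuristic
    illustration, not the paper's learned method."""
    scores = []
    for kps in people_keypoints:
        x, y = kps[:, 0], kps[:, 1]
        area = (x.max() - x.min()) * (y.max() - y.min())   # body extent
        cx, cy = x.mean() / img_w, y.mean() / img_h        # normalized center
        centrality = 1.0 - np.hypot(cx - 0.5, cy - 0.5)    # 1 = dead center
        scores.append(area / (img_w * img_h) + centrality)
    return int(np.argmax(scores))
```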
arXiv Detail & Related papers (2021-06-16T18:14:45Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models that outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.