What Do AI-Generated Images Want?
- URL: http://arxiv.org/abs/2510.20350v2
- Date: Fri, 24 Oct 2025 09:41:05 GMT
- Title: What Do AI-Generated Images Want?
- Authors: Amanda Wasielewski
- Abstract summary: I reframe W.J.T. Mitchell's question in light of contemporary AI image generation tools. I argue that AI-generated images want specificity and concreteness because they are fundamentally abstract.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: W.J.T. Mitchell's influential essay 'What do pictures want?' shifts the theoretical focus away from the interpretative act of understanding pictures and from the motivations of the humans who create them to the possibility that the picture itself is an entity with agency and wants. In this article, I reframe Mitchell's question in light of contemporary AI image generation tools to ask: what do AI-generated images want? Drawing from art historical discourse on the nature of abstraction, I argue that AI-generated images want specificity and concreteness because they are fundamentally abstract. Multimodal text-to-image models, which are the primary subject of this article, are based on the premise that text and image are interchangeable or exchangeable tokens and that there is a commensurability between them, at least as represented mathematically in data. The user pipeline that sees textual input become visual output, however, obscures this representational regress and makes it seem like one form transforms into the other -- as if by magic.
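The commensurability claim has a concrete counterpart in CLIP-style multimodal encoders, which map a caption and a picture into one shared vector space and compare them by cosine similarity. A minimal sketch of that shared space, assuming the Hugging Face transformers library and a placeholder image file:

```python
# A minimal sketch of the text-image "commensurability" the abstract
# describes: CLIP embeds both modalities into one vector space, where a
# caption and a picture are compared by cosine similarity. The image path
# is a placeholder.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # any image file
texts = ["a photo of a dog", "an abstract painting"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])

# Normalize and compare: text and image are now interchangeable vectors.
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
similarity = image_emb @ text_emb.T  # cosine similarities, shape (1, 2)
print(similarity)
```

Nothing transforms into anything here: both modalities are reduced to vectors, and their "exchange" is a dot product, which is the representational regress the abstract describes.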
Related papers
- The Iconicity of the Generated Image [22.154465616964256]
How humans interpret and produce images is influenced by the images they have been exposed to. Visual generative AI models are exposed to many training images and learn to generate new images based on them.
arXiv Detail & Related papers (2025-09-19T23:59:43Z)
- D-Judge: How Far Are We? Assessing the Discrepancies Between AI-synthesized and Natural Images through Multimodal Guidance [19.760989919485894]
We construct a large-scale multimodal dataset, D-ANI, comprising 5,000 natural images and over 440,000 AIGI samples. We then introduce an AI-Natural Image Discrepancy assessment benchmark (D-Judge) to address the critical question: how far are AI-generated images (AIGIs) from truly realistic images?
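The paper's benchmark is not reproduced here, but a common way to quantify such a discrepancy is a Fréchet distance between feature distributions, as in FID. A hedged sketch on placeholder features:

```python
# Hypothetical sketch (not the authors' D-Judge pipeline): quantify how far
# AI-generated images sit from natural ones via a Frechet distance between
# feature distributions. Features here are random placeholders; in practice
# they would come from a vision encoder.
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fit to two feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    return float(((mu_a - mu_b) ** 2).sum() + np.trace(cov_a + cov_b - 2 * covmean))

rng = np.random.default_rng(0)
natural_feats = rng.normal(0.0, 1.0, size=(500, 64))  # stand-in encoder features
ai_feats = rng.normal(0.3, 1.1, size=(500, 64))       # shifted distribution
print(f"discrepancy: {frechet_distance(natural_feats, ai_feats):.3f}")
```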
arXiv Detail & Related papers (2024-12-23T15:08:08Z)
- It's a Feature, Not a Bug: Measuring Creative Fluidity in Image Generators [5.639451539396458]
Our paper attempts to define and empirically measure one facet of creative behavior in AI by conducting an experiment to quantify the "fluidity of prompt interpretation", or simply "fluidity".
To study fluidity, we (1) introduce a clear definition for it, (2) create chains of auto-generated prompts and images seeded with an initial "ground-truth" image, (3) measure these chains' breakage points using preexisting visual and semantic metrics, and (4) use both statistical tests and visual explanations to study these chains and determine whether the image generators used to produce them exhibit significant fluidity.
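A toy simulation of the chain-and-breakage idea (my stand-in, not the authors' pipeline), with each caption-to-image round trip modeled as embedding drift:

```python
# Toy chain: start from a "ground truth" image embedding, alternate
# caption -> generate steps (simulated as accumulating drift), and find the
# breakage point where similarity to the seed drops below a threshold.
# A real chain would use a captioner and a text-to-image model.
import numpy as np

rng = np.random.default_rng(42)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

seed = rng.normal(size=128)          # embedding of the ground-truth image
current, chain_sims = seed.copy(), []
for step in range(20):
    # one caption->generate round trip, modeled as small random drift
    current = current + rng.normal(scale=0.25, size=128)
    chain_sims.append(cosine(seed, current))

threshold = 0.8                      # below this, the chain has "broken"
breakage = next((i for i, s in enumerate(chain_sims) if s < threshold), None)
print(f"first similarities: {[round(s, 2) for s in chain_sims[:5]]} ...")
print(f"chain breaks at step: {breakage}")
```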
arXiv Detail & Related papers (2024-06-03T08:31:29Z)
- Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images [67.18010640829682]
We show that AI-generated images introduce an invisible relevance bias to text-image retrieval models.
The inclusion of AI-generated images in the training data of the retrieval models exacerbates the invisible relevance bias.
We propose an effective training method aimed at alleviating the invisible relevance bias.
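How such a hidden preference could be measured is sketched below; the scores are simulated placeholders, not the paper's protocol:

```python
# Toy illustration (my construction): for queries where a natural and an
# AI-generated image are equally relevant, any systematic score gap in the
# retriever is bias. Scores here are simulated stand-ins for retrieval scores.
import numpy as np

rng = np.random.default_rng(7)
n_queries = 1000
# Simulated retrieval scores for equally relevant image pairs per query.
score_natural = rng.normal(loc=0.50, scale=0.05, size=n_queries)
score_ai = rng.normal(loc=0.53, scale=0.05, size=n_queries)  # slight hidden preference

gap = score_ai - score_natural
prefer_ai_rate = (gap > 0).mean()
print(f"mean score gap (AI - natural): {gap.mean():.4f}")
print(f"AI image ranked first in {prefer_ai_rate:.1%} of pairs")  # ~50% if unbiased
```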
arXiv Detail & Related papers (2023-11-23T16:22:58Z)
- ITI-GEN: Inclusive Text-to-Image Generation [56.72212367905351]
This study investigates inclusive text-to-image generative models that generate images based on human-written prompts.
We show that, for some attributes, images can represent concepts more expressively than text.
We propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration.
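One hedged reading of "leveraging reference images", simplified to an auditing step rather than the ITI-GEN training objective itself:

```python
# My simplification, not ITI-GEN: embed a few reference images per attribute
# category, then check whether a batch of generated images covers all
# categories evenly. Embeddings are random placeholders for a real encoder.
import numpy as np

rng = np.random.default_rng(1)
categories = ["category_a", "category_b"]            # e.g., perceived attributes
ref_embs = {c: rng.normal(size=(5, 32)) for c in categories}
prototypes = {c: e.mean(axis=0) for c, e in ref_embs.items()}

generated = rng.normal(size=(100, 32))               # embeddings of generated images
assign = [max(categories, key=lambda c: generated[i] @ prototypes[c])
          for i in range(len(generated))]
counts = {c: assign.count(c) for c in categories}
print(counts)  # a heavily skewed count signals non-inclusive generation
```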
arXiv Detail & Related papers (2023-09-11T15:54:30Z)
- AI-Generated Imagery: A New Era for the 'Readymade' [0.7386189738262202]
This paper aims to examine how digital images produced by generative AI systems have come to be so regularly referred to as art.
We employ existing philosophical frameworks and theories of language to suggest that some AI-generated imagery can be presented as 'readymades' for consideration as art.
arXiv Detail & Related papers (2023-07-12T09:25:56Z)
- Word-Level Explanations for Analyzing Bias in Text-to-Image Models [72.71184730702086]
Text-to-image (T2I) models can generate images that underrepresent minorities based on race and sex.
This paper investigates which word in the input prompt is responsible for bias in generated images.
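A simplified leave-one-word-out probe illustrates the idea (my sketch, not the paper's method); the attribute scorer is a toy stub standing in for a classifier run over generated images:

```python
# Remove each word from the prompt, regenerate (here: stubbed), and see how
# much an attribute score shifts. attribute_score is a toy stand-in.
def attribute_score(prompt: str) -> float:
    """Toy stub: fraction of generated images showing the attribute."""
    return 0.9 if "ceo" in prompt.lower() else 0.5  # stubbed correlation

prompt = "a portrait of a successful CEO at work"
words = prompt.split()
baseline = attribute_score(prompt)

contributions = {}
for i, word in enumerate(words):
    ablated = " ".join(words[:i] + words[i + 1:])
    contributions[word] = baseline - attribute_score(ablated)

# Words with the largest contribution are the likeliest bias sources.
for word, delta in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{word:>12}: {delta:+.2f}")
```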
arXiv Detail & Related papers (2023-06-03T21:39:07Z)
- The Hidden Language of Diffusion Models [70.03691458189604]
We present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model.
We find surprising visual connections between concepts that transcend their textual semantics.
We additionally discover concepts that rely on mixtures of exemplars, biases, renowned artistic styles, or a simultaneous fusion of multiple meanings.
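The decomposition idea in miniature (a toy stand-in, not Conceptor itself): express one concept's embedding as a non-negative mixture of other tokens' embeddings, so the learned weights suggest which tokens "make up" the concept.

```python
# Toy decomposition: recover which vocabulary tokens a concept embedding is
# built from via non-negative least squares. Embeddings are random
# placeholders for real text-encoder vectors.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(3)
vocab = ["doctor", "nurse", "stethoscope", "hospital", "painting", "sunset"]
E = rng.normal(size=(len(vocab), 32))          # placeholder token embeddings
target = 0.6 * E[0] + 0.3 * E[2] + 0.1 * E[3]  # concept built from 3 tokens

weights, _ = nnls(E.T, target)                 # non-negative least squares
for word, w in sorted(zip(vocab, weights), key=lambda kv: -kv[1]):
    if w > 1e-3:
        print(f"{word:>12}: {w:.2f}")
```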
arXiv Detail & Related papers (2023-06-01T17:57:08Z)
- Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images [63.629345688220496]
We introduce WHOOPS!, a new dataset and benchmark for visual commonsense.
The dataset is comprised of purposefully commonsense-defying images created by designers.
Our results show that state-of-the-art models such as GPT-3 and BLIP-2 still lag behind human performance on WHOOPS!
arXiv Detail & Related papers (2023-03-13T16:49:43Z)
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [26.748667878221568]
We present a new approach for "personalization" of text-to-image models.
We fine-tune a pretrained text-to-image model to bind a unique identifier to a specific subject, given just a few images of that subject.
The unique identifier can then be used to synthesize fully novel photorealistic images of the subject contextualized in different scenes.
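A schematic of the fine-tuning recipe at toy scale (my sketch, not the DreamBooth code): train on the subject's images under a prompt containing a rare identifier token, while a prior-preservation term on generic class images prevents the model from forgetting the class.

```python
# Toy DreamBooth-style loop: the tensors and the tiny denoiser are
# placeholders for a real diffusion model and its text encoder.
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Linear(64 + 16, 128), nn.ReLU(), nn.Linear(128, 64))
opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)

subject_images = torch.randn(8, 64)  # targets for "a photo of sks dog"
class_images = torch.randn(8, 64)    # targets for "a photo of a dog" (prior)
emb_subject = torch.randn(8, 16)     # embedding of prompt with rare token "sks"
emb_class = torch.randn(8, 16)       # embedding of the generic class prompt

for step in range(100):
    noise_s = torch.randn_like(subject_images)
    noise_c = torch.randn_like(class_images)
    pred_s = denoiser(torch.cat([subject_images + noise_s, emb_subject], dim=1))
    pred_c = denoiser(torch.cat([class_images + noise_c, emb_class], dim=1))
    # Subject term binds "sks" to the subject; prior term preserves the class.
    loss = ((pred_s - noise_s) ** 2).mean() + 1.0 * ((pred_c - noise_c) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```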
arXiv Detail & Related papers (2022-08-25T17:45:49Z)
- CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination [87.4797527628459]
We introduce a new task/dataset called Commonsense Reasoning for Counterfactual Scene Imagination (CoSIm).
CoSIm is designed to evaluate the ability of AI systems to reason about scene change imagination.
arXiv Detail & Related papers (2022-07-08T15:28:23Z)
- A Taxonomy of Prompt Modifiers for Text-To-Image Generation [6.903929927172919]
This paper identifies six types of prompt modifiers used by practitioners in the online community, based on a three-month ethnographic study.
The novel taxonomy of prompt modifiers provides researchers with a conceptual starting point for investigating the practice of text-to-image generation.
We discuss research opportunities of this novel creative practice in the field of Human-Computer Interaction.
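An illustrative composition of a prompt from modifier categories; the category names paraphrase the taxonomy as I understand it, and the example terms are placeholders of my own:

```python
# Composing a prompt from modifier categories (illustrative sketch; category
# names paraphrase the paper's taxonomy, example terms are placeholders).
modifiers = {
    "subject term": "a lighthouse on a cliff",
    "style modifier": "in the style of an oil painting",
    "quality booster": "highly detailed, 4k",
    "magic term": "trending on artstation",
}
prompt = ", ".join(modifiers.values())
print(prompt)
# -> "a lighthouse on a cliff, in the style of an oil painting, ..."
```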
arXiv Detail & Related papers (2022-04-20T06:15:50Z)
- A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine-grained adversarial reasoning.
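A minimal sketch of the two-headed discriminator this suggests (my reading, not the authors' architecture): a shared trunk feeds both an adversarial head and a segmentation head, so the shared features must encode semantics. Channel sizes are placeholders.

```python
# Shared-representation discriminator sketch: one trunk, two heads.
import torch
import torch.nn as nn

class SharedDiscriminator(nn.Module):
    def __init__(self, n_classes: int = 19):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.adv_head = nn.Conv2d(128, 1, 3, padding=1)           # real/fake map
        self.seg_head = nn.Conv2d(128, n_classes, 3, padding=1)   # class logits

    def forward(self, x):
        h = self.trunk(x)
        return self.adv_head(h), self.seg_head(h)

disc = SharedDiscriminator()
adv, seg = disc(torch.randn(2, 3, 128, 128))
print(adv.shape, seg.shape)  # (2, 1, 32, 32), (2, 19, 32, 32)
```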
arXiv Detail & Related papers (2021-12-09T18:59:21Z)