Related papers: A Picture is Worth a Thousand Prompts? Efficacy of Iterative Human-Driven Prompt Refinement in Image Regeneration Tasks

A Picture is Worth a Thousand Prompts? Efficacy of Iterative Human-Driven Prompt Refinement in Image Regeneration Tasks

URL: http://arxiv.org/abs/2504.20340v1
Date: Tue, 29 Apr 2025 01:21:16 GMT
Title: A Picture is Worth a Thousand Prompts? Efficacy of Iterative Human-Driven Prompt Refinement in Image Regeneration Tasks
Authors: Khoi Trinh, Scott Seidenberger, Raveen Wijewickrama, Murtuza Jadliwala, Anindya Maiti,
Abstract summary: The creation of AI-generated images often involves refining the input prompt iteratively to achieve desired visual outcomes.<n>This study focuses on the relatively underexplored concept of image regeneration using AI.<n>We present a structured user study evaluating how iterative prompt refinement affects the similarity of regenerated images relative to their targets.
Score: 1.8563642867160601
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: With AI-generated content becoming ubiquitous across the web, social media, and other digital platforms, it is vital to examine how such content are inspired and generated. The creation of AI-generated images often involves refining the input prompt iteratively to achieve desired visual outcomes. This study focuses on the relatively underexplored concept of image regeneration using AI, in which a human operator attempts to closely recreate a specific target image by iteratively refining their prompt. Image regeneration is distinct from normal image generation, which lacks any predefined visual reference. A separate challenge lies in determining whether existing image similarity metrics (ISMs) can provide reliable, objective feedback in iterative workflows, given that we do not fully understand if subjective human judgments of similarity align with these metrics. Consequently, we must first validate their alignment with human perception before assessing their potential as a feedback mechanism in the iterative prompt refinement process. To address these research gaps, we present a structured user study evaluating how iterative prompt refinement affects the similarity of regenerated images relative to their targets, while also examining whether ISMs capture the same improvements perceived by human observers. Our findings suggest that incremental prompt adjustments substantially improve alignment, verified through both subjective evaluations and quantitative measures, underscoring the broader potential of iterative workflows to enhance generative AI content creation across various application domains.

Related papers

GenIR: Generative Visual Feedback for Mental Image Retrieval [6.813922846074993]
We study the task of Mental Image Retrieval (MIR)<n>MIR targets the realistic yet underexplored setting where users refine their search for a mentally envisioned image through multi-round interactions with an image search engine.<n>We propose GenIR, a generative multi-round retrieval paradigm leveraging diffusion-based image generation to explicitly reify the AI system's understanding at each round.
arXiv Detail & Related papers (2025-06-06T16:28:03Z)
RAISE: Realness Assessment for Image Synthesis and Evaluation [3.7619101673213664]
We develop and train models on RAISE to establish baselines for realness prediction.<n>Our experimental results demonstrate that features derived from deep foundation vision models can effectively capture the subjective realness.
arXiv Detail & Related papers (2025-05-25T17:14:43Z)
A Meaningful Perturbation Metric for Evaluating Explainability Methods [55.09730499143998]
We introduce a novel approach, which harnesses image generation models to perform targeted perturbation.<n> Specifically, we focus on inpainting only the high-relevance pixels of an input image to modify the model's predictions while preserving image fidelity.<n>This is in contrast to existing approaches, which often produce out-of-distribution modifications, leading to unreliable results.
arXiv Detail & Related papers (2025-04-09T11:46:41Z)
An Image-like Diffusion Method for Human-Object Interaction Detection [13.951650101149188]
The output of HOI detection for each human-object pair can be recast as an image.<n>In HOI-IDiff, we tackle HOI detection from a novel perspective, using an Image-like Diffusion process to generate HOI detection outputs as images.
arXiv Detail & Related papers (2025-03-23T16:30:16Z)
Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias [52.590072198551944]
The aim of image personalization is to create images based on a user-provided subject. Current methods face challenges in ensuring fidelity to the text prompt. We introduce a novel training pipeline that incorporates an attractor to filter out distractions in training images.
arXiv Detail & Related papers (2025-03-09T14:14:02Z)
Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent [9.748808189341526]
An effective Text-to-Image (T2I) evaluation metric should accomplish the following: detect instances where the generated images do not align with the textual prompts. We propose a method based on large language models (LLMs) for conducting question-answering with an extracted scene-graph and created a dataset with human-rated scores for generated images.
arXiv Detail & Related papers (2024-12-07T18:44:38Z)
KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models. We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals. Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability. We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks. Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
Batch-Instructed Gradient for Prompt Evolution:Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis [3.783530340696776]
This study proposes a Multi-Agent framework to optimize input prompts for text-to-image generation models. A professional prompts database serves as a benchmark to guide the instruction modifier towards generating high-caliber prompts. Preliminary ablation studies highlight the effectiveness of various system components and suggest areas for future improvements.
arXiv Detail & Related papers (2024-06-13T00:33:29Z)
Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability [21.355484227864466]
We investigate the relationship between the representation space and input space around generated images. We introduce a new metric to evaluating image-generative models called anomaly score (AS)
arXiv Detail & Related papers (2023-12-17T07:33:06Z)
Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images [67.18010640829682]
We show that AI-generated images introduce an invisible relevance bias to text-image retrieval models. The inclusion of AI-generated images in the training data of the retrieval models exacerbates the invisible relevance bias. We propose an effective training method aimed at alleviating the invisible relevance bias.
arXiv Detail & Related papers (2023-11-23T16:22:58Z)
Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation [96.74302670358145]
We introduce an automated method for Visual Concept Evaluation (ViCE) to assess consistency between a generated/edited image and the corresponding prompt/instructions. ViCE combines the strengths of Large Language Models (LLMs) and Visual Question Answering (VQA) into a unified pipeline, aiming to replicate the human cognitive process in quality assessment.
arXiv Detail & Related papers (2023-07-18T16:33:30Z)
Ensembling with Deep Generative Views [72.70801582346344]
generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose. Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification. We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
arXiv Detail & Related papers (2021-04-29T17:58:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.