Adaptive Prompt Elicitation for Text-to-Image Generation
- URL: http://arxiv.org/abs/2602.04713v1
- Date: Wed, 04 Feb 2026 16:24:46 GMT
- Title: Adaptive Prompt Elicitation for Text-to-Image Generation
- Authors: Xinyi Wen, Lena Hegemann, Xiaofu Jin, Shuai Ma, Antti Oulasvirta,
- Abstract summary: APE represents latent intent as interpretable feature requirements using language model priors. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without workload overhead.
- Score: 31.242444699785697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aligning text-to-image generation with user intent remains challenging for users who provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without workload overhead. Our work contributes a principled approach to prompting that, for general users, offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.
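The abstract frames query selection information-theoretically but gives no implementation details. As a minimal illustrative sketch of that idea (not the paper's actual method: the feature names, priors, and noisy-answer model below are all assumptions), one could greedily pick the next visual query by expected information gain over interpretable feature requirements:

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a distribution given as {value: prob}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def posterior(prior, lik, answer):
    """Bayes update of a belief over one feature after a user answer."""
    unnorm = {v: prior[v] * lik(answer, v) for v in prior}
    z = sum(unnorm.values())
    return {v: w / z for v, w in unnorm.items()}

def expected_info_gain(prior, answers, lik):
    """Expected entropy reduction from querying this feature."""
    h0 = entropy(prior)
    gain = 0.0
    for a in answers:
        p_a = sum(prior[v] * lik(a, v) for v in prior)
        if p_a > 0:
            gain += p_a * (h0 - entropy(posterior(prior, lik, a)))
    return gain

def noisy_choice(values, accuracy=0.9):
    """Toy answer model: the user picks the option matching their latent
    requirement with the given accuracy, else answers uniformly at random."""
    k = len(values)
    def lik(answer, value):
        return accuracy if answer == value else (1 - accuracy) / (k - 1)
    return lik

# Hypothetical belief state: priors over interpretable feature values
# (the abstract suggests such priors come from a language model).
beliefs = {
    "style":    {"photorealistic": 0.5, "watercolor": 0.3, "pixel art": 0.2},
    "lighting": {"soft": 0.9, "harsh": 0.1},  # nearly resolved -> low gain
}

gains = {
    feat: expected_info_gain(prior, list(prior), noisy_choice(list(prior)))
    for feat, prior in beliefs.items()
}
best = max(gains, key=gains.get)
print(best)  # -> style  (the unresolved feature is queried first)
```

The greedy criterion above is one standard reading of "adaptively generates visual queries" under an information-theoretic framing; the paper's actual query generation and prompt compilation may differ.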
Related papers
- VisualPrompter: Prompt Optimization with Visual Feedback for Text-to-Image Synthesis [15.392482488365955]
VisualPrompter is a training-free prompt engineering framework that refines user inputs to model-preferred sentences. Our framework achieves new state-of-the-art performance on multiple benchmarks for text-image alignment evaluation.
arXiv Detail & Related papers (2025-06-29T08:24:39Z) - Creating General User Models from Computer Use [53.59999173952482]
This paper presents an architecture for a general user model (GUM) that learns about you by observing any interaction you have with your computer. The GUM takes as input any unstructured observation of a user (e.g., device screenshots) and constructs confidence-weighted propositions that capture user knowledge and preferences.
arXiv Detail & Related papers (2025-05-16T04:00:31Z) - Enhancing Intent Understanding for Ambiguous prompt: A Human-Machine Co-Adaption Strategy [50.714983524814606]
Current image generation systems produce high-quality images but struggle with ambiguous user prompts. We propose a human-machine co-adaption strategy using mutual information between the user's prompts and the pictures under modification.
arXiv Detail & Related papers (2025-01-25T10:32:00Z) - Taming Text-to-Image Synthesis for Novices: User-centric Prompt Generation via Multi-turn Guidance [24.432762962671614]
DialPrompt is a dialogue-based TIS prompt generation model that emphasizes user experience for novice users. To achieve this, we mined 15 essential dimensions for high-quality prompts from advanced users and curated a multi-turn dataset. Experiments indicate that DialPrompt significantly improves the user-centricity score compared with existing approaches.
arXiv Detail & Related papers (2024-08-23T08:35:35Z) - Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations [109.65267337037842]
We introduce the task of Image Editing Recommendation (IER).
IER aims to automatically generate diverse creative editing instructions from an input image and a simple prompt representing the users' under-specified editing purpose.
We introduce Creativity-Vision Language Assistant(Creativity-VLA), a multimodal framework designed specifically for edit-instruction generation.
arXiv Detail & Related papers (2024-05-31T18:22:29Z) - Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance [62.15866177242207]
We show that through constructing a subject-agnostic condition, one could obtain outputs consistent with both the given subject and input text prompts.
Our approach is conceptually simple and requires only minimal code modifications, but leads to substantial quality improvements.
arXiv Detail & Related papers (2024-05-02T15:03:41Z) - Dynamic Prompt Optimizing for Text-to-Image Generation [63.775458908172176]
We introduce the Prompt Auto-Editing (PAE) method to improve text-to-image generative models.
We employ an online reinforcement learning strategy to explore the weights and injection time steps of each word, leading to the dynamic fine-control prompts.
arXiv Detail & Related papers (2024-04-05T13:44:39Z) - A User-Friendly Framework for Generating Model-Preferred Prompts in
Text-to-Image Synthesis [33.71897211776133]
Well-designed prompts have demonstrated the potential to guide text-to-image models in generating amazing images.
It is challenging for novice users to achieve the desired results by manually entering prompts.
We propose a novel framework that automatically translates user-input prompts into model-preferred prompts.
arXiv Detail & Related papers (2024-02-20T06:58:49Z) - Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models [29.057923932305123]
We present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models.
Our user study shows that Promptify effectively facilitates the text-to-image workflow and outperforms an existing baseline tool widely used for text-to-image generation.
arXiv Detail & Related papers (2023-04-18T22:59:11Z) - TEMPERA: Test-Time Prompting via Reinforcement Learning [57.48657629588436]
We propose Test-time Prompt Editing using Reinforcement learning (TEMPERA).
In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge.
Our method achieves a 5.33x average improvement in sample efficiency compared to traditional fine-tuning methods.
arXiv Detail & Related papers (2022-11-21T22:38:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.