PromptSculptor: Multi-Agent Based Text-to-Image Prompt Optimization
- URL: http://arxiv.org/abs/2509.12446v2
- Date: Wed, 24 Sep 2025 03:51:42 GMT
- Title: PromptSculptor: Multi-Agent Based Text-to-Image Prompt Optimization
- Authors: Dawei Xiang, Wenyan Xu, Kexin Chu, Tianqi Ding, Zixu Shen, Yiming Zeng, Jianchang Su, Wei Zhang
- Abstract summary: To generate high-quality images, users must craft detailed prompts specifying scene, style, and context. We propose PromptSculptor, a novel multi-agent framework that automates this iterative prompt optimization process. Our system decomposes the task into four specialized agents that work collaboratively to transform a short, vague user prompt into a comprehensive, refined prompt.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of generative AI has democratized access to powerful tools such as Text-to-Image models. However, to generate high-quality images, users must still craft detailed prompts specifying scene, style, and context, often through multiple rounds of refinement. We propose PromptSculptor, a novel multi-agent framework that automates this iterative prompt optimization process. Our system decomposes the task into four specialized agents that work collaboratively to transform a short, vague user prompt into a comprehensive, refined prompt. By leveraging Chain-of-Thought reasoning, our framework effectively infers hidden context and enriches scene and background details. To iteratively refine the prompt, a self-evaluation agent aligns the modified prompt with the original input, while a feedback-tuning agent incorporates user feedback for further refinement. Experimental results demonstrate that PromptSculptor significantly enhances output quality and reduces the number of iterations needed for user satisfaction. Moreover, its model-agnostic design allows seamless integration with various T2I models, paving the way for industrial applications.
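The abstract describes a loop in which four agents cooperate: context inference, scene enrichment, self-evaluation against the original prompt, and feedback tuning. A minimal sketch of that control flow is below; the agent names, interfaces, and the rule-based stand-ins for LLM calls are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a four-agent prompt-optimization loop in the spirit
# of PromptSculptor. Each agent would be an LLM call in practice; here they
# are simple string transforms so the control flow is runnable as-is.

def context_inference_agent(user_prompt: str) -> str:
    # Chain-of-Thought-style step: infer hidden context from a vague prompt.
    return f"{user_prompt}, implied setting inferred from subject"

def enrichment_agent(prompt: str) -> str:
    # Enrich scene and background details.
    return f"{prompt}, detailed background, coherent lighting and style"

def self_evaluation_agent(refined: str, original: str) -> bool:
    # Check that the refined prompt still aligns with the original intent.
    return original.lower() in refined.lower()

def feedback_tuning_agent(refined: str, feedback: str) -> str:
    # Fold user feedback into the next revision, if any was given.
    return f"{refined}, adjusted for: {feedback}" if feedback else refined

def sculpt(user_prompt: str, feedback: str = "", max_rounds: int = 3) -> str:
    # Iterate until the self-evaluation agent accepts the draft
    # or the round budget is exhausted.
    draft = user_prompt
    for _ in range(max_rounds):
        draft = enrichment_agent(context_inference_agent(user_prompt))
        draft = feedback_tuning_agent(draft, feedback)
        if self_evaluation_agent(draft, user_prompt):
            break  # aligned with the original intent: stop early
    return draft
```

The model-agnostic design the abstract mentions corresponds here to `sculpt` returning plain text: the refined prompt can be passed to any T2I backend.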
Related papers
- ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation [49.01601313084479]
ImAgent is a training-free unified multimodal agent that integrates reasoning, generation, and self-evaluation. Experiments on image generation and editing tasks demonstrate that ImAgent consistently improves over the backbone.
arXiv Detail & Related papers (2025-11-14T17:00:29Z)
- VisualPrompter: Prompt Optimization with Visual Feedback for Text-to-Image Synthesis [15.392482488365955]
VisualPrompter is a training-free prompt engineering framework that refines user inputs into model-preferred sentences. Our framework achieves new state-of-the-art performance on multiple benchmarks for text-image alignment evaluation.
arXiv Detail & Related papers (2025-06-29T08:24:39Z)
- RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning [88.14234949860105]
RePrompt is a novel reprompting framework that introduces explicit reasoning into the prompt enhancement process via reinforcement learning. Our approach enables end-to-end training without human-annotated data.
arXiv Detail & Related papers (2025-05-23T06:44:26Z)
- Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation [61.31036260686349]
We propose a novel prompt optimization framework designed to rephrase a simple user prompt into a sophisticated prompt for a text-to-image model. Specifically, we employ large vision-language models (LVLMs) as the solver to rewrite the user prompt and, concurrently, as a reward model to score the aesthetics and alignment of the images generated by the optimized prompt. Instead of laborious human feedback, we exploit the prior knowledge of the LVLM to provide rewards, i.e., AI feedback.
arXiv Detail & Related papers (2025-05-22T15:05:07Z)
- Enhancing Intent Understanding for Ambiguous Prompts: A Human-Machine Co-Adaption Strategy [30.344943584233466]
We propose a human-machine co-adaption strategy using mutual information between the user's prompts and the pictures under modification. We find that an improved model can reduce the necessity for multiple rounds of adjustments.
arXiv Detail & Related papers (2025-01-25T10:32:00Z)
- What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance [23.411806572667707]
Text-to-image synthesis (TIS) models heavily rely on the quality and specificity of textual prompts.
Existing solutions relieve this burden by automatically generating model-preferred prompts from user queries.
We propose DialPrompt, a multi-turn dialogue-based TIS prompt generation model that emphasizes user-centricity.
arXiv Detail & Related papers (2024-08-23T08:35:35Z)
- Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis [3.783530340696776]
This study proposes a Multi-Agent framework to optimize input prompts for text-to-image generation models.
A database of professional prompts serves as a benchmark to guide the instruction modifier toward generating high-caliber prompts.
Preliminary ablation studies highlight the effectiveness of various system components and suggest areas for future improvements.
arXiv Detail & Related papers (2024-06-13T00:33:29Z)
- A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis [33.71897211776133]
Well-designed prompts have demonstrated the potential to guide text-to-image models in generating impressive images.
However, it is challenging for novice users to achieve the desired results by manually crafting prompts.
We propose a novel framework that automatically translates user-input prompts into model-preferred prompts.
arXiv Detail & Related papers (2024-02-20T06:58:49Z)
- Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation [72.6168579583414]
CompAgent is a training-free approach for compositional text-to-image generation with a large language model (LLM) agent as its core.
Our approach achieves more than 10% improvement on T2I-CompBench, a comprehensive benchmark for open-world compositional T2I generation.
arXiv Detail & Related papers (2024-01-28T16:18:39Z)
- Prompt Expansion for Adaptive Text-to-Image Generation [51.67811570987088]
This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort.
The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts.
We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods.
arXiv Detail & Related papers (2023-12-27T21:12:21Z)
- BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis [14.852061933308276]
We propose BeautifulPrompt, a deep generative model to produce high-quality prompts from very simple raw descriptions.
In our work, we first fine-tune the BeautifulPrompt model on collected pairs of low-quality and high-quality prompts.
We further showcase the integration of BeautifulPrompt into a cloud-native AI platform to provide a better text-to-image generation service.
arXiv Detail & Related papers (2023-11-12T06:39:00Z)