Related papers: GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation

URL: http://arxiv.org/abs/2510.07217v1
Date: Wed, 08 Oct 2025 16:51:52 GMT
Title: GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation
Authors: Wen Ye, Zhaocheng Liu, Yuwei Gui, Tingyu Yuan, Yunyue Su, Bowen Fang, Chaoyang Zhao, Qiang Liu, Liang Wang
Abstract summary: We propose a test-time prompt optimization strategy that operates directly on the input text. Our approach is model-agnostic, interpretable, and well-suited for handling long and complex prompts.
Score: 13.197958581564256
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image synthesis has made remarkable progress, yet accurately interpreting complex and lengthy prompts remains challenging, often resulting in semantic inconsistencies and missing details. Existing solutions, such as fine-tuning, are model-specific and require training, while prior automatic prompt optimization (APO) approaches typically lack systematic error analysis and refinement strategies, resulting in limited reliability and effectiveness. Meanwhile, test-time scaling methods operate on fixed prompts and on noise or sample numbers, limiting their interpretability and adaptability. To solve these, we introduce a flexible and efficient test-time prompt optimization strategy that operates directly on the input text. We propose a plug-and-play multi-agent system called GenPilot, integrating error analysis, clustering-based adaptive exploration, fine-grained verification, and a memory module for iterative optimization. Our approach is model-agnostic, interpretable, and well-suited for handling long and complex prompts. Simultaneously, we summarize the common patterns of errors and the refinement strategy, offering more experience and encouraging further exploration. Experiments on DPG-bench and Geneval with improvements of up to 16.9% and 5.7% demonstrate the strong capability of our methods in enhancing the text and image consistency and structural coherence of generated images, revealing the effectiveness of our test-time prompt optimization strategy. The code is available at https://github.com/27yw/GenPilot.

Related papers

SCOPE: Prompt Evolution for Enhancing Agent Effectiveness [53.75986399936395]
Large Language Model (LLM) agents are increasingly deployed in environments that generate massive, dynamic contexts. While agents have access to this context, their static prompts lack the mechanisms to manage it effectively. We introduce SCOPE (Self-evolving Context Optimization via Prompt Evolution). We propose a Dual-Stream mechanism that balances tactical specificity (resolving immediate errors) with strategic generality (evolving long-term principles).
arXiv Detail & Related papers (2025-12-17T12:25:05Z)
ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation [49.01601313084479]
ImAgent is a training-free unified multimodal agent that integrates reasoning, generation, and self-evaluation. Experiments on image generation and editing tasks demonstrate that ImAgent consistently improves over the backbone.
arXiv Detail & Related papers (2025-11-14T17:00:29Z)
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph [42.247964605609745]
Test-Time Scaling (TTS) improves large language models (LLMs) by allocating additional computation during inference. We formalize it as a multi-LLM collaboration graph, where nodes encode roles and model assignments and edges capture information flow. We propose Agent-REINFORCE, an LLM-agent-augmented framework that mirrors the REINFORCE pipeline by mapping sampling-gradient-update to sampling-feedback-update.
arXiv Detail & Related papers (2025-10-29T22:14:25Z)
Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis [97.37770785712475]
We present a generation-based debiasing framework for object detection. Our method significantly narrows the performance gap for underrepresented object groups.
arXiv Detail & Related papers (2025-10-21T02:19:12Z)
ThinkFake: Reasoning in Multimodal Large Language Models for AI-Generated Image Detection [51.93101033997245]
Increasing realism of AI-generated images has raised serious concerns about misinformation and privacy violations. We propose ThinkFake, a novel reasoning-based and generalizable framework for AI-generated image detection. We show that ThinkFake outperforms state-of-the-art methods on the GenImage benchmark and demonstrates strong zero-shot generalization on the challenging LOKI benchmark.
arXiv Detail & Related papers (2025-09-24T07:34:09Z)
DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models [60.713908578319256]
We propose Direct Discrepancy Learning (DDL) to optimize the detector with task-oriented knowledge. Built upon this, we introduce DetectAnyLLM, a unified detection framework that achieves state-of-the-art MGTD performance. MIRAGE samples human-written texts from 10 corpora across 5 text-domains, which are then re-generated or revised using 17 cutting-edge LLMs.
arXiv Detail & Related papers (2025-09-15T10:59:57Z)
Reward-Agnostic Prompt Optimization for Text-to-Image Diffusion Models [20.292872255460534]
We introduce RATTPO, a flexible test-time optimization method applicable across various reward scenarios without modification. RATTPO searches for optimized prompts by querying large language models (LLMs) without requiring reward-specific task descriptions. Empirical results demonstrate the versatility of RATTPO, effectively enhancing user prompts across diverse reward setups.
arXiv Detail & Related papers (2025-06-20T09:02:05Z)
Semantic-Preserving Adversarial Attacks on LLMs: An Adaptive Greedy Binary Search Approach [15.658579092368981]
Large Language Models (LLMs) increasingly rely on automatic prompt engineering in graphical user interfaces (GUIs) to refine user inputs and enhance response accuracy. We propose the Adaptive Greedy Binary Search (AGBS) method, which simulates common prompt optimization mechanisms while preserving semantic stability.
arXiv Detail & Related papers (2025-05-26T15:41:06Z)
On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI workflows (systems that autonomously plan and act) are becoming widespread, yet their task success rate on complex tasks remains low. Inference-time alignment relies on three components: sampling, evaluation, and feedback. We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly inserts feedback extracted from different forms of critiques.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control [52.405085773954596]
Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to mitigate large language model hallucinations. Existing RAG frameworks often apply retrieval indiscriminately, leading to inefficiencies such as over-retrieving. We introduce a novel user-controllable RAG framework that enables dynamic adjustment of the accuracy-cost trade-off.
arXiv Detail & Related papers (2025-02-17T18:56:20Z)
In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality. To handle these challenges, a direct solution is to generate "high-confidence" data from unsupervised downstream tasks. We propose a novel approach, the pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z)
Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism [33.116006446428756]
We study multi-agent online learning problems in the presence of delays and asynchronicities. We derive adaptive learning strategies with optimal regret bounds, at both the agent and network levels.
arXiv Detail & Related papers (2020-12-21T18:55:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.