Fugu-MT Paper Translation (Summary): Maestro: Self-Improving Text-to-Image Generation via Agent Orchestration

Paper summary: Maestro: Self-Improving Text-to-Image Generation via Agent Orchestration

arxiv url: http://arxiv.org/abs/2509.10704v1
Date: Fri, 12 Sep 2025 21:45:16 GMT
Status: Translation complete
Last updated in system: 2025-09-16 17:26:22.742159
Title: Maestro: Self-Improving Text-to-Image Generation via Agent Orchestration
Title (reference translation): Maestro: Self-Improving Text-to-Image Generation via Agent Orchestration
Authors: Xingchen Wan, Han Zhou, Ruoxi Sun, Hootan Nakhost, Ke Jiang, Rajarishi Sinha, Sercan Ö. Arık,
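Abstract summary: Maestro is a novel self-evolving image generation system for text-to-image (T2I) models. It enables T2I models to autonomously self-improve their generated images through iterative evolution of prompts.
Reference score (independently calculated attention level): 25.483518198712275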
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image (T2I) models, while offering immense creative potential, are highly reliant on human intervention, posing significant usability challenges that often necessitate manual, iterative prompt engineering over often underspecified prompts. This paper introduces Maestro, a novel self-evolving image generation system that enables T2I models to autonomously self-improve generated images through iterative evolution of prompts, using only an initial prompt. Maestro incorporates two key innovations: 1) self-critique, where specialized multimodal LLM (MLLM) agents act as 'critics' to identify weaknesses in generated images, correct for under-specification, and provide interpretable edit signals, which are then integrated by a 'verifier' agent while preserving user intent; and 2) self-evolution, utilizing MLLM-as-a-judge for head-to-head comparisons between iteratively generated images, eschewing problematic images, and evolving creative prompt candidates that align with user intents. Extensive experiments on complex T2I tasks using black-box models demonstrate that Maestro significantly improves image quality over initial prompts and state-of-the-art automated methods, with effectiveness scaling with more advanced MLLM components. This work presents a robust, interpretable, and effective pathway towards self-improving T2I generation.
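Abstract (reference translation): Text-to-image (T2I) models offer great creative potential but depend heavily on human intervention, which poses significant usability challenges that often require manual, iterative prompt engineering. This paper introduces Maestro, a novel self-evolving image generation system that lets T2I models autonomously self-improve their generated images through iterative evolution of prompts, starting from only an initial prompt. Maestro incorporates two key innovations: 1) self-critique, in which specialized multimodal LLM (MLLM) agents act as 'critics' that identify weaknesses in generated images, correct for under-specification, and provide interpretable edit signals, which a 'verifier' agent then integrates while preserving user intent; and 2) self-evolution, which uses MLLM-as-a-judge for head-to-head comparisons between iteratively generated images, discards problematic images, and evolves creative prompt candidates aligned with user intent. Extensive experiments on complex T2I tasks using black-box models show that Maestro significantly improves image quality over initial prompts and state-of-the-art automated methods, with effectiveness scaling as more advanced MLLM components are used. This work presents a robust, interpretable, and effective pathway toward self-improving T2I generation.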
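The critique-then-compare loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `self_improve`, the `Candidate` type, and the `generate`, `critics`, `verify`, and `judge` callables are all hypothetical names introduced here, and the sketch keeps a single incumbent prompt rather than whatever candidate pool the paper actually maintains.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Candidate:
    """A prompt together with the image it produced (hypothetical container)."""
    prompt: str
    image: Any


def self_improve(
    initial_prompt: str,
    generate: Callable[[str], Any],                    # black-box T2I model (assumed interface)
    critics: List[Callable[[str, Any], str]],          # each returns one suggested edit
    verify: Callable[[str, str, List[str]], str],      # merges edits, preserving user intent
    judge: Callable[[str, Candidate, Candidate], bool],  # True if the first candidate wins
    rounds: int = 3,
) -> Candidate:
    """Iteratively evolve a prompt, starting from only the initial prompt."""
    best = Candidate(initial_prompt, generate(initial_prompt))
    for _ in range(rounds):
        # Self-critique: every critic inspects the current best image
        # against the original user intent and proposes an edit.
        edits = [critic(initial_prompt, best.image) for critic in critics]
        # The verifier integrates the edits into a revised prompt.
        new_prompt = verify(initial_prompt, best.prompt, edits)
        challenger = Candidate(new_prompt, generate(new_prompt))
        # Self-evolution: head-to-head comparison; the challenger
        # replaces the incumbent only if the judge prefers it.
        if judge(initial_prompt, challenger, best):
            best = challenger
    return best
```

In practice each callable would wrap an MLLM or T2I API call; passing them in as parameters keeps the loop itself independent of any particular model or vendor.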

Related papers