Fugu-MT Paper Translation (Overview): coDrawAgents: A Multi-Agent Dialogue Framework for Compositional Image Generation

Paper overview: coDrawAgents: A Multi-Agent Dialogue Framework for Compositional Image Generation

arxiv url: http://arxiv.org/abs/2603.12829v1
Date: Fri, 13 Mar 2026 09:32:06 GMT
Status: Translation complete
Last updated in system: 2026-03-16 17:38:12.027328
Title: coDrawAgents: A Multi-Agent Dialogue Framework for Compositional Image Generation
Title (reference translation): coDrawAgents: A Multi-Agent Dialogue Framework for Compositional Image Generation
Authors: Chunhan Li, Qifeng Wu, Jia-Hui Pan, Ka-Hei Hui, Jingyu Hu, Yuming Jiang, Bin Sheng, Xihui Liu, Wenjuan Gong, Zhengzhe Liu,
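Abstract summary: We propose coDrawAgents, an interactive multi-agent dialogue framework. The Interpreter decides between a direct text-to-image pathway and a layout-aware multi-agent process. The Planner proposes layouts for objects that share the same semantic priority level, making its decisions within the evolving visual context. The Checker introduces an explicit error-correction mechanism by validating spatial consistency and attribute alignment. The Painter synthesizes the image step by step, incorporating newly planned objects into the canvas to provide richer context for subsequent iterations.
Reference score (independently computed attention score): 48.027946344020314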
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Text-to-image generation has advanced rapidly, but existing models still struggle with faithfully composing multiple objects and preserving their attributes in complex scenes. We propose coDrawAgents, an interactive multi-agent dialogue framework in which four specialized agents (Interpreter, Planner, Checker, and Painter) collaborate to improve compositional generation. The Interpreter adaptively decides between a direct text-to-image pathway and a layout-aware multi-agent process. In the layout-aware mode, it parses the prompt into attribute-rich object descriptors, ranks them by semantic salience, and groups objects with the same semantic priority level for joint generation. Guided by the Interpreter, the Planner adopts a divide-and-conquer strategy, incrementally proposing layouts for objects with the same semantic priority level while grounding decisions in the evolving visual context of the canvas. The Checker introduces an explicit error-correction mechanism by validating spatial consistency and attribute alignment, and refining layouts before they are rendered. Finally, the Painter synthesizes the image step by step, incorporating newly planned objects into the canvas to provide richer context for subsequent iterations. Together, these agents address three key challenges: reducing layout complexity, grounding planning in visual context, and enabling explicit error correction. Extensive experiments on the GenEval and DPG-Bench benchmarks demonstrate that coDrawAgents substantially improves text-image alignment, spatial accuracy, and attribute binding compared to existing methods.
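Abstract (reference translation): Text-to-image generation has advanced rapidly, but existing models still struggle to faithfully compose multiple objects and preserve their attributes in complex scenes. We propose coDrawAgents, an interactive multi-agent dialogue framework. The Interpreter adaptively decides between a direct text-to-image pathway and a layout-aware multi-agent process. In the layout-aware mode, it parses the prompt into attribute-rich object descriptors, ranks them by semantic salience, and groups objects with the same semantic priority for joint generation. Guided by the Interpreter, the Planner adopts a divide-and-conquer strategy, incrementally proposing layouts for objects at the same semantic priority level while grounding its decisions in the evolving visual context of the canvas. The Checker introduces an explicit error-correction mechanism by validating spatial consistency and attribute alignment, and it refines layouts before they are rendered. Finally, the Painter synthesizes the image step by step, incorporating newly planned objects into the canvas to provide richer context for subsequent iterations. Together, these agents address three key challenges: reducing layout complexity, grounding planning in visual context, and enabling explicit error correction. Experiments on the GenEval and DPG-Bench benchmarks show that coDrawAgents substantially improves text-image alignment, spatial accuracy, and attribute binding compared with existing methods.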
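The abstract describes the agents only at a high level. As a rough illustration of the control flow it outlines (rank and group objects by salience, then plan, check, and commit one group at a time onto a growing canvas), here is a toy Python sketch. It is not the authors' implementation: the `ObjectSpec` and `Box` types, the single-object routing rule, the slot-based Planner, the bounds-and-IoU Checker with a row-wrapping `refine` step, and a Painter that merely commits boxes to a dictionary instead of rendering pixels are all hypothetical stand-ins. Attribute alignment, which the paper's Checker also verifies, is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ObjectSpec:
    name: str      # attribute-rich descriptor, e.g. "red apple"
    priority: int  # hypothetical encoding of semantic salience: 0 = most salient


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a: Box, b: Box) -> float:
    # Intersection-over-union of two axis-aligned boxes.
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0), min(a.x1, b.x1), min(a.y1, b.y1)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def interpret(objects: List[ObjectSpec]) -> Optional[List[List[ObjectSpec]]]:
    # Interpreter stand-in. None means "use the direct text-to-image path".
    # Hypothetical routing rule: single-object prompts skip layout planning.
    if len(objects) <= 1:
        return None
    ordered = sorted(objects, key=lambda o: o.priority)  # stable sort keeps prompt order
    return [list(g) for _, g in groupby(ordered, key=lambda o: o.priority)]


def plan(group: List[ObjectSpec], canvas: Dict[str, Box]) -> Dict[str, Box]:
    # Planner stub: put each new object in the next horizontal slot after what is
    # already on the canvas. A real Planner would reason over the rendered canvas.
    boxes = {}
    for i, obj in enumerate(group):
        x0 = 0.05 + 0.3 * (len(canvas) + i)
        boxes[obj.name] = Box(x0, 0.1, x0 + 0.25, 0.3)
    return boxes


def check(boxes: Dict[str, Box], canvas: Dict[str, Box], max_iou: float = 0.1) -> List[str]:
    # Checker stand-in: flag boxes outside the unit canvas or overlapping boxes
    # already placed (on the canvas or earlier in the same group).
    errors = []
    placed = dict(canvas)
    for name, b in boxes.items():
        if min(b.x0, b.y0) < 0.0 or max(b.x1, b.y1) > 1.0:
            errors.append(f"{name}: outside canvas")
        errors += [f"{name}: overlaps {o}" for o, ob in placed.items() if iou(b, ob) > max_iou]
        placed[name] = b
    return errors


def refine(boxes: Dict[str, Box]) -> Dict[str, Box]:
    # Hypothetical correction: wrap boxes that overflow the right edge onto the next row.
    # Overlap errors are not repaired here; compose() raises if they persist.
    fixed = {}
    for name, b in boxes.items():
        if b.x1 > 1.0:
            b = Box(b.x0 - 0.9, b.y0 + 0.3, b.x1 - 0.9, b.y1 + 0.3)
        fixed[name] = b
    return fixed


def compose(objects: List[ObjectSpec], max_rounds: int = 3) -> Optional[Dict[str, Box]]:
    groups = interpret(objects)
    if groups is None:
        return None  # direct path: the prompt would go straight to a text-to-image model
    canvas: Dict[str, Box] = {}
    for group in groups:  # most salient group first
        boxes = plan(group, canvas)
        errors = check(boxes, canvas)
        rounds = 0
        while errors and rounds < max_rounds:  # check/refine loop before "rendering"
            boxes = refine(boxes)
            errors = check(boxes, canvas)
            rounds += 1
        if errors:
            raise ValueError(f"unresolved layout errors: {errors}")
        canvas.update(boxes)  # Painter stand-in: commit boxes instead of drawing pixels
    return canvas


if __name__ == "__main__":
    scene = [
        ObjectSpec("wooden table", 0),
        ObjectSpec("red apple", 1),
        ObjectSpec("green cup", 1),
        ObjectSpec("blue vase", 2),
    ]
    for name, box in compose(scene).items():
        print(f"{name:12s} x=[{box.x0:.2f}, {box.x1:.2f}] y=[{box.y0:.2f}, {box.y1:.2f}]")
```

In this toy run, the "blue vase" group is first placed past the right edge, flagged by `check`, and wrapped to a second row by `refine` before being committed. Grouping equal-priority objects lets them be planned jointly in one step, which mirrors the divide-and-conquer idea in the abstract without claiming anything about how the real agents make their decisions.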


Related papers