Fugu-MT Paper Translation (Abstract): GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

Paper overview: GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

arxiv url: http://arxiv.org/abs/2605.21605v2
Date: Fri, 22 May 2026 02:16:59 GMT
Status: Translation complete
System update date: 2026-05-25 14:44:53.763296
Title: GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation
Authors: Sixiang Chen, Zhaohu Xing, Tian Ye, Xinyu Geng, Yunlong Lin, Jianyu Lai, Xuanhua He, Fuxiang Zhai, Jialin Gao, Lei Zhu
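Abstract summary: We propose GenEvolve, a self-evolving framework for open-ended image generation. In GenEvolve, each generation attempt is modeled as a tool-orchestrated trajectory in which the agent gathers evidence, selects references, invokes generation skills, and composes them into a prompt-reference program. Inspired by on-policy self-distillation, Visual Experience Distillation provides dense token-level supervision, helping the student internalize better search, knowledge activation, reference selection, and prompt construction.
Reference score (attention score computed independently by this site): 27.909477662239215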
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Open-ended image generation is no longer a simple prompt-to-image problem. High-quality generation often requires an agent to combine a model's internal generative ability with external resources. As requests become more diverse and demanding, we aim to develop a general image-generation agent that can self-evolve through trajectories and use tools more effectively across varied generation challenges. To this end, we propose GenEvolve, a self-evolving framework based on Tool-Orchestrated Visual Experience Distillation. In GenEvolve, each generation attempt is modeled as a tool-orchestrated trajectory, where the agent gathers evidence, selects references, invokes generation skills, and composes them into a prompt-reference program. Unlike existing agentic generation methods that mainly rely on image-level scalar rewards, GenEvolve compares multiple trajectories for the same request and abstracts best-worst differences into structured visual experience, provided only to a privileged teacher branch. Inspired by on-policy self-distillation, Visual Experience Distillation provides dense token-level supervision, helping the student internalize better search, knowledge activation, reference selection, and prompt construction. We further construct GenEvolve-Data and GenEvolve-Bench. Experiments on public benchmarks and GenEvolve-Bench show substantial gains over strong baselines, achieving state-of-the-art performance among current image-generation frameworks. Our website is as follows: https://ephemeral182.github.io/GenEvolve/
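The best-worst comparison and token-level distillation described in the abstract can be illustrated with a toy sketch. The abstract does not specify the loss, the ranking signal, or the format of the visual experience, so everything below is an assumption made for illustration: `Trajectory`, `best_worst`, and `token_level_kl` are hypothetical names, the scalar `score` stands in for whatever judgment ranks trajectories, and a mean per-token KL divergence is one common way to realize dense token-level supervision in self-distillation, not necessarily the paper's objective.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    """One generation attempt (hypothetical structure, not from the paper)."""
    steps: List[str]  # tool calls such as "search" or "select_reference"
    score: float      # assumed scalar quality judgment used only for ranking


def best_worst(trajectories: List[Trajectory]) -> Tuple[Trajectory, Trajectory]:
    """Return the highest- and lowest-scoring trajectories for one request."""
    best = max(trajectories, key=lambda t: t.score)
    worst = min(trajectories, key=lambda t: t.score)
    return best, worst


def kl(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary.

    Assumes every q[i] is strictly positive wherever p[i] is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def token_level_kl(teacher: List[List[float]], student: List[List[float]]) -> float:
    """Mean per-token KL(teacher || student) along one sampled sequence."""
    per_token = [kl(t, s) for t, s in zip(teacher, student)]
    return sum(per_token) / len(per_token)


# Two attempts at the same request; the second one also selects a reference.
trajs = [
    Trajectory(["search", "generate"], 0.4),
    Trajectory(["search", "select_reference", "generate"], 0.9),
]
best, worst = best_worst(trajs)

# A crude stand-in for "structured visual experience": steps the best
# trajectory took that the worst one did not.
experience = sorted(set(best.steps) - set(worst.steps))  # ["select_reference"]

# The teacher sees `experience` (privileged context); the student does not.
# The distributions here are made-up numbers over a two-token vocabulary.
teacher_probs = [[0.7, 0.3], [0.5, 0.5]]
student_probs = [[0.5, 0.5], [0.5, 0.5]]
loss = token_level_kl(teacher_probs, student_probs)
```

In this sketch the loss is positive only at positions where the teacher's distribution differs from the student's, which is the sense in which the supervision is dense: every token contributes a term, rather than one scalar reward per image.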


Related papers