Fugu-MT Paper Translation (Summary): MemoGen: Can Past Experience Improve Future Text-to-Image Generation?

Paper summary: MemoGen: Can Past Experience Improve Future Text-to-Image Generation?

arxiv url: http://arxiv.org/abs/2606.03243v1
Date: Tue, 02 Jun 2026 07:04:51 GMT
Status: Translation complete
Last updated in system: 2026-06-03 22:00:04.821489
Title: MemoGen: Can Past Experience Improve Future Text-to-Image Generation?
Authors: Wenshuo Chen, Kuimou Yu, Bowen Tian, Jianfei Song, Shaofeng Liang, Haozhe Jia, Kan Cheng, Haosen Li, Kaishen Yuan, Lei Wang, Jiemin Wu, Songning Lai, Yutao Yue
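Abstract summary: We propose MemoGen, a training-free framework that augments existing image generators with an agentic evolution layer. For each task, MemoGen explicitly infers visual requirements, retrieves external evidence and references when necessary, and translates them into executable generation constraints. Across evolution rounds, the agent retrieves relevant experience to improve similar future generations, selectively repairing previously failed cases while preserving successful ones.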
Reference score (independently computed attention score): 10.461937938760842
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Modern text-to-image models have achieved strong visual synthesis, yet remain unreliable when prompts require implicit visual constraints, relational reasoning, or external knowledge. Existing retrieval-augmented and agentic generation methods mitigate this issue by acquiring external knowledge, references, or refined prompts for the current request, yet they typically treat each generation as an isolated episode and do not systematically preserve past successes or failures for future use. In this work, we ask whether a text-to-image system can continually improve from its own generation experience without updating the underlying generator. We propose MemoGen, a training-free framework that augments existing image generators with an agentic evolution layer. For each task, MemoGen explicitly infers visual requirements, retrieves external evidence and references when necessary, translates them into executable generation constraints, evaluates the generated result, and stores task understanding, reference choices, visual feedback, successful strategies, and failure lessons as reusable experience memory. Across evolution rounds, the agent retrieves relevant experience to improve similar future generations, selectively repairing previously failed cases while preserving successful ones, thereby enabling test-time self-evolution without parameter updates. Extensive experiments on knowledge-intensive and reasoning-oriented benchmarks demonstrate the effectiveness of this paradigm: after only two evolution rounds, MemoGen built upon the open-source Qwen-Image backbone surpasses strong proprietary systems such as Nano Banana Pro and GPT-Image-1 on WISE and Mind-Bench, showing that explicit experience memory can serve as a powerful continual learning signal for reliable text-to-image generation.
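The loop described in the abstract (generate, evaluate, store experience, then reuse it in later rounds) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Experience`, `ExperienceMemory`, `run_round`, `toy_generate`, and `toy_evaluate` are hypothetical, the word-overlap retrieval is a stand-in for whatever retrieval MemoGen actually uses, and the generator and evaluator are toy string functions rather than an image model and a visual judge. Requirement inference and external evidence retrieval are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Experience:
    """One stored episode (hypothetical schema, not from the paper)."""
    prompt: str
    constraints: List[str]
    success: bool
    lesson: str


@dataclass
class ExperienceMemory:
    records: List[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.records.append(exp)

    def retrieve(self, prompt: str, k: int = 2) -> List[Experience]:
        # Assumption: toy similarity based on Jaccard overlap of word sets.
        query = set(prompt.lower().split())

        def score(exp: Experience) -> float:
            words = set(exp.prompt.lower().split())
            union = query | words
            return len(query & words) / len(union) if union else 0.0

        ranked = sorted(self.records, key=score, reverse=True)
        return [e for e in ranked[:k] if score(e) > 0]


def run_round(
    prompts: List[str],
    memory: ExperienceMemory,
    generate: Callable[[List[str]], str],
    evaluate: Callable[[str, str], Tuple[bool, str]],
    solved: Set[str],
) -> Set[str]:
    """Run one evolution round; no generator parameters are updated."""
    for prompt in prompts:
        if prompt in solved:
            continue  # preserve cases that already succeeded
        # Reuse lessons from similar past episodes as extra constraints.
        hints = [e.lesson for e in memory.retrieve(prompt)]
        constraints = [prompt] + hints
        result = generate(constraints)
        ok, lesson = evaluate(prompt, result)
        memory.add(Experience(prompt, constraints, ok, lesson))
        if ok:
            solved.add(prompt)
    return solved


# Toy stand-ins so the sketch runs without an image model.
def toy_generate(constraints: List[str]) -> str:
    return " | ".join(constraints)


def toy_evaluate(prompt: str, result: str) -> Tuple[bool, str]:
    required = "cube above sphere"
    return required in result, "state explicitly: " + required


if __name__ == "__main__":
    memory, solved = ExperienceMemory(), set()
    tasks = ["a red cube and a blue sphere"]
    for _ in range(2):  # two evolution rounds
        solved = run_round(tasks, memory, toy_generate, toy_evaluate, solved)
    print(sorted(solved), len(memory.records))
```

In this toy setup the first round fails and records a lesson, and the second round retrieves that lesson and adds it as a constraint. Prompts that already succeeded are skipped in later rounds, which mirrors the idea of repairing failed cases while preserving successful ones.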

