Fugu-MT paper translation (abstract): AdaGen: Learning Adaptive Policy for Image Synthesis

Paper abstract: AdaGen: Learning Adaptive Policy for Image Synthesis

arxiv url: http://arxiv.org/abs/2603.06993v1
Date: Sat, 07 Mar 2026 02:33:45 GMT
Status: translation complete
Last updated in system: 2026-03-10 15:13:13.588948
Title: AdaGen: Learning Adaptive Policy for Image Synthesis
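Title (reference translation): AdaGen: Learning an Adaptive Policy for Image Synthesis
Authors: Zanlin Ni, Yulin Wang, Yeguo Hua, Renping Zhou, Jiayi Guo, Jun Song, Bo Zheng, Gao Huang
Abstract summary: AdaGen is a general, learnable, and sample-adaptive framework for scheduling the iterative generation process. AdaGen achieves better performance on DiT-XL with 3 times lower inference cost and improves the FID of VAR from 1.92 to 1.59.
Reference score (independently computed attention score): 48.63446826766037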
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in image synthesis have been propelled by powerful generative models, such as Masked Generative Transformers (MaskGIT), autoregressive models, diffusion models, and rectified flow models. A common principle behind their success is the decomposition of synthesis into multiple steps. However, this introduces a proliferation of step-specific parameters (e.g., noise level or temperature at each step). Existing approaches typically rely on manually-designed rules to manage this complexity, demanding expert knowledge and trial-and-error. Furthermore, these static schedules lack the flexibility to adapt to the unique characteristics of each sample, yielding sub-optimal performance. To address this issue, we present AdaGen, a general, learnable, and sample-adaptive framework for scheduling the iterative generation process. Specifically, we formulate the scheduling problem as a Markov Decision Process, where a lightweight policy network determines suitable parameters given the current generation state, and can be trained through reinforcement learning. Importantly, we demonstrate that simple reward designs, such as FID or pre-trained reward models, can be easily hacked and may not reliably guarantee the desired quality or diversity of generated samples. Therefore, we propose an adversarial reward design to guide the training of the policy networks. Finally, we introduce an inference-time refinement strategy and a controllable fidelity-diversity trade-off mechanism to further enhance the performance and flexibility of AdaGen. Comprehensive experiments on four generative paradigms validate the superiority of AdaGen. For example, AdaGen achieves better performance on DiT-XL with 3 times lower inference cost and improves the FID of VAR from 1.92 to 1.59 with negligible computational overhead.
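Abstract (reference translation): Recent progress in image synthesis has been driven by powerful generative models such as Masked Generative Transformers (MaskGIT), autoregressive models, diffusion models, and rectified flow models. The common principle behind their success is that synthesis is decomposed into multiple steps. This, however, leads to a proliferation of step-specific parameters (for example, the noise level or temperature at each step). Existing approaches usually rely on hand-designed rules to manage this complexity, which requires expert knowledge and trial and error. In addition, these static schedules cannot adapt to the characteristics of each individual sample, so performance is sub-optimal. To address this problem, the authors present AdaGen, a general, learnable, and sample-adaptive framework for scheduling the iterative generation process. Specifically, the scheduling problem is formulated as a Markov Decision Process in which a lightweight policy network determines suitable parameters given the current generation state and can be trained through reinforcement learning. Importantly, simple reward designs such as FID or pre-trained reward models are easily hacked and do not reliably guarantee the desired quality or diversity of generated samples. The authors therefore propose an adversarial reward design to guide the training of the policy network. Finally, an inference-time refinement strategy and a controllable fidelity-diversity trade-off mechanism are introduced to further improve the performance and flexibility of AdaGen. Comprehensive experiments on four generative paradigms validate the superiority of AdaGen. For example, AdaGen achieves better performance on DiT-XL with 3 times lower inference cost and improves the FID of VAR from 1.92 to 1.59 with negligible computational overhead.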
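
Illustrative sketch (not from the paper): the abstract describes scheduling as a Markov Decision Process in which a policy maps the current generation state to a step-specific parameter. The toy Python below contrasts a hand-designed static schedule with a sample-adaptive policy. Everything in it is an assumption made for illustration: the names `State`, `LinearPolicy`, `toy_generation_step`, and `rollout`, the choice of temperature as the only scheduled parameter, the two hand-picked state features, and the toy update rule. It omits the reinforcement learning, the adversarial reward, and the real generators that the paper covers.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class State:
    """Hypothetical MDP state: where we are in the rollout and the partial sample."""
    step: int
    total_steps: int
    sample: List[float]  # toy stand-in for a partially generated image


def static_schedule(state: State) -> float:
    """Hand-designed rule: linear decay that is identical for every sample."""
    return 1.0 - state.step / state.total_steps


class LinearPolicy:
    """Hypothetical lightweight policy: temperature = sigmoid(w . features + b)."""

    def __init__(self, weights: Tuple[float, float], bias: float) -> None:
        self.weights = weights
        self.bias = bias

    def __call__(self, state: State) -> float:
        progress = state.step / state.total_steps
        spread = max(state.sample) - min(state.sample)  # depends on the sample itself
        z = self.weights[0] * progress + self.weights[1] * spread + self.bias
        return 1.0 / (1.0 + math.exp(-z))


def toy_generation_step(sample: List[float], temperature: float,
                        rng: random.Random) -> List[float]:
    """Toy stand-in for one generator step: contract, then add temperature-scaled noise."""
    return [0.5 * x + temperature * rng.gauss(0.0, 1.0) for x in sample]


def rollout(policy: Callable[[State], float], initial_sample: List[float],
            total_steps: int, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Run the loop: observe the state, choose a parameter (action), apply one step."""
    rng = random.Random(seed)
    sample = list(initial_sample)
    actions = []
    for step in range(total_steps):
        state = State(step, total_steps, sample)
        temperature = policy(state)
        actions.append(temperature)
        sample = toy_generation_step(sample, temperature, rng)
    return sample, actions


if __name__ == "__main__":
    policy = LinearPolicy(weights=(-2.0, 0.5), bias=0.0)
    _, static_actions = rollout(static_schedule, [1.0, -1.0, 0.5], total_steps=4)
    _, adaptive_actions = rollout(policy, [1.0, -1.0, 0.5], total_steps=4)
    print("static:  ", static_actions)
    print("adaptive:", adaptive_actions)
```

The static schedule returns the same temperatures for every input, while the policy's output changes with the sample's features. In a learned setting the policy weights would be trained against a reward rather than set by hand as they are here.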


Related papers