Fugu-MT Paper Translation (Abstract): High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation

Paper overview: High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation

arxiv url: http://arxiv.org/abs/2606.12575v1
Date: Wed, 10 Jun 2026 18:24:50 GMT
Status: Translation complete
Last updated in system: 2026-06-12 15:55:27.400487
Title: High-Fidelity Two-Step Image Generation via Teacher-Aligned End-to-End Distillation
Authors: Dongyang Liu, Ruoyi Du, David Liu, Dengyang Jiang, Liangchen Li, Qilong Wu, Zhen Li, Steven C. H. Hoi, Hongsheng Li, Peng Gao
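Abstract summary: Few-step diffusion distillation has become increasingly mature for 4-8-step generation, yet pushing further to 2 steps remains challenging. The paper introduces Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher.
Reference score (attention score computed independently by the site): 66.19462276967869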
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Few-step diffusion distillation has become increasingly mature for 4-8-step generation, yet pushing further to 2 steps remains challenging. In this work, we introduce Z-Image Turbo++, a high-quality 2-step image generation model distilled from the 8-step Z-Image Turbo teacher. Our method addresses the central bottlenecks of increased task difficulty and limited model capacity in 2-step generation through three simple but effective design choices tailored to this regime. First, we propose Distribution-Aligned Adversarial Learning, which uses teacher-generated images rather than external real images as real samples for GAN training, providing a more attainable and informative adversarial target. Second, we adopt Step-Decoupled Parameterization, assigning independent model parameters to the two denoising steps to better match their distinct capacity demands. Third, we perform End-to-End Training with Iterative Regularization, allowing the first step to receive gradients from final image quality while preserving a meaningful intermediate generation through an explicit step-1 loss. Together, these designs substantially narrow the quality gap between 2-step and 8-step generation in both qualitative and quantitative evaluations, highlighting the potential of carefully tailored distillation strategies for improving the quality-efficiency trade-off in few-step generation.
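The three design choices in the abstract can be read as one training objective. The following is only a sketch of how they might fit together, not the paper's actual formulation: the abstract gives no equations, so every symbol here is an assumption introduced for illustration. These are the step networks $G_{\theta_1}$ and $G_{\theta_2}$, the discriminator $D_\phi$, the loss terms $\ell_{\text{real}}$, $\ell_{\text{fake}}$, $\mathcal{L}_{\text{adv}}$, $\mathcal{L}_{\text{step1}}$, the weight $\lambda$, and the teacher distribution $p_{\text{teacher}}$. Any re-noising between the two steps is also omitted.

```latex
% Step-Decoupled Parameterization: separate parameters for each denoising step
% (theta_1 and theta_2 are assumed names, not taken from the paper).
\hat{x}_1 = G_{\theta_1}(z, c), \qquad
\hat{x}_2 = G_{\theta_2}(\hat{x}_1, c)

% Distribution-Aligned Adversarial Learning: the discriminator's "real"
% samples are teacher-generated images, not external real images.
\mathcal{L}_D(\phi) =
  \mathbb{E}_{x^{T} \sim p_{\text{teacher}}(\cdot \mid c)}
    \big[\ell_{\text{real}}\big(D_\phi(x^{T}, c)\big)\big]
  + \mathbb{E}_{z}
    \big[\ell_{\text{fake}}\big(D_\phi(\hat{x}_2, c)\big)\big]

% End-to-End Training with Iterative Regularization: the adversarial term on
% the final image back-propagates through both steps, while an explicit
% step-1 loss keeps the intermediate output meaningful.
\mathcal{L}_G(\theta_1, \theta_2) =
  \mathcal{L}_{\text{adv}}\big(D_\phi(\hat{x}_2, c)\big)
  + \lambda \, \mathcal{L}_{\text{step1}}(\hat{x}_1)
```

Under this reading, $\theta_1$ receives gradients from both terms of $\mathcal{L}_G$, whereas $\theta_2$ is driven only by the final-image term. The concrete form of each loss and the value of $\lambda$ are left unspecified, because the abstract does not state them.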
