Fugu-MT Paper Translation (Abstract): Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency

Paper overview: Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency

arxiv url: http://arxiv.org/abs/2510.08431v1
Date: Thu, 09 Oct 2025 16:45:30 GMT
Status: Translation complete
Last updated in system: 2025-10-10 17:54:15.20972
Title: Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
Title (reference translation): Large-scale diffusion distillation via Score-Regularized Continuous-Time Consistency
Authors: Kaiwen Zheng, Yuji Wang, Qianli Ma, Huayu Chen, Jintao Zhang, Yogesh Balaji, Jianfei Chen, Ming-Yu Liu, Jun Zhu, Qinsheng Zhang
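Abstract summary: The continuous-time consistency model (sCM) is theoretically principled and empirically powerful for accelerating academic-scale diffusion. The authors first develop a parallelism-compatible FlashAttention-2 JVP kernel, enabling sCM training on models with over 10 billion parameters and on high-dimensional video tasks. They then propose the score-regularized continuous-time consistency model (rCM), which incorporates score distillation as a long-skip regularizer.
Reference score (independently computed attention score): 60.74505433956616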
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This work represents the first effort to scale up continuous-time consistency distillation to general application-level image and video diffusion models. Although continuous-time consistency model (sCM) is theoretically principled and empirically powerful for accelerating academic-scale diffusion, its applicability to large-scale text-to-image and video tasks remains unclear due to infrastructure challenges in Jacobian-vector product (JVP) computation and the limitations of standard evaluation benchmarks. We first develop a parallelism-compatible FlashAttention-2 JVP kernel, enabling sCM training on models with over 10 billion parameters and high-dimensional video tasks. Our investigation reveals fundamental quality limitations of sCM in fine-detail generation, which we attribute to error accumulation and the "mode-covering" nature of its forward-divergence objective. To remedy this, we propose the score-regularized continuous-time consistency model (rCM), which incorporates score distillation as a long-skip regularizer. This integration complements sCM with the "mode-seeking" reverse divergence, effectively improving visual quality while maintaining high generation diversity. Validated on large-scale models (Cosmos-Predict2, Wan2.1) up to 14B parameters and 5-second videos, rCM matches or surpasses the state-of-the-art distillation method DMD2 on quality metrics while offering notable advantages in diversity, all without GAN tuning or extensive hyperparameter searches. The distilled models generate high-fidelity samples in only $1\sim4$ steps, accelerating diffusion sampling by $15\times\sim50\times$. These results position rCM as a practical and theoretically grounded framework for advancing large-scale diffusion distillation.
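Abstract (reference translation): This work is the first attempt to scale continuous-time consistency distillation to general application-level image and video diffusion models. Although the continuous-time consistency model (sCM) is theoretically principled and effective at accelerating academic-scale diffusion, its applicability to large-scale text-to-image and video tasks remains unclear because of infrastructure problems in Jacobian-vector product (JVP) computation and the limitations of standard evaluation benchmarks. We first develop a parallelism-compatible FlashAttention-2 JVP kernel, enabling sCM training on models with over 10 billion parameters and on high-dimensional video tasks. Our study reveals fundamental quality limitations of sCM in fine-detail generation. To address this, we propose the score-regularized continuous-time consistency model (rCM), which incorporates score distillation as a long-skip regularizer. This integration complements sCM with the "mode-seeking" reverse divergence, effectively improving visual quality while maintaining high generation diversity. Validated on large-scale models (Cosmos-Predict2, Wan2.1) with up to 14B parameters and 5-second videos, rCM matches or surpasses the state-of-the-art distillation method DMD2 on quality metrics. The distilled models generate high-fidelity samples in only $1\sim4$ steps, accelerating diffusion sampling by $15\times\sim50\times$. These results position rCM as a practical and theoretically grounded framework for advancing large-scale diffusion distillation.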


Related papers