Fugu-MT Paper Translation (Summary): Data-Forcing Distillation: Restoring Diversity and Fidelity in Few-Step Video Generation

Paper summary: Data-Forcing Distillation: Restoring Diversity and Fidelity in Few-Step Video Generation

arxiv url: http://arxiv.org/abs/2606.18478v2
Date: Tue, 23 Jun 2026 08:04:55 GMT
Status: Translation complete
Last updated (in system): 2026-06-24 22:16:48.224778
Title: Data-Forcing Distillation: Restoring Diversity and Fidelity in Few-Step Video Generation
Authors: Siyi Chen, Shaowei Liu, Yixuan Jia, Zian Wang, Huan Ling, Qing Qu, Jun Gao,
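Abstract summary: We propose Data-Forcing Distillation (DFD), a simple post-training framework that restores the diversity and fidelity of DMD with only a single-line code change. We provide an in-depth theoretical analysis of our framework and validate our approach on text-to-video, image-to-video, and autoregressive video generation.
Reference score (independently computed attention score): 25.352409052792122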
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent progress has shown promise in distilling multi-step video diffusion models into efficient few-step students. Among these approaches, Distribution Matching Distillation (DMD) and its successor DMD2 achieved strong generation quality and fast convergence. However, due to the nature of the reverse Kullback–Leibler (KL) objective, these methods exhibit two persistent failure modes: a substantial drop in sample diversity, and visibly over-saturated outputs that deviate from real-video appearance. In this work, we propose Data-Forcing Distillation (DFD), a simple post-training framework that restores diversity and fidelity in DMD with only a single-line code change. At its core is a teacher score discrepancy that guides the student toward the real-data distribution, pulling it toward missing modes (mitigating mode collapse) and away from problematic modes absent in real data (avoiding over-saturation). We provide an in-depth theoretical analysis of our framework and validate our approach on text-to-video, image-to-video, and autoregressive video generation. With only 100–300 steps of finetuning, DFD effectively restores diversity and fidelity on both the Wan2.1-1.3B and Cosmos-Predict2.5-2B models, resolves the over-saturation artifacts with significantly better video dynamics and appearance, and even outperforms the teacher model.
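The abstract attributes DMD's loss of sample diversity to the reverse KL objective. The toy sketch below illustrates that mode-seeking behavior in one dimension. It is not the paper's DFD method and not DMD itself. It fits a single Gaussian "student" to a two-mode "real data" distribution by grid search, once under reverse KL and once under forward KL. The target mixture 0.5 N(-3, 1) + 0.5 N(3, 1), the integration grid, the search ranges, and all function names are hypothetical choices made only for illustration.

```python
import math

def log_normal(x: float, mu: float, sigma: float) -> float:
    """Log-density of a 1-D Gaussian N(mu, sigma^2), computed in log space."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_target(x: float) -> float:
    """Log-density of the hypothetical bimodal 'real data' p = 0.5 N(-3,1) + 0.5 N(3,1)."""
    return math.log(0.5 * math.exp(log_normal(x, -3.0, 1.0))
                    + 0.5 * math.exp(log_normal(x, 3.0, 1.0)))

# Riemann-sum grid over x; all densities considered are negligible outside [-10, 10].
DX = 0.05
XS = [-10.0 + i * DX for i in range(401)]
LOG_P = [log_target(x) for x in XS]

def reverse_kl(mu: float, sigma: float) -> float:
    """KL(q || p): expectation under the student q, which is mode-seeking."""
    total = 0.0
    for x, lp in zip(XS, LOG_P):
        lq = log_normal(x, mu, sigma)
        total += math.exp(lq) * (lq - lp) * DX
    return total

def forward_kl(mu: float, sigma: float) -> float:
    """KL(p || q): expectation under the target p, which is mass-covering."""
    total = 0.0
    for x, lp in zip(XS, LOG_P):
        total += math.exp(lp) * (lp - log_normal(x, mu, sigma)) * DX
    return total

def best_fit(divergence):
    """Grid-search the Gaussian (mu, sigma) that minimizes the given divergence."""
    mus = [-4.0 + 0.25 * i for i in range(33)]      # -4.0 ... 4.0
    sigmas = [0.5 + 0.25 * j for j in range(15)]    # 0.5 ... 4.0
    return min(((m, s) for m in mus for s in sigmas),
               key=lambda ms: divergence(*ms))

mu_rev, sigma_rev = best_fit(reverse_kl)
mu_fwd, sigma_fwd = best_fit(forward_kl)
print(f"reverse KL fit: mu={mu_rev:+.2f}, sigma={sigma_rev:.2f}")  # locks onto one mode
print(f"forward KL fit: mu={mu_fwd:+.2f}, sigma={sigma_fwd:.2f}")  # spreads over both modes
```

Under these assumptions, the reverse-KL fit settles on a single mode, with a mean near ±3 and a standard deviation near 1. The forward-KL fit instead matches moments, with a mean near 0 and a standard deviation near √10 ≈ 3.16, so it covers both modes. This toy contrast shows only the mode-collapse tendency the abstract describes. It does not show how DFD's teacher score discrepancy is computed.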

