Fugu-MT Paper Translation (Abstract): Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation

Paper abstract: Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation

arxiv url: http://arxiv.org/abs/2604.03118v1
Date: Fri, 03 Apr 2026 15:43:21 GMT
Status: Translation complete
Last updated in system: 2026-04-06 17:20:24.515835
Title: Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
Title (reference translation): Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
Authors: Xingtong Ge, Yi Zhang, Yushi Huang, Dailan He, Xiahong Wang, Bingqi Ma, Guanglu Song, Yu Liu, Jun Zhang
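Abstract summary: Distribution matching distillation (DMD) can recover sharp, mode-seeking samples, but its local training signals do not explicitly regularize how denoising updates compose across timesteps. This paper proposes Self-Consistent Distribution Matching Distillation (SC-DMD), which explicitly regularizes the endpoint-consistent composition of consecutive denoising updates.
Reference score (attention score computed by Fugu-MT): 27.698320788533405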
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Distilling video generation models to extremely low inference budgets (e.g., 2-4 NFEs) is crucial for real-time deployment, yet remains challenging. Trajectory-style consistency distillation often becomes conservative under complex video dynamics, yielding an over-smoothed appearance and weak motion. Distribution matching distillation (DMD) can recover sharp, mode-seeking samples, but its local training signals do not explicitly regularize how denoising updates compose across timesteps, making composed rollouts prone to drift. To overcome this challenge, we propose Self-Consistent Distribution Matching Distillation (SC-DMD), which explicitly regularizes the endpoint-consistent composition of consecutive denoising updates. For real-time autoregressive video generation, we further treat the KV cache as a quality-parameterized condition and propose Cache-Distribution-Aware training. This training scheme applies SC-DMD over multi-step rollouts and introduces a cache-conditioned feature alignment objective that steers low-quality outputs toward high-quality references. Across extensive experiments on both non-autoregressive backbones (e.g., Wan 2.1) and autoregressive real-time paradigms (e.g., Self Forcing), our method, dubbed Salt, consistently improves low-NFE video generation quality while remaining compatible with diverse KV-cache memory mechanisms. Source code will be released at https://github.com/XingtongGe/Salt.
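Abstract (reference translation): Distilling video generation models down to very low inference budgets (e.g., 2-4 NFEs) is essential for real-time deployment, but it remains difficult. Trajectory-style consistency distillation tends to become conservative under complex video dynamics, producing an overly smooth appearance and weak motion. Distribution matching distillation (DMD) can recover sharp, mode-seeking samples, but its local training signals do not explicitly regularize how denoising updates compose across timesteps, so composed rollouts tend to drift. To address this, the authors propose Self-Consistent Distribution Matching Distillation (SC-DMD), which explicitly regularizes the endpoint-consistent composition of consecutive denoising updates. For real-time autoregressive video generation, they treat the KV cache as a quality-parameterized condition and propose Cache-Distribution-Aware training. This scheme applies SC-DMD over multi-step rollouts and introduces a cache-conditioned feature alignment objective that steers low-quality outputs toward high-quality references. In extensive experiments on both non-autoregressive backbones (e.g., Wan 2.1) and autoregressive real-time paradigms (e.g., Self Forcing), the method, named Salt, consistently improves low-NFE video generation quality while remaining compatible with diverse KV-cache memory mechanisms. The source code will be released at https://github.com/XingtongGe/Salt.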

Related papers