Fugu-MT Paper Translation (Abstract): Accelerating Training of Autoregressive Video Generation Models via Local Optimization with Representation Continuity

Paper summary: Accelerating Training of Autoregressive Video Generation Models via Local Optimization with Representation Continuity

arxiv url: http://arxiv.org/abs/2604.07402v1
Date: Wed, 08 Apr 2026 09:43:03 GMT
Status: Translation complete
System update date: 2026-04-10 18:34:05.450595
Title: Accelerating Training of Autoregressive Video Generation Models via Local Optimization with Representation Continuity
Authors: Yucheng Zhou, Jianbing Shen
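Abstract summary: This study explores methods to accelerate the training of autoregressive video generation models through empirical analyses. The results show that training on fewer video frames significantly reduces training time, but it also exacerbates error accumulation and introduces inconsistencies in the generated videos. Inspired by Lipschitz continuity, the authors propose a Representation Continuity (ReCo) strategy to improve the consistency of generated videos.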
Reference score (independently computed attention score): 57.83511884904928
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autoregressive models have shown superior performance and efficiency in image generation, but remain constrained by high computational costs and prolonged training times in video generation. In this study, we explore methods to accelerate training for autoregressive video generation models through empirical analyses. Our results reveal that while training on fewer video frames significantly reduces training time, it also exacerbates error accumulation and introduces inconsistencies in the generated videos. To address these issues, we propose a Local Optimization (Local Opt.) method, which optimizes tokens within localized windows while leveraging contextual information to reduce error propagation. Inspired by Lipschitz continuity, we propose a Representation Continuity (ReCo) strategy to improve the consistency of generated videos. ReCo utilizes continuity loss to constrain representation changes, improving model robustness and reducing error accumulation. Extensive experiments on class- and text-to-video datasets demonstrate that our approach achieves superior performance to the baseline while halving the training cost without sacrificing quality.
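The abstract does not give the ReCo loss in closed form, so the following is only an illustrative sketch of what a Lipschitz-inspired continuity penalty could look like, not the authors' formulation. It assumes a hinge-style penalty that is zero whenever the change between consecutive representations stays within a bound proportional to the change between the corresponding inputs. The names `continuity_penalty`, `l2_distance`, and the parameter `lipschitz_bound` are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def continuity_penalty(inputs, representations, lipschitz_bound=1.0):
    """Hypothetical Lipschitz-style continuity penalty (not the paper's loss).

    For each pair of consecutive steps, penalize the amount by which the
    representation change exceeds lipschitz_bound times the input change.
    Returns the mean squared excess over all consecutive pairs.
    """
    if len(inputs) != len(representations):
        raise ValueError("inputs and representations must have the same length")
    if len(inputs) < 2:
        return 0.0

    total = 0.0
    for t in range(1, len(inputs)):
        input_change = l2_distance(inputs[t], inputs[t - 1])
        rep_change = l2_distance(representations[t], representations[t - 1])
        # Hinge: only changes larger than the allowed bound are penalized.
        excess = rep_change - lipschitz_bound * input_change
        total += max(0.0, excess) ** 2
    return total / (len(inputs) - 1)


# Toy data: three steps of 2-dimensional vectors.
frames = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
smooth_reps = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
jumpy_reps = [[0.0, 0.0], [3.0, 0.0], [3.0, 0.0]]

smooth_penalty = continuity_penalty(frames, smooth_reps)
jumpy_penalty = continuity_penalty(frames, jumpy_reps)
```

In this toy example the smooth sequence incurs no penalty, while the sequence with an abrupt jump in representation space does. In a training setup, a term of this kind would typically be added to the main generation loss with a weighting coefficient; how the paper weights or defines its continuity loss is not stated in the abstract.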


Related papers