Fugu-MT Paper Translation (Abstract): Future Forcing: Future-aware Training-free KV Cache Policy for Autoregressive Video Generation

Paper abstract: Future Forcing: Future-aware Training-free KV Cache Policy for Autoregressive Video Generation

arxiv url: http://arxiv.org/abs/2605.30083v1
Date: Thu, 28 May 2026 15:30:04 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:56.425742
Title: Future Forcing: Future-aware Training-free KV Cache Policy for Autoregressive Video Generation
Title (reference translation): Future Trends: A Training-free KV Cache Policy for Autoregressive Video Generation
Authors: Jiayi Luo, Qiyan Liu, Tengyang Wang, JunHao Liu, Jiayu Chen, Cong Wang, Hanxin Zhu, Chen Gao, Xiaobin Hu, Qingyun Sun, Zhibo Chen
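Abstract summary: Autoregressive (AR) video generation has emerged as a promising paradigm for long-horizon video synthesis. KV cache compression methods mitigate the growth of the KV cache with generation length by selectively retaining only the video tokens deemed important. This work proposes Future Forcing, a training-free future-aware KV cache policy for AR video generation.
Reference score (independently computed attention score): 28.305030932725884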
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autoregressive (AR) video generation has emerged as a promising paradigm for long-horizon video synthesis, where each frame is generated conditioned on previously generated tokens. To accelerate inference, the KV cache is used to avoid redundant recomputation across generation steps. Nevertheless, its growth with generation length introduces increasing memory and error accumulation, limiting the scalability of AR models to even longer sequences. Existing KV cache compression methods mitigate this issue by selectively retaining only video tokens deemed important. However, most existing methods assess token importance using short-horizon signals derived from the current or historical generation context, making these methods prone to overlooking tokens that appear unimportant at early steps but later become critical for future frames. In this work, we identify an important property of trained AR video models: although RoPE-modulated queries evolve across autoregressive steps, the underlying canonical pre-RoPE query distribution remains remarkably stable throughout the video generation process. This approximate stationarity implies that future query distributions are estimable from historical statistics, enabling principled future-aware cache decisions without any additional training. Building on this insight, we propose Future Forcing, a training-free future-aware KV cache policy for AR video generation. Specifically, Future Forcing first constructs a future query proxy from historical statistics, then scores KV cache tokens by their importance under this proxy, and finally merges redundant token pairs within the affine subspace induced by the future query. Extensive experiments show that Future Forcing improves long-horizon consistency under limited KV caches, achieving up to 1.49 improvement in subject consistency on VBench-Long for 60s generation over existing AR video KV cache policies.
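Abstract (reference translation): Autoregressive (AR) video generation has emerged as a promising paradigm for long-horizon video synthesis, in which each frame is generated conditioned on previously generated tokens. To speed up inference, a KV cache is used to avoid redundant recomputation across generation steps. However, the cache grows with generation length, which increases memory use and error accumulation and limits the scalability of AR models to even longer sequences. Existing KV cache compression methods mitigate this issue by selectively retaining only the video tokens deemed important. Most of them, however, assess token importance using short-horizon signals derived from the current or historical generation context, so they tend to overlook tokens that appear unimportant at early steps but later become critical for future frames. In this work, we identify an important property of trained AR video models: although RoPE-modulated queries evolve across autoregressive steps, the underlying canonical pre-RoPE query distribution remains remarkably stable throughout video generation. This approximate stationarity implies that future query distributions can be estimated from historical statistics, enabling principled future-aware cache decisions without additional training. Building on this insight, we propose Future Forcing, a training-free, future-aware KV cache policy for AR video generation. Specifically, Future Forcing first constructs a future query proxy from historical statistics, then scores KV cache tokens by their importance under this proxy, and finally merges redundant token pairs within the affine subspace induced by the future query. Extensive experiments show that Future Forcing improves long-horizon consistency under limited KV caches, improving subject consistency on VBench-Long by up to 1.49 for 60s generation over existing AR video KV cache policies.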
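
The first two steps named in the abstract (build a future query proxy from historical statistics, then score cached tokens under that proxy) can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes the proxy is simply the mean of past pre-RoPE query vectors, scores keys by softmax attention under that mean, and leaves out both the RoPE rotation and the affine-subspace merging step. The function names `mean_query`, `score_tokens`, and `select_cache` are hypothetical, as are the sample vectors.

```python
import math

def mean_query(history):
    """Average historical pre-RoPE query vectors into one proxy vector.
    Assumption: a plain mean stands in for the 'future query proxy'."""
    dim = len(history[0])
    return [sum(q[i] for q in history) / len(history) for i in range(dim)]

def score_tokens(proxy, keys):
    """Softmax attention weight of each cached key under the proxy query."""
    scale = 1.0 / math.sqrt(len(proxy))
    logits = [scale * sum(p * k for p, k in zip(proxy, key)) for key in keys]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_cache(history, keys, budget):
    """Indices of the `budget` highest-scoring cached tokens, in cache order."""
    scores = score_tokens(mean_query(history), keys)
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])

# Hypothetical 2-dimensional queries and keys, for illustration only.
history = [[1.0, 0.0], [0.8, 0.2], [1.2, -0.2]]
keys = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [-1.0, 0.0]]
kept = select_cache(history, keys, budget=2)
print(kept)
```

With these sample vectors the proxy points along the first axis, so the two keys with the largest component in that direction are the ones retained. A real policy would operate per attention head on the model's actual query and key tensors.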

Related papers