Fugu-MT Paper Translation (Abstract): TetherCache: Stabilizing Autoregressive Long-Form Video Generation with Gated Recall and Trusted Alignment

Paper overview: TetherCache: Stabilizing Autoregressive Long-Form Video Generation with Gated Recall and Trusted Alignment

arxiv url: http://arxiv.org/abs/2606.13035v1
Date: Thu, 11 Jun 2026 08:16:08 GMT
Status: translation complete
System update date: 2026-06-12 15:55:27.665897
Title: TetherCache: Stabilizing Autoregressive Long-Form Video Generation with Gated Recall and Trusted Alignment
Title (reference translation): TetherCache: Stabilizing Autoregressive Long Video Generation with Gated Recall and Trusted Alignment
Authors: Yu Meng, Xiangyang Luo, Letian Li, Wenyuan Jiang, Chen Gao, Xinlei Chen, Yong Li, Xiao-Ping Zhang
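Abstract summary: We propose TetherCache, a training-free, plug-and-play cache management strategy for drift-resistant long video generation. GRAB (Gated Recall with Attention-Diversity Balancing) selects long-range memory frames using a gated score. TAME (Trusted Alignment via Memory Editing) lightly edits newly recalled memory tokens by aligning their statistics to a trusted context distribution.
Reference score (independently computed attention level): 51.33418612284208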
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autoregressive video diffusion models provide a natural formulation for streaming and variable-length video generation by conditioning newly generated frames on previously generated content. However, extending these models to minute-level generation remains challenging: the limited KV-cache budget prevents the model from retaining the full history, while repeatedly conditioning on self-generated frames induces a context distribution shift that accumulates over time, leading to visual artifacts, quality degradation, and temporal drift. In this paper, we propose TetherCache, a training-free and plug-and-play cache management strategy for drift-resistant long video generation. TetherCache organizes the cache into sink, memory, and recent regions, and introduces two complementary mechanisms. First, GRAB (Gated Recall with Attention-Diversity Balancing) selects long-range memory frames using a gated score that combines attention-based relevance with temporal diversity, preserving informative yet diverse historical context under a fixed cache budget. Second, TAME (Trusted Alignment via Memory Editing) lightly edits newly recalled memory tokens by aligning their statistics to a trusted context distribution, reducing the pollution caused by drifted historical features. Built on Self-Forcing, TetherCache consistently improves long-video generation quality on VBench-Long across 30s, 60s, and 240s settings. In particular, for 240s generation, it substantially improves overall and semantic scores while reducing quality drift from 7.84 to 1.33, demonstrating its effectiveness for stable long-horizon autoregressive video diffusion.
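Abstract (reference translation): Autoregressive video diffusion models offer a natural formulation for streaming and variable-length video generation by conditioning newly generated frames on previously generated content. Extending them to minute-level generation remains difficult, however: the limited KV-cache budget keeps the model from retaining the full history, while repeated conditioning on self-generated frames induces a context distribution shift that accumulates over time and leads to visual artifacts, quality degradation, and temporal drift. This paper proposes TetherCache, a training-free, plug-and-play cache management strategy for drift-resistant long video generation. TetherCache organizes the cache into sink, memory, and recent regions and introduces two complementary mechanisms. First, GRAB (Gated Recall with Attention-Diversity Balancing) selects long-range memory frames with a gated score that combines attention-based relevance and temporal diversity, preserving informative yet diverse historical context under a fixed cache budget. Second, TAME (Trusted Alignment via Memory Editing) lightly edits newly recalled memory tokens by aligning their statistics to a trusted context distribution, reducing the pollution caused by drifted historical features. Built on Self-Forcing, TetherCache consistently improves long-video generation quality on VBench-Long in the 30s, 60s, and 240s settings. For 240s generation in particular, it substantially improves the overall and semantic scores while reducing quality drift from 7.84 to 1.33, demonstrating its effectiveness for stable long-horizon autoregressive video diffusion.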
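The two mechanisms named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the form of the gated score or of the alignment, so the weighted sum in `select_memory_frames`, the per-channel mean/std matching in `align_to_trusted`, and all function and parameter names (`diversity_weight`, `strength`, `eps`) are assumptions made for illustration only.

```python
from statistics import fmean, pstdev


def select_memory_frames(relevance, budget, diversity_weight=0.5):
    """Greedily pick up to `budget` frame indices from the history.

    relevance: dict mapping frame index -> attention-based relevance
               (assumed to be precomputed elsewhere).
    Hypothetical score: relevance + diversity_weight * normalized temporal
    distance to the nearest frame already selected.
    """
    candidates = sorted(relevance)
    if not candidates:
        return []
    span = max(candidates[-1] - candidates[0], 1)
    selected = []
    while len(selected) < min(budget, len(candidates)):
        best, best_score = None, float("-inf")
        for idx in candidates:
            if idx in selected:
                continue
            # Temporal diversity term: zero until something is selected.
            gap = min(abs(idx - s) for s in selected) / span if selected else 0.0
            score = relevance[idx] + diversity_weight * gap
            if score > best_score:
                best, best_score = idx, score
        selected.append(best)
    return sorted(selected)


def align_to_trusted(recalled, trusted, strength=0.5, eps=1e-6):
    """Nudge recalled token features toward trusted per-channel statistics.

    recalled, trusted: lists of token vectors (lists of floats, same width).
    Each channel of `recalled` is standardized, rescaled to the trusted
    mean/std, then blended with the original value by `strength`
    (0 = unchanged, 1 = fully aligned). Assumed form, not the paper's.
    """
    dim = len(recalled[0])
    out = [list(tok) for tok in recalled]
    for c in range(dim):
        r = [tok[c] for tok in recalled]
        t = [tok[c] for tok in trusted]
        r_mu, r_sd = fmean(r), pstdev(r)
        t_mu, t_sd = fmean(t), pstdev(t)
        for i, v in enumerate(r):
            aligned = (v - r_mu) / (r_sd + eps) * t_sd + t_mu
            out[i][c] = (1.0 - strength) * v + strength * aligned
    return out


# Usage: relevance alone favors the earliest frames; adding the diversity
# term spreads the selection across the history.
scores = {0: 0.9, 1: 0.85, 2: 0.8, 50: 0.5, 100: 0.4}
clustered = select_memory_frames(scores, budget=3, diversity_weight=0.0)
spread = select_memory_frames(scores, budget=3, diversity_weight=1.0)
edited = align_to_trusted([[0.0], [2.0]], [[10.0], [14.0]], strength=1.0)
```

With `diversity_weight=0.0` the selection collapses onto the most relevant neighboring frames, while a positive weight trades some relevance for temporal coverage. A `strength` below 1 keeps the edit light, in the spirit of the abstract's description of TAME.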


Related papers