Fugu-MT Paper Translation (Abstract): UniTemp: Unlocking Video Generation in Any Temporal Order via Bidirectional Distillation

Paper overview: UniTemp: Unlocking Video Generation in Any Temporal Order via Bidirectional Distillation

arxiv url: http://arxiv.org/abs/2606.18702v1
Date: Wed, 17 Jun 2026 05:27:19 GMT
Status: Translation complete
Last updated in system: 2026-06-18 17:16:51.017555
Title: UniTemp: Unlocking Video Generation in Any Temporal Order via Bidirectional Distillation
Title (reference translation): UniTemp: Unlocking video generation in any temporal order through bidirectional distillation
Authors: Lin Zhang, Sicheng Mo, Zefan Cai, Jinhong Lin, Zihao Lin, Jiuxiang Gu, Krishna Kumar Singh, Yuheng Li, Yin Li
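Abstract summary: The authors train an autoregressive model that supports generation in arbitrary temporal directions. UniTemp is a bidirectional distillation framework that trains a single autoregressive student model for any-direction video generation. Experiments show that UniTemp maintains competitive performance on short and long video generation compared to forward-only methods.
Reference score (independently computed attention score): 48.72984575797994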
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autoregressive video diffusion models have emerged as a promising approach for long video generation, achieving strong performance in streaming settings. However, existing methods are restricted to forward temporal generation, whereas practical video creation often requires flexible generation order, e.g., conditioning on future context to extend backward, or on both past and future context for inbetween generation. We bridge this gap by training an autoregressive model that supports generation in arbitrary temporal directions. A key technical challenge arises from the Causal 3D VAE widely used in video diffusion models, which encodes latents strictly conditioned on past context. While suited for forward generation, this causal structure causes inter-block discontinuities when generation proceeds backward. To address this, we introduce blockwise anchor latents, a set of auxiliary latents that restore the missing past context at block boundaries during backward generation. Built on this design, we propose UniTemp, a bidirectional distillation framework that trains a single autoregressive student model for any-direction video generation. At inference time, UniTemp conditions on arbitrary past and/or future frames, improving controllability for both bidirectional and inbetween generation. Experiments show that UniTemp maintains competitive performance on short and long video generation compared to forward-only methods, while enabling diverse workflows such as bidirectional video extension, inbetween generation, looping video generation, scene transition, and visual story generation. Project website: https://lzhangbj.github.io/projects/unitemp/
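Abstract (reference translation): Autoregressive video diffusion models have emerged as a promising approach to long video generation, achieving strong performance in streaming settings. However, existing methods are limited to forward temporal generation, whereas practical video creation often needs a flexible generation order, such as conditioning on future context to extend backward, or on both past and future context for inbetween generation. This gap is bridged by training an autoregressive model that supports generation in arbitrary temporal directions. The Causal 3D VAE widely used in video diffusion models encodes latents strictly conditioned on past context. Although suited to forward generation, this causal structure causes inter-block discontinuities when generation proceeds backward. To address this, blockwise anchor latents are introduced: a set of auxiliary latents that restore the missing past context at block boundaries during backward generation. Building on this design, the bidirectional distillation framework UniTemp is proposed. At inference time, UniTemp conditions on arbitrary past and/or future frames, improving controllability for bidirectional and inbetween generation. Experiments show that UniTemp maintains competitive performance on short and long video generation compared to forward-only methods while enabling diverse workflows such as bidirectional video extension, inbetween generation, looping video generation, scene transition, and visual story generation. Project website: https://lzhangbj.github.io/projects/unitemp/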
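The causality issue described in the abstract can be illustrated with a toy example. This is not the paper's implementation or its Causal 3D VAE: it is a minimal sketch that assumes a one-dimensional causal filter over scalar "frames", with a hypothetical kernel and a hypothetical `causal_conv` helper. It only shows why a block encoded without its preceding frames disagrees with the full encoding near the block boundary, and why supplying the missing past context removes the mismatch.

```python
import numpy as np

def causal_conv(frames, kernel):
    """Toy causal temporal filter: output[t] depends only on frames[t-k+1 .. t].

    Missing past frames at the start are filled by repeating the first frame
    (a padding convention assumed for this sketch).
    """
    k = len(kernel)
    padded = np.concatenate([np.repeat(frames[:1], k - 1), frames])
    return np.array([padded[t:t + k] @ kernel for t in range(len(frames))])

kernel = np.array([0.2, 0.3, 0.5])       # hypothetical weights, newest frame last
video = np.arange(8, dtype=float) ** 2   # 8 scalar "frames"
start = 4                                # index where the later block begins

# Reference: the whole video encoded in one pass.
full = causal_conv(video, kernel)

# The later block encoded on its own, as if its past did not exist yet
# (the situation when generation proceeds backward).
block = video[start:]
isolated = causal_conv(block, kernel)

# Only the first k-1 outputs of the block differ from the reference.
mismatch = np.abs(isolated - full[start:])

# Supplying the k-1 preceding frames as extra context restores agreement.
context = video[start - (len(kernel) - 1):start]
with_context = causal_conv(np.concatenate([context, block]), kernel)[len(context):]

print("mismatch without past context:", mismatch)
print("matches reference with context:", np.allclose(with_context, full[start:]))
```

In this sketch the disagreement is confined to the first `len(kernel) - 1` positions of the block, which is the boundary region where the causal filter would normally look at earlier frames.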

Related papers