Fugu-MT Paper Translation (Abstract): Infinite Gaze Generation for Videos with Autoregressive Diffusion

Paper overview: Infinite Gaze Generation for Videos with Autoregressive Diffusion

arxiv url: http://arxiv.org/abs/2603.24938v1
Date: Thu, 26 Mar 2026 02:02:08 GMT
Status: Translation complete
Last updated in system: 2026-03-27 20:52:48.047476
Title: Infinite Gaze Generation for Videos with Autoregressive Diffusion
Title (reference translation): Infinite Gaze Generation for Autoregressive Diffusion Video
Authors: Jenna Kang, Colin Groth, Tong Wu, Finley Torrens, Patsorn Sangkloy, Gordon Wetzstein, Qi Sun
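Abstract summary: We propose a generative framework for infinite-horizon gaze prediction in videos of arbitrary length. Using an autoregressive diffusion model, we synthesize gaze trajectories characterized by continuous spatial coordinates and high-resolution timestamps.
Reference score (independently computed attention score): 37.82819999198602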
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Predicting human gaze in video is fundamental to advancing scene understanding and multimodal interaction. While traditional saliency maps provide spatial probability distributions and scanpaths offer ordered fixations, both abstractions often collapse the fine-grained temporal dynamics of raw gaze. Furthermore, existing models are typically constrained to short-term windows ($\approx$ 3-5s), failing to capture the long-range behavioral dependencies inherent in real-world content. We present a generative framework for infinite-horizon raw gaze prediction in videos of arbitrary length. By leveraging an autoregressive diffusion model, we synthesize gaze trajectories characterized by continuous spatial coordinates and high-resolution timestamps. Our model is conditioned on a saliency-aware visual latent space. Quantitative and qualitative evaluations demonstrate that our approach significantly outperforms existing approaches in long-range spatio-temporal accuracy and trajectory realism.
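Abstract (reference translation): Predicting human gaze in video is essential to advancing scene understanding and multimodal interaction. Traditional saliency maps provide spatial probability distributions and scanpaths provide ordered fixations, but both abstractions often collapse the fine-grained temporal dynamics of raw gaze. Furthermore, existing models are typically constrained to short-term windows ($\approx$ 3-5s). We propose a generative framework for infinite-horizon gaze prediction in videos of arbitrary length. Using an autoregressive diffusion model, we synthesize gaze trajectories characterized by continuous spatial coordinates and high-resolution timestamps. Our model is conditioned on a saliency-aware visual latent space. Quantitative and qualitative evaluations show that the proposed approach significantly outperforms existing methods in long-range spatio-temporal accuracy and trajectory realism.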

Related papers