Fugu-MT Paper Translation (Summary): DVD: Deterministic Video Depth Estimation with Generative Priors

Paper Summary: DVD: Deterministic Video Depth Estimation with Generative Priors

arxiv url: http://arxiv.org/abs/2603.12250v1
Date: Thu, 12 Mar 2026 17:58:06 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:26.289516
Title: DVD: Deterministic Video Depth Estimation with Generative Priors
Authors: Hongfei Zhang, Harold Haodong Chen, Chenfei Liao, Jing He, Zixin Zhang, Haodong Li, Yihao Liang, Kanghao Chen, Bin Ren, Xu Zheng, Shuai Yang, Kun Zhou, Yinchuan Li, Nicu Sebe, Ying-Cong Chen,
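Abstract summary: DVD is the first framework to adapt pre-trained video diffusion models into single-pass depth regressors. DVD achieves state-of-the-art zero-shot performance across benchmarks. The authors fully release the pipeline, providing the entire training suite for SOTA video depth estimation to benefit the open-source community.
Reference score (independently computed attention score): 87.46576463137801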
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing video depth estimation faces a fundamental trade-off: generative models suffer from stochastic geometric hallucinations and scale drift, while discriminative models demand massive labeled datasets to resolve semantic ambiguities. To break this impasse, we present DVD, the first framework to deterministically adapt pre-trained video diffusion models into single-pass depth regressors. Specifically, DVD features three core designs: (i) repurposing the diffusion timestep as a structural anchor to balance global stability with high-frequency details; (ii) latent manifold rectification (LMR) to mitigate regression-induced over-smoothing, enforcing differential constraints to restore sharp boundaries and coherent motion; and (iii) global affine coherence, an inherent property bounding inter-window divergence, which enables seamless long-video inference without requiring complex temporal alignment. Extensive experiments demonstrate that DVD achieves state-of-the-art zero-shot performance across benchmarks. Furthermore, DVD successfully unlocks the profound geometric priors implicit in video foundation models using 163x less task-specific data than leading baselines. Notably, we fully release our pipeline, providing the whole training suite for SOTA video depth estimation to benefit the open-source community.
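The abstract states that global affine coherence bounds inter-window divergence, so long videos can be processed window by window without complex temporal alignment. One plausible reading, offered here only as an illustrative sketch and not as the DVD release's code, is that consecutive windows differ by at most a single global scale and shift, which a closed-form least-squares fit on overlapping frames can remove. The function names `fit_scale_shift` and `stitch_windows`, the `overlap` parameter, and the representation of each depth frame as a flat list of floats are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical representation: one depth frame flattened to a list of floats.
Frame = List[float]


def fit_scale_shift(src: List[float], dst: List[float]) -> Tuple[float, float]:
    """Closed-form least squares for (s, t) minimizing sum((s * src + t - dst) ** 2)."""
    n = len(src)
    if n == 0 or n != len(dst):
        raise ValueError("src and dst must be non-empty and of equal length")
    mean_src = sum(src) / n
    mean_dst = sum(dst) / n
    var_src = sum((x - mean_src) ** 2 for x in src)
    cov = sum((x - mean_src) * (y - mean_dst) for x, y in zip(src, dst))
    # Fall back to identity scale if the source is constant (degenerate fit).
    s = cov / var_src if var_src > 0 else 1.0
    t = mean_dst - s * mean_src
    return s, t


def stitch_windows(windows: List[List[Frame]], overlap: int) -> List[Frame]:
    """Chain per-window depth predictions into one sequence.

    Assumption (hypothetical): consecutive windows differ only by one global
    affine map, so each new window is fitted to the running result on the
    `overlap` shared frames, and only its non-overlapping frames are appended.
    """
    if overlap < 1:
        raise ValueError("overlap must be at least one frame")
    if not windows:
        return []
    video: List[Frame] = [list(f) for f in windows[0]]
    for win in windows[1:]:
        # Pool all pixels of the shared frames: tail of the result vs head of the new window.
        ref = [v for f in video[-overlap:] for v in f]
        new = [v for f in win[:overlap] for v in f]
        s, t = fit_scale_shift(new, ref)
        aligned = [[s * v + t for v in f] for f in win]
        video.extend(aligned[overlap:])
    return video


if __name__ == "__main__":
    # Toy check: the second window is an affine-distorted copy (2 * d + 1) of the truth.
    truth = [[float(i + j) for j in range(4)] for i in range(6)]
    w1 = truth[:4]
    w2 = [[2.0 * v + 1.0 for v in f] for f in truth[2:6]]
    print(stitch_windows([w1, w2], overlap=2) == truth)
```

Fitting one scale and shift per window, rather than per frame or per pixel, mirrors the claim that the divergence between windows is global; if that assumption did not hold, a per-frame or spatially varying correction would be needed instead.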


Related Papers