Fugu-MT Paper Translation (Summary): DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images

Paper summary: DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images

arxiv url: http://arxiv.org/abs/2606.12368v2
Date: Thu, 11 Jun 2026 05:24:02 GMT
Status: Translation complete
Last updated (in system): 2026-06-12 13:39:59.695135
Title: DepthMaster: Unified Monocular Depth Estimation for Perspective and Panoramic Images
Authors: Pengfei Wang, Shihao Wang, Liyi Chen, Zhiyuan Ma, Guowen Zhang, Lei Zhang
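Abstract summary: We introduce DepthMaster, a unified metric depth estimation framework. It reformulates the problem by decomposing panoramic images into overlapping perspective patches. DepthMaster achieves state-of-the-art zero-shot performance on 13 diverse datasets.
Reference score (attention score computed by the site): 20.087102336395173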
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While monocular depth estimation has achieved significant progress, achieving generalized metric depth estimation for both narrow field-of-view (FoV) perspectives and $360^\circ$ panoramas remains an unsolved challenge. Existing methods are often tailored to specific camera types and struggle to produce accurate metric depth that generalizes across diverse settings. This limitation stems from two key challenges: the inherent geometric discrepancy between perspective and panoramic cameras, and the scarcity of panoramic training data with metric annotations. In this work, we introduce DepthMaster, a unified metric depth estimation framework. Rather than employing specialized networks to learn spherical distortions, we reformulate the problem by decomposing panoramic images into overlapping perspective patches. Crucially, distinct from prior projection-based methods that rely on ad-hoc architectural modifications to handle boundaries, we introduce a novel Correspondence Consistency Loss (CCL) and inject virtual projection cameras as geometric priors, allowing us to seamlessly stitch the patches while avoiding specialized operators and keeping the backbone largely compatible with standard Transformer designs. This strategy also resolves the geometric differences by unifying all inputs into a canonical perspective representation, and effectively circumvents data scarcity by directly unlocking powerful metric priors from vast perspective datasets. Trained on a mixed dataset that contains only one panorama dataset, DepthMaster achieves state-of-the-art zero-shot performance on 13 diverse datasets, outperforming not only universal methods but also leading specialist models in both perspective and panoramic domains.
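The abstract describes the patch decomposition and the Correspondence Consistency Loss only at a high level. The sketch below illustrates the underlying geometry under our own assumptions, not the paper's: NumPy, pinhole virtual cameras with x-right/y-down/z-forward axes, nearest-neighbour sampling, and an L1 penalty on ray distances. All function names (`rotation`, `patch_rays`, `rays_to_equirect`, `sample_patch`, `correspondence_consistency`) are hypothetical. It covers only extracting perspective patches from an equirectangular panorama and penalizing depth disagreement where two patches overlap; it does not show how DepthMaster injects camera priors into its network or the exact form of its CCL.

```python
import numpy as np


def rotation(yaw: float, pitch: float) -> np.ndarray:
    """Camera-to-world rotation of a virtual pinhole camera (hypothetical convention).

    Axes: x right, y down, z forward. Positive yaw turns right (about y),
    positive pitch tilts up (about the camera x-axis).
    """
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    r_yaw = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    r_pitch = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    return r_yaw @ r_pitch


def patch_rays(size: int, fov_deg: float, yaw: float, pitch: float) -> np.ndarray:
    """World-frame ray through every pixel centre of a size x size patch (camera-frame z = 1)."""
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)        # focal length in pixels
    u, v = np.meshgrid(np.arange(size) + 0.5, np.arange(size) + 0.5)
    dirs = np.stack([(u - size / 2) / f, (v - size / 2) / f, np.ones_like(u)], axis=-1)
    return dirs @ rotation(yaw, pitch).T                     # rotate camera -> world


def rays_to_equirect(rays: np.ndarray, width: int, height: int):
    """Map world-frame rays to continuous (x, y) pixel coordinates of an equirectangular image."""
    d = rays / np.linalg.norm(rays, axis=-1, keepdims=True)
    lon = np.arctan2(d[..., 0], d[..., 2])                  # 0 = forward, + = right
    lat = np.arcsin(np.clip(-d[..., 1], -1.0, 1.0))         # + = up (y points down)
    return (lon / (2 * np.pi) + 0.5) * width, (0.5 - lat / np.pi) * height


def sample_patch(pano: np.ndarray, size: int, fov_deg: float, yaw: float, pitch: float) -> np.ndarray:
    """Nearest-neighbour resampling of one perspective patch from a panorama."""
    h, w = pano.shape[:2]
    x, y = rays_to_equirect(patch_rays(size, fov_deg, yaw, pitch), w, h)
    xi = np.floor(x).astype(int) % w                         # wrap across the 360-degree seam
    yi = np.clip(np.floor(y).astype(int), 0, h - 1)
    return pano[yi, xi]


def correspondence_consistency(depth_a, depth_b, cam_a, cam_b, size: int, fov_deg: float) -> float:
    """Hypothetical CCL-style penalty between two overlapping patches.

    depth_a / depth_b are per-pixel z-depths predicted for patches with (yaw, pitch)
    cam_a / cam_b. Returns mean |distance_a - distance_b| over pixels of patch A whose
    ray also lands inside patch B (nearest-neighbour lookup in B).
    """
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)
    rays_a = patch_rays(size, fov_deg, *cam_a)
    dist_a = depth_a * np.linalg.norm(rays_a, axis=-1)      # z-depth -> distance along ray
    in_b = rays_a @ rotation(*cam_b)                         # world -> camera B (R^T d)
    z = in_b[..., 2]
    front = z > 1e-6                                         # ray must point into camera B
    safe_z = np.where(front, z, 1.0)
    u = f * in_b[..., 0] / safe_z + size / 2
    v = f * in_b[..., 1] / safe_z + size / 2
    valid = front & (u >= 0) & (u < size) & (v >= 0) & (v < size)
    if not valid.any():
        return 0.0
    ub = np.floor(u[valid]).astype(int)
    vb = np.floor(v[valid]).astype(int)
    rays_b = patch_rays(size, fov_deg, *cam_b)
    dist_b = depth_b[vb, ub] * np.linalg.norm(rays_b[vb, ub], axis=-1)
    return float(np.mean(np.abs(dist_a[valid] - dist_b)))
```

For example, yaw angles `np.arange(8) * np.pi / 4` with a 90° FoV give eight horizontal patches, each overlapping its neighbours by 45°. The penalty compares distances along the viewing ray rather than raw z-depth, because the same 3D point generally has different z-depths in two differently rotated virtual cameras. A scene that is a sphere centred on the camera should therefore give a penalty of (numerically) zero.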

Related papers