Fugu-MT Paper Translation (Abstract): DA$^2$: Depth Anything in Any Direction

Paper overview: DA$^2$: Depth Anything in Any Direction

arxiv url: http://arxiv.org/abs/2509.26618v1
Date: Tue, 30 Sep 2025 17:55:37 GMT
Status: Translation complete
System update date: 2025-10-01 14:45:00.239609
Title: DA$^2$: Depth Anything in Any Direction
Authors: Haodong Li, Wangguangdong Zheng, Jing He, Yuhao Liu, Xin Lin, Xin Yang, Ying-Cong Chen, Chunchao Guo
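Abstract summary: A panorama has a full FoV (360$^\circ\times$180$^\circ$) and offers a more complete visual description than perspective images. Previous methods are often restricted to in-domain settings, so their zero-shot generalization is poor. The paper proposes DA$^{2}$: $\textbf{D}$epth $\textbf{A}$nything in $\textbf{A}$ny $\textbf{D}$irection, a zero-shot generalizable, end-to-end panoramic depth estimator.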
Reference score (independently computed attention score): 36.52106383466286
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Panorama has a full FoV (360$^\circ\times$180$^\circ$), offering a more complete visual description than perspective images. Thanks to this characteristic, panoramic depth estimation is gaining increasing traction in 3D vision. However, due to the scarcity of panoramic data, previous methods are often restricted to in-domain settings, leading to poor zero-shot generalization. Furthermore, due to the spherical distortions inherent in panoramas, many approaches rely on perspective splitting (e.g., cubemaps), which leads to suboptimal efficiency. To address these challenges, we propose $\textbf{DA}$$^{\textbf{2}}$: $\textbf{D}$epth $\textbf{A}$nything in $\textbf{A}$ny $\textbf{D}$irection, an accurate, zero-shot generalizable, and fully end-to-end panoramic depth estimator. Specifically, for scaling up panoramic data, we introduce a data curation engine for generating high-quality panoramic depth data from perspective, and create $\sim$543K panoramic RGB-depth pairs, bringing the total to $\sim$607K. To further mitigate the spherical distortions, we present SphereViT, which explicitly leverages spherical coordinates to enforce the spherical geometric consistency in panoramic image features, yielding improved performance. A comprehensive benchmark on multiple datasets clearly demonstrates DA$^{2}$'s SoTA performance, with an average 38% improvement on AbsRel over the strongest zero-shot baseline. Surprisingly, DA$^{2}$ even outperforms prior in-domain methods, highlighting its superior zero-shot generalization. Moreover, as an end-to-end solution, DA$^{2}$ exhibits much higher efficiency over fusion-based approaches. Both the code and the curated panoramic data will be released. Project page: https://depth-any-in-any-dir.github.io/.
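Two ideas the abstract mentions can be illustrated with a minimal sketch: the spherical coordinates attached to each pixel of a panorama, and the AbsRel metric used for the reported comparison. This is not the authors' code, and the abstract does not describe how SphereViT consumes these coordinates. The sketch assumes an equirectangular panorama, and the function names, axis convention, and pixel-centre offset are all illustrative choices. AbsRel follows its usual definition, the mean of |pred - gt| / gt over pixels with valid ground truth.

```python
import math


def equirect_pixel_to_angles(u, v, width, height):
    """Map pixel (u, v) of an equirectangular panorama to (lon, lat) in radians.

    Assumed convention (not from the paper): longitude spans [-pi, pi) from
    left to right, latitude spans [-pi/2, pi/2] with +pi/2 at the top row,
    and the +0.5 offset addresses the pixel centre.
    """
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi
    return lon, lat


def angles_to_unit_ray(lon, lat):
    """Unit viewing direction for the given angles (y up, z forward; assumed axes)."""
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return x, y, z


def abs_rel(pred, gt):
    """Absolute relative error: mean(|pred - gt| / gt) over entries with gt > 0."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # The middle of a 1024x512 panorama looks straight ahead along +z.
    lon, lat = equirect_pixel_to_angles(511.5, 255.5, 1024, 512)
    print(angles_to_unit_ray(lon, lat))

    # Toy depths; the third ground-truth value is invalid (0) and is skipped.
    print(abs_rel([1.1, 2.0, 3.0], [1.0, 2.0, 0.0]))
```

Under this definition a lower AbsRel is better, so the abstract's "38% improvement on AbsRel" presumably refers to a relative reduction of this error against the strongest zero-shot baseline.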


Related papers