Fugu-MT Paper Translation (Abstract): Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

Paper overview: Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

arxiv url: http://arxiv.org/abs/2404.15506v3
Date: Tue, 29 Oct 2024 06:24:27 GMT
Status: Translation complete
Last updated in system: 2024-11-28 17:07:32.038325
Title: Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
Authors: Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen
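Abstract summary: Metric3D v2 is a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image. It proposes solutions for both metric depth estimation and surface normal estimation. The method enables accurate recovery of metric 3D structures on randomly collected internet images.
Reference score (independently computed attention score): 74.28509379811084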
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods have limited zero-shot performance due to the lack of large-scale labeled data. To tackle these issues, we propose solutions for both metric depth estimation and surface normal estimation. For metric depth estimation, we show that the key to a zero-shot single-view model lies in resolving the metric ambiguity from various camera models and large-scale data training. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problem and can be effortlessly plugged into existing monocular models. For surface normal estimation, we propose a joint depth-normal optimization module to distill diverse data knowledge from metric depth, enabling normal estimators to learn beyond normal labels. Equipped with these modules, our depth-normal models can be stably trained with over 16 million images from thousands of camera models with different-type annotations, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology. Our project page is at https://JUGGHM.github.io/Metric3Dv2.
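Illustrative note: the abstract names a canonical camera space transformation but does not give its formula. The sketch below is one plausible reading, assuming a pinhole camera and a variant that rescales depth by the ratio of focal lengths. The function names, the canonical focal length of 1000 pixels, and the scaling rule are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a canonical-camera depth rescaling.
# Assumption: pinhole camera, depth scaled by the focal-length ratio.
# CANONICAL_FOCAL_PX is an illustrative value, not taken from the abstract.

CANONICAL_FOCAL_PX = 1000.0


def to_canonical_depth(depth_m, focal_px, canonical_focal_px=CANONICAL_FOCAL_PX):
    """Map a metric depth (meters) from a real camera into canonical space.

    Works on plain floats, and on any array type that supports
    multiplication by a scalar.
    """
    if focal_px <= 0:
        raise ValueError("focal_px must be positive")
    return depth_m * (canonical_focal_px / focal_px)


def from_canonical_depth(canonical_depth, focal_px, canonical_focal_px=CANONICAL_FOCAL_PX):
    """Invert to_canonical_depth: recover metric depth for the real camera."""
    if focal_px <= 0:
        raise ValueError("focal_px must be positive")
    return canonical_depth * (focal_px / canonical_focal_px)


if __name__ == "__main__":
    # A point 2 m away, seen by a camera with a 500 px focal length.
    canonical = to_canonical_depth(2.0, focal_px=500.0)
    restored = from_canonical_depth(canonical, focal_px=500.0)
    print(canonical, restored)
```

Under these assumptions, a model trained on canonical depths only ever sees one virtual camera, and the known focal length of the real camera is used to convert its predictions back to meters.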


Related papers