Fugu-MT Paper Translation (Abstract): SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model

Paper overview: SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model

arxiv url: http://arxiv.org/abs/2403.08556v1
Date: Wed, 13 Mar 2024 14:08:25 GMT
Status: translation complete
Last updated in system: 2024-03-14 14:11:05.943799
Title: SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model
Authors: Yihao Liu and Feng Xue and Anlong Ming
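Abstract summary: This paper proposes SM4Depth, an MMDE method that seamlessly addresses all of these issues within a single network. First, it shows that a consistent field of view (FOV) is the key to resolving metric ambiguity across cameras. Second, to achieve consistent accuracy across scenes, it explicitly models metric scale determination as discretizing the depth interval into bins. Third, to reduce the reliance on massive training data, it proposes a "divide and conquer" solution.
Reference score (attention score computed independently by the site): 23.95095404136943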
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The generalization of monocular metric depth estimation (MMDE) has been a longstanding challenge. Recent methods have made progress by combining relative and metric depth or by aligning the focal length of the input images. However, they are still beset by challenges at the camera, scene, and data levels: (1) sensitivity to different cameras; (2) inconsistent accuracy across scenes; (3) reliance on massive training data. This paper proposes SM4Depth, a seamless MMDE method, to address all of the issues above within a single network. First, we reveal that a consistent field of view (FOV) is the key to resolving "metric ambiguity" across cameras, which guides us to propose a more straightforward preprocessing unit. Second, to achieve consistently high accuracy across scenes, we explicitly model the metric scale determination as discretizing the depth interval into bins and propose variation-based unnormalized depth bins. This method bridges the depth gap between diverse scenes by reducing the ambiguity of the conventional metric bin. Third, to reduce the reliance on massive training data, we propose a "divide and conquer" solution. Instead of estimating directly from the vast solution space, the correct metric bins are estimated from multiple solution sub-spaces to reduce complexity. Finally, with just 150K RGB-D pairs and a consumer-grade GPU for training, SM4Depth achieves state-of-the-art performance on most previously unseen datasets, especially surpassing ZoeDepth and Metric3D on mRI$_\theta$. The code can be found at https://github.com/1hao-Liu/SM4Depth.
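
Illustrative sketch (not taken from the paper or its repository): the two ideas the abstract names, a consistent field of view across cameras and depth expressed through discretized bins, can be made concrete with textbook formulas. The snippet assumes a simple pinhole camera, equal-width bins, and a probability-weighted sum of bin centers. All function names and the example camera parameters are hypothetical, and the paper's actual preprocessing unit and variation-based unnormalized bins may differ substantially.

```python
import math


def horizontal_fov_deg(width_px, focal_px):
    """Horizontal field of view of a pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(width_px / (2.0 * focal_px)))


def crop_width_for_fov(focal_px, target_fov_deg):
    """Image width (pixels) that gives target_fov_deg at this focal length."""
    return 2.0 * focal_px * math.tan(math.radians(target_fov_deg) / 2.0)


def uniform_bin_centers(d_min, d_max, n_bins):
    """Centers of n_bins equal-width bins covering [d_min, d_max] in meters.

    Assumption: equal-width bins, chosen only for illustration.
    """
    width = (d_max - d_min) / n_bins
    return [d_min + (i + 0.5) * width for i in range(n_bins)]


def depth_from_bins(probs, centers):
    """Depth as the probability-weighted sum of bin centers."""
    total = sum(probs)
    return sum(p * c for p, c in zip(probs, centers)) / total


if __name__ == "__main__":
    # Two hypothetical cameras with different widths and focal lengths.
    fov_a = horizontal_fov_deg(640, 320)
    fov_b = horizontal_fov_deg(1920, 1920)
    # Width to which camera A's image would be cropped to match camera B's FOV.
    print(fov_a, fov_b, crop_width_for_fov(320, fov_b))

    centers = uniform_bin_centers(0.0, 10.0, 4)
    print(depth_from_bins([0.0, 0.5, 0.5, 0.0], centers))
```

Under these assumptions, images from both hypothetical cameras cover the same angular extent after cropping, so a given pixel size corresponds to a comparable metric scale. This is one simple way to read the abstract's point about FOV consistency.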


Related papers