Fugu-MT Paper Translation (Abstract): FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene

Paper summary: FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene

arxiv url: http://arxiv.org/abs/2307.14624v1
Date: Thu, 27 Jul 2023 04:49:36 GMT
Status: Translation complete
Last updated in system: 2023-07-28 15:51:16.130362
Title: FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scene
Title (reference translation): FS-Depth: Focal-and-Scale Depth Estimation from a Single Image in Unseen Indoor Scenes
Authors: Chengrui Wei, Meng Yang, Lei He, Nanning Zheng
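Abstract summary: Predicting absolute depth maps from a single image of a real (unseen) indoor scene has long been an ill-posed problem. This work develops a focal-and-scale depth estimation model that accurately learns absolute depth maps from single images of unseen indoor scenes.
Reference score (independently computed attention score): 57.26600120397529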
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: It has long been an ill-posed problem to predict absolute depth maps from single images in real (unseen) indoor scenes. We observe that it is essentially due to not only the scale-ambiguous problem but also the focal-ambiguous problem that decreases the generalization ability of monocular depth estimation. That is, images may be captured by cameras of different focal lengths in scenes of different scales. In this paper, we develop a focal-and-scale depth estimation model to well learn absolute depth maps from single images in unseen indoor scenes. First, a relative depth estimation network is adopted to learn relative depths from single images with diverse scales/semantics. Second, multi-scale features are generated by mapping a single focal length value to focal length features and concatenating them with intermediate features of different scales in relative depth estimation. Finally, relative depths and multi-scale features are jointly fed into an absolute depth estimation network. In addition, a new pipeline is developed to augment the diversity of focal lengths of public datasets, which are often captured with cameras of the same or similar focal lengths. Our model is trained on augmented NYUDv2 and tested on three unseen datasets. Our model considerably improves the generalization ability of depth estimation by 41%/13% (RMSE) with/without data augmentation compared with five recent SOTAs and well alleviates the deformation problem in 3D reconstruction. Notably, our model well maintains the accuracy of depth estimation on original NYUDv2.
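Abstract (reference translation): Predicting absolute depth maps from single images in real (unseen) indoor scenes has long been an ill-posed problem. The authors observe that the reduced generalization ability of monocular depth estimation stems not only from the scale-ambiguity problem but also from the focal-ambiguity problem. That is, images may be captured by cameras with different focal lengths in scenes of different scales. The paper develops a focal-and-scale depth estimation model that accurately learns absolute depth maps from single images in unseen indoor scenes. First, a relative depth estimation network is adopted to learn relative depths from single images with diverse scales and semantics. Second, multi-scale features are generated by mapping a single focal length value to focal length features and concatenating them with intermediate features of different scales from relative depth estimation. Finally, the relative depths and multi-scale features are jointly fed into an absolute depth estimation network. In addition, a new pipeline is developed to increase the diversity of focal lengths in public datasets, which are often captured with cameras of the same or similar focal lengths. The model is trained on augmented NYUDv2 and tested on three unseen datasets. Compared with five recent SOTA methods, it improves the generalization ability of depth estimation by 41%/13% (RMSE) with/without data augmentation and alleviates the deformation problem in 3D reconstruction. Notably, it maintains depth estimation accuracy on the original NYUDv2.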
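
The second step of the abstract (mapping one focal length value to focal length features and concatenating them with intermediate features at several scales) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single linear layer with ReLU, the feature width `C_F`, the pyramid shapes, and the focal length value are all hypothetical choices made only to show the tensor shapes involved.

```python
import numpy as np

def focal_features(focal_length, weight, bias):
    """Map a scalar focal length to a feature vector (assumed: linear layer + ReLU)."""
    f = np.array([focal_length], dtype=np.float32)   # shape (1,)
    return np.maximum(weight @ f + bias, 0.0)        # shape (C_F,)

def concat_focal(feature_map, focal_vec):
    """Tile the focal vector over H x W and concatenate along the channel axis."""
    _, h, w = feature_map.shape
    tiled = np.broadcast_to(focal_vec[:, None, None], (focal_vec.shape[0], h, w))
    return np.concatenate([feature_map, tiled], axis=0)

rng = np.random.default_rng(0)
C_F = 8  # hypothetical width of the focal length feature vector
weight = rng.standard_normal((C_F, 1)).astype(np.float32)
bias = np.zeros(C_F, dtype=np.float32)

# Hypothetical intermediate features at three scales: (channels, height, width).
pyramid = [
    rng.standard_normal((c, s, s)).astype(np.float32)
    for c, s in [(64, 60), (128, 30), (256, 15)]
]

focal_vec = focal_features(500.0, weight, bias)  # focal length in pixels, made-up value
multi_scale = [concat_focal(fm, focal_vec) for fm in pyramid]
print([m.shape for m in multi_scale])
# [(72, 60, 60), (136, 30, 30), (264, 15, 15)]
```

Each scale gains `C_F` extra channels that carry the same focal length information at every pixel, so a downstream absolute depth network could condition on the focal length at every resolution.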
