Fugu-MT Paper Translation (Abstract): SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

Paper overview: SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation

arxiv url: http://arxiv.org/abs/2309.00526v1
Date: Fri, 1 Sep 2023 15:27:45 GMT
Status: translation complete
System update date: 2023-09-04 12:59:39.939383
Title: SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation
Title (reference translation): SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation
Authors: Youhong Wang, Yunji Liang, Hao Xu, Shaohui Jiao, Hongkai Yu
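Abstract summary: Self-supervised monocular depth estimation has gained popularity across many applications in autonomous driving and robotics. Existing solutions mainly estimate depth from immediate visual features and struggle to recover fine-grained scene details, with limited generalization. This paper proposes a new method that can effectively learn fine-grained scene structures from motion.
Reference score (independently computed attention score): 11.661761367241041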
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, self-supervised monocular depth estimation has gained popularity, with numerous applications in autonomous driving and robotics. However, existing solutions primarily estimate depth from immediate visual features; they struggle to recover fine-grained scene details and offer limited generalization. In this paper, we introduce SQLdepth, a novel approach that can effectively learn fine-grained scene structures from motion. In SQLdepth, we propose a novel Self Query Layer (SQL) that builds a self-cost volume and infers depth from it, rather than inferring depth from feature maps. The self-cost volume implicitly captures the intrinsic geometry of the scene within a single frame. Each slice of the volume represents the relative distances between points and objects within a latent space. Ultimately, this volume is compressed into the depth map via a novel decoding approach. Experimental results on KITTI and Cityscapes show that our method attains state-of-the-art performance (AbsRel = 0.082 on KITTI, 0.052 on KITTI with improved ground truth, and 0.106 on Cityscapes), corresponding to 9.9%, 5.5%, and 4.5% error reductions from the previous best, respectively. In addition, our approach shows reduced training complexity, computational efficiency, improved generalization, and the ability to recover fine-grained scene details. Moreover, the self-supervised pre-trained and metric fine-tuned SQLdepth can surpass existing supervised methods by significant margins (AbsRel = 0.043, a 14% error reduction). Self-matching-oriented relative distance querying in SQL improves the robustness and zero-shot generalization capability of SQLdepth. Code and pre-trained weights will be made publicly available; the code is at https://github.com/hisfog/SQLdepth-Impl.
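As a rough illustration of the self-cost volume idea described in the abstract, the following pure-Python sketch compares every pixel feature with a small set of query vectors and then collapses the resulting volume into a depth map. The dot-product similarity, the softmax-weighted decode, and the names `self_cost_volume`, `toy_decode`, and `bin_centers` are assumptions made for illustration only; the abstract does not specify them, and this is not the authors' implementation.

```python
import math
import random


def self_cost_volume(features, queries):
    """Build a (Q, H, W) volume of query-to-pixel similarities.

    features[y][x] is a C-dimensional feature vector for one pixel.
    queries[q] is a C-dimensional query vector.
    ASSUMPTION: similarity is a plain dot product.
    """
    return [
        [[sum(a * b for a, b in zip(query, pixel)) for pixel in row]
         for row in features]
        for query in queries
    ]


def toy_decode(volume, bin_centers):
    """Collapse the volume into an (H, W) depth map.

    ASSUMPTION: a softmax over the query axis weights hypothetical
    depth bin centers; this stands in for the paper's decoder.
    """
    height, width = len(volume[0]), len(volume[0][0])
    depth = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            scores = [slice_[y][x] for slice_ in volume]
            top = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            depth[y][x] = sum(c * e / total for c, e in zip(bin_centers, exps))
    return depth


# Toy inputs: 4x6 "image", 8 feature channels, 5 queries (all random).
random.seed(0)
C, H, W, Q = 8, 4, 6, 5
features = [[[random.gauss(0, 1) for _ in range(C)] for _ in range(W)]
            for _ in range(H)]
queries = [[random.gauss(0, 1) for _ in range(C)] for _ in range(Q)]
bin_centers = [1.0 + i * (80.0 - 1.0) / (Q - 1) for i in range(Q)]

volume = self_cost_volume(features, queries)
depth = toy_decode(volume, bin_centers)
print(len(volume), len(volume[0]), len(volume[0][0]))  # Q, H, W
```

Each `volume[q]` is one slice of the volume: a per-pixel map of how strongly the image responds to query `q`. Because the decode is a convex combination of `bin_centers`, every predicted depth stays inside the chosen depth range.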
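Abstract (reference translation): In recent years, self-supervised monocular depth estimation has been actively studied for autonomous driving and robotics. However, existing solutions estimate depth from immediate visual features and struggle to recover fine-grained scene details, with limited generalization. This paper introduces SQLdepth, a new method that can effectively learn fine-grained scene structures from motion. Rather than inferring depth from feature maps, SQLdepth proposes a new Self Query Layer (SQL) that builds a self-cost volume and infers depth from it. The self-cost volume implicitly captures the intrinsic geometry of the scene within a single frame. Each slice of the volume represents the relative distances between points and objects in a latent space. Finally, the volume is compressed into a depth map by a new decoding method. Experiments on KITTI and Cityscapes show AbsRel of 0.082 on KITTI, 0.052 on KITTI with improved ground truth, and 0.106 on Cityscapes, which are error reductions of 9.9%, 5.5%, and 4.5%. The method also shows reduced training complexity, better computational efficiency, improved generalization, and the ability to recover fine-grained scene details. Furthermore, the self-supervised pre-trained and metric fine-tuned SQLdepth can surpass existing supervised methods by a considerable margin (AbsRel = 0.043, 14% error reduction). Self-matching-oriented relative distance querying in SQL improves the robustness and zero-shot generalization capability of SQLdepth. Code and pre-trained weights will be released. The code is available at https://github.com/hisfog/SQLdepth-Impl.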
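The AbsRel figures quoted in the abstract refer to the absolute relative error, the mean of |predicted - true| / true over evaluated pixels. The sketch below computes it on toy values and shows how a quoted error reduction relates a new score to an earlier one. The function names and the toy depths are illustrative assumptions, and the "implied" baseline is back-computed from the quoted 0.082 and 9.9%; it is not a value reported in the source.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over paired depth values.

    Assumes every ground-truth depth is positive; real benchmarks
    mask out pixels without valid ground truth before this step.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and equal length")
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def relative_reduction(previous, current):
    """Fractional error reduction going from `previous` to `current`."""
    return (previous - current) / previous


def implied_previous(current, reduction):
    """Baseline error implied by a score and a quoted reduction."""
    return current / (1.0 - reduction)


# Toy depths in metres (made up for illustration).
toy_error = abs_rel(pred=[2.2, 3.6], gt=[2.0, 4.0])

# Back-computed baseline for the quoted KITTI result (not from the source).
baseline = implied_previous(current=0.082, reduction=0.099)
check = relative_reduction(baseline, 0.082)
print(round(toy_error, 3), round(baseline, 4), round(check, 3))
```

Lower AbsRel is better, so a positive `relative_reduction` means the newer score improves on the earlier one.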
