Fugu-MT Paper Translation (Abstract): Adaptive Depth-converted-Scale Convolution for Self-supervised Monocular Depth Estimation

Paper overview: Adaptive Depth-converted-Scale Convolution for Self-supervised Monocular Depth Estimation

arxiv url: http://arxiv.org/abs/2604.07665v1
Date: Thu, 09 Apr 2026 00:14:56 GMT
Status: Translation complete
Last updated in system: 2026-04-10 18:34:05.60783
Title: Adaptive Depth-converted-Scale Convolution for Self-supervised Monocular Depth Estimation
Title (reference translation): Adaptive Depth-Converted-Scale Convolution for Self-Supervised Monocular Depth Estimation
Authors: Yanbo Gao, Huibin Bai, Huasong Zhou, Xingyu Gao, Shuai Li, Xun Cai, Hui Yuan, Wei Hua, Tian Xie
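Abstract summary: This paper proposes a monocular depth estimation framework enhanced with Depth-converted-Scale Convolution (DcSConv). The proposed DcSConv focuses on the adaptive scale of the convolution filter rather than the local deformation of its shape. A Depth-converted-Scale aware Fusion (DcS-F) adaptively fuses the DcSConv features with conventional convolution features.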
Reference score (independently computed attention score): 23.909506883639466
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Self-supervised monocular depth estimation (MDE) has received increasing interest in the last few years. The objects in the scene, including object sizes and the relationships among different objects, are the main clues for extracting the scene structure. However, previous works lack explicit handling of how an object's size changes as its depth changes. Especially in a monocular video, the size of the same object changes continuously, resulting in size and depth ambiguity. To address this problem, we propose a Depth-converted-Scale Convolution (DcSConv) enhanced monocular depth estimation framework, which incorporates the prior relationship between object depth and object scale to extract features from appropriate scales of the convolution receptive field. The proposed DcSConv focuses on the adaptive scale of the convolution filter instead of the local deformation of its shape. It establishes that the scale of the convolution filter matters no less (or even more in the evaluated task) than its local deformation. Moreover, a Depth-converted-Scale aware Fusion (DcS-F) is developed to adaptively fuse the DcSConv features and the conventional convolution features. Our DcSConv enhanced monocular depth estimation framework can be applied on top of existing CNN-based methods as a plug-and-play module to enhance the conventional convolution block. Extensive experiments with different baselines have been conducted on the KITTI benchmark, and our method achieves the best results, with an improvement of up to 11.6% in terms of SqRel reduction. An ablation study also validates the effectiveness of each proposed module.
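The depth-to-scale idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the inverse-depth rule (motivated by the pinhole relation, where apparent size is proportional to 1 / depth), the function names `depth_to_scale`, `sample_linear`, `scaled_conv1d`, and `fuse`, the reference depth, the clamping range, and the fixed fusion weight are all assumptions chosen for illustration. The sketch works on a 1D signal in plain Python: it samples the input with a kernel whose tap spacing grows as depth shrinks, then blends the result with an ordinary convolution.

```python
def depth_to_scale(depth, ref_depth=10.0, min_scale=0.5, max_scale=4.0):
    """Map a positive depth to a kernel scale (hypothetical rule).

    Apparent size is proportional to 1 / depth under a pinhole camera,
    so nearer pixels get a larger sampling scale. The result is clamped.
    """
    scale = ref_depth / depth
    return max(min_scale, min(max_scale, scale))


def sample_linear(signal, pos):
    """Linearly interpolate `signal` at fractional index `pos` (edge-clamped)."""
    pos = max(0.0, min(len(signal) - 1.0, pos))
    lo = int(pos)
    hi = min(lo + 1, len(signal) - 1)
    frac = pos - lo
    return signal[lo] * (1.0 - frac) + signal[hi] * frac


def scaled_conv1d(signal, depths, kernel):
    """Apply `kernel` at each position with tap spacing set by the local depth."""
    half = len(kernel) // 2
    out = []
    for i, d in enumerate(depths):
        s = depth_to_scale(d)
        acc = 0.0
        for k, w in enumerate(kernel):
            # Tap offset (k - half) is stretched or shrunk by the scale s.
            acc += w * sample_linear(signal, i + (k - half) * s)
        out.append(acc)
    return out


def fuse(conv_feat, dcs_feat, alpha=0.5):
    """Blend features as alpha * dcs_feat + (1 - alpha) * conv_feat.

    The abstract describes an adaptive fusion; a constant alpha is a
    simplification made here.
    """
    return [alpha * b + (1.0 - alpha) * a for a, b in zip(conv_feat, dcs_feat)]


signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
depths = [20.0, 20.0, 10.0, 10.0, 5.0, 5.0, 2.5, 2.5]
kernel = [-1.0, 0.0, 1.0]  # simple gradient filter

# A constant depth equal to ref_depth gives scale 1, i.e. an ordinary convolution.
plain = scaled_conv1d(signal, [10.0] * len(signal), kernel)
adaptive = scaled_conv1d(signal, depths, kernel)
fused = fuse(plain, adaptive, alpha=0.5)
```

In this toy setup the filter response at a near pixel comes from a wider neighbourhood than at a far pixel, which is the behaviour the abstract attributes to scale adaptation as opposed to local shape deformation.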
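Abstract (reference translation): Self-supervised monocular depth estimation (MDE) has attracted growing interest in the last few years. The objects in a scene, including their sizes and the relationships among different objects, are the main clues for extracting the scene structure. However, previous work lacks explicit handling of the change in an object's size caused by a change in its depth. Especially in a monocular video, the size of the same object changes continuously, resulting in size and depth ambiguity. To address this problem, we propose a DcSConv (Depth-converted-Scale Convolution) enhanced monocular depth estimation framework that incorporates the relationship between object depth and object scale. The proposed DcSConv focuses on the adaptive scale of the convolution filter rather than the local deformation of its shape. It shows that the scale of the convolution filter matters no less (or even more in the evaluated task) than its local deformation. In addition, DcS-F (Depth-converted-Scale aware Fusion) is developed to adaptively fuse the DcSConv features with conventional convolution features. Our DcSConv enhanced monocular depth estimation framework can be applied on top of existing CNN-based methods as a plug-and-play module to enhance the conventional convolution block. Extensive experiments with different baselines were conducted on the KITTI benchmark, achieving the best results with an improvement of up to 11.6% in terms of SqRel reduction. An ablation study also validates the effectiveness of each proposed module.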
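For context on the reported metric, SqRel (squared relative error) is commonly computed on depth benchmarks as the mean of (prediction - ground truth)^2 / ground truth over pixels that have valid ground truth. The sketch below assumes that common definition. The function names `sq_rel` and `relative_reduction` and all sample numbers are illustrative assumptions, not values or code from the paper. It also shows how a relative reduction of the kind quoted in the abstract would be calculated.

```python
def sq_rel(pred, gt):
    """Squared relative error: mean((pred - gt)^2 / gt) over pixels with gt > 0."""
    terms = [(p - g) ** 2 / g for p, g in zip(pred, gt) if g > 0]
    if not terms:
        raise ValueError("no valid ground-truth depths")
    return sum(terms) / len(terms)


def relative_reduction(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline value."""
    return (baseline - improved) / baseline


# Made-up depths in metres; 0.0 marks a pixel without ground truth.
gt = [2.0, 4.0, 8.0, 0.0]
baseline_pred = [3.0, 6.0, 8.0, 5.0]
improved_pred = [2.0, 6.0, 8.0, 5.0]

base_err = sq_rel(baseline_pred, gt)   # (0.5 + 1.0 + 0.0) / 3
new_err = sq_rel(improved_pred, gt)    # (0.0 + 1.0 + 0.0) / 3
reduction = relative_reduction(base_err, new_err)
```

Because each squared error is divided by the ground-truth depth, the same absolute error is penalised more heavily at near range than at far range.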

Related papers