Crafting Monocular Cues and Velocity Guidance for Self-Supervised
Multi-Frame Depth Learning
- URL: http://arxiv.org/abs/2208.09170v1
- Date: Fri, 19 Aug 2022 06:32:06 GMT
- Title: Crafting Monocular Cues and Velocity Guidance for Self-Supervised
Multi-Frame Depth Learning
- Authors: Xiaofeng Wang and Zheng Zhu and Guan Huang and Xu Chi and Yun Ye and
Ziwei Chen and Xingang Wang
- Abstract summary: Self-supervised monocular methods can efficiently learn depth information of weakly textured surfaces or reflective objects.
In contrast, multi-frame depth estimation methods improve the depth accuracy thanks to the success of Multi-View Stereo.
We propose MOVEDepth, which exploits the MOnocular cues and VElocity guidance to improve multi-frame depth learning.
- Score: 22.828829870704006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised monocular methods can efficiently learn depth information of
weakly textured surfaces or reflective objects. However, the depth accuracy is
limited due to the inherent ambiguity in monocular geometric modeling. In
contrast, multi-frame depth estimation methods improve the depth accuracy
thanks to the success of Multi-View Stereo (MVS), which directly makes use of
geometric constraints. Unfortunately, MVS often suffers from texture-less
regions, non-Lambertian surfaces, and moving objects, especially in real-world
video sequences without known camera motion and depth supervision. Therefore,
we propose MOVEDepth, which exploits the MOnocular cues and VElocity guidance
to improve multi-frame Depth learning. Unlike existing methods that enforce
consistency between MVS depth and monocular depth, MOVEDepth boosts multi-frame
depth learning by directly addressing the inherent problems of MVS. The key to
our approach is to utilize monocular depth as a geometric prior to construct the
MVS cost volume and to adjust the depth candidates of the cost volume under the
guidance of the predicted camera velocity. We further fuse monocular depth and
MVS depth by
learning uncertainty in the cost volume, which results in a robust depth
estimation against ambiguity in multi-view geometry. Extensive experiments show
MOVEDepth achieves state-of-the-art performance: compared with Monodepth2 and
PackNet, our method improves depth accuracy by a relative 20% and 19.8%,
respectively, on the KITTI benchmark. MOVEDepth also generalizes to the more
challenging DDAD benchmark, outperforming ManyDepth by a relative 7.2%. The
code is available
at https://github.com/JeffWang987/MOVEDepth.
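The abstract outlines three mechanisms: a monocular depth prior for constructing the MVS cost volume, velocity guidance for the depth candidates, and uncertainty-based fusion of monocular and MVS depth. The sketch below is a minimal NumPy illustration of the latter two ideas; the function names, the velocity-to-range scaling rule, and the entropy-based uncertainty are assumptions made for illustration, not the authors' exact formulation (see the linked repository for the real implementation).

# Minimal, self-contained sketch of velocity-guided depth candidates around a
# monocular prior, and uncertainty-weighted fusion of monocular and MVS depth.
import numpy as np

def velocity_guided_candidates(mono_depth, velocity, n_candidates=8,
                               base_range=0.25, ref_velocity=1.0):
    """Per-pixel depth hypotheses centred on the monocular prior.

    mono_depth : (H, W) monocular depth prediction used as a geometric prior.
    velocity   : scalar magnitude of the predicted camera translation.
    The relative search range shrinks when the camera moves fast (larger
    baseline, better-conditioned triangulation) and widens when it is slow.
    This specific scaling rule is an assumption for illustration.
    """
    rel_range = base_range * ref_velocity / max(velocity, 1e-3)
    rel_range = float(np.clip(rel_range, 0.05, 0.5))
    # Linearly spaced multiplicative offsets around the prior depth.
    offsets = np.linspace(1.0 - rel_range, 1.0 + rel_range, n_candidates)
    # Shape (n_candidates, H, W): each slice is one depth hypothesis map.
    return mono_depth[None, :, :] * offsets[:, None, None]

def fuse_by_uncertainty(mono_depth, candidates, matching_cost):
    """Blend MVS depth and monocular depth with a cost-volume uncertainty.

    matching_cost : (n_candidates, H, W) photometric matching cost per
    hypothesis (lower = better). MVS depth is the soft-argmin over candidates;
    uncertainty is the normalised entropy of the matching distribution.
    """
    prob = np.exp(-matching_cost)
    prob /= prob.sum(axis=0, keepdims=True)
    mvs_depth = (prob * candidates).sum(axis=0)            # soft-argmin depth
    entropy = -(prob * np.log(prob + 1e-8)).sum(axis=0)
    uncertainty = entropy / np.log(prob.shape[0])          # in [0, 1]
    # High uncertainty (texture-less, non-Lambertian, moving regions)
    # falls back to the monocular prior.
    return (1.0 - uncertainty) * mvs_depth + uncertainty * mono_depth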
Related papers
- Manydepth2: Motion-Aware Self-Supervised Multi-Frame Monocular Depth Estimation in Dynamic Scenes [45.092076587934464]
We present Manydepth2 to achieve precise depth estimation for both dynamic objects and static backgrounds.
To tackle the challenges posed by dynamic content, we incorporate optical flow and coarse monocular depth to create a pseudo-static reference frame.
This frame is then utilized to build a motion-aware cost volume in collaboration with the vanilla target frame.
arXiv Detail & Related papers (2023-12-23T14:36:27Z)
- Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells [23.345139129458122]
We show that different depth geometries have significant performance gaps, even when their depth prediction errors are the same.
We introduce an ideal depth geometry composed of Saddle-Shaped Cells, whose predicted depth map oscillates upward and downward around the ground-truth surface.
Our method also points to a new research direction for considering depth geometry in MVS.
arXiv Detail & Related papers (2023-07-18T11:37:53Z)
- Monocular Visual-Inertial Depth Estimation [66.71452943981558]
We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry.
Our approach performs global scale and shift alignment against sparse metric depth, followed by learning-based dense alignment (a minimal scale-and-shift sketch appears after this list).
We evaluate on the TartanAir and VOID datasets, observing up to 30% reduction in RMSE with dense scale alignment.
arXiv Detail & Related papers (2023-03-21T18:47:34Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption to train the networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Uncertainty Guided Depth Fusion for Spike Camera [49.41822923588663]
We propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework to fuse the predictions of monocular and stereo depth estimation networks for spike cameras.
Our framework is motivated by the fact that stereo spike depth estimation achieves better results at close range.
In order to demonstrate the advantage of spike depth estimation over traditional camera depth estimation, we contribute a spike-depth dataset named CitySpike20K.
arXiv Detail & Related papers (2022-08-26T13:04:01Z)
- Improving Monocular Visual Odometry Using Learned Depth [84.05081552443693]
We propose a framework that exploits monocular depth estimation to improve visual odometry (VO).
The core of our framework is a monocular depth estimation module with a strong generalization capability for diverse scenes.
Compared with current learning-based VO methods, our method demonstrates a stronger generalization ability to diverse scenes.
arXiv Detail & Related papers (2022-04-04T06:26:46Z)
- Learning Occlusion-Aware Coarse-to-Fine Depth Map for Self-supervised Monocular Depth Estimation [11.929584800629673]
We propose a novel network to learn an Occlusion-aware Coarse-to-Fine Depth map for self-supervised monocular depth estimation.
The proposed OCFD-Net not only employs a discrete depth constraint to learn a coarse-level depth map, but also employs a continuous depth constraint to learn a scene depth residual.
arXiv Detail & Related papers (2022-03-21T12:43:42Z)
- DDL-MVS: Depth Discontinuity Learning for MVS Networks [0.5735035463793007]
We propose depth discontinuity learning for MVS methods, which further improves accuracy while retaining the completeness of the reconstruction.
We validate our idea and demonstrate that our strategies can be easily integrated into the existing learning-based MVS pipeline.
arXiv Detail & Related papers (2022-03-02T20:25:31Z)
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
- Video Depth Estimation by Fusing Flow-to-Depth Proposals [65.24533384679657]
We present an approach with a differentiable flow-to-depth layer for video depth estimation.
The model consists of a flow-to-depth layer, a camera pose refinement module, and a depth fusion network.
Our approach outperforms state-of-the-art depth estimation methods, and has reasonable cross dataset generalization capability.
arXiv Detail & Related papers (2019-12-30T10:45:57Z)
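The global scale-and-shift alignment mentioned in the Monocular Visual-Inertial Depth Estimation entry above has a simple closed-form least-squares solution. The sketch below is an illustrative NumPy fit of a scale s and shift t mapping a relative monocular depth prediction onto sparse metric depth; the function name, and the choice to align in plain depth rather than inverse depth, are assumptions made for illustration and not details taken from that paper.

# Illustrative least-squares scale-and-shift alignment of a relative monocular
# depth prediction against sparse metric depth.
import numpy as np

def align_scale_shift(pred_depth, sparse_metric_depth, valid_mask):
    """Fit s, t minimising || s * pred + t - metric ||^2 over valid pixels."""
    x = pred_depth[valid_mask].ravel()
    y = sparse_metric_depth[valid_mask].ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)         # (N, 2) design matrix
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)     # closed-form fit
    # Return the globally aligned dense depth plus the fitted parameters.
    return s * pred_depth + t, float(s), float(t)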