Video Depth Estimation by Fusing Flow-to-Depth Proposals
- URL: http://arxiv.org/abs/1912.12874v2
- Date: Tue, 3 Mar 2020 06:34:04 GMT
- Title: Video Depth Estimation by Fusing Flow-to-Depth Proposals
- Authors: Jiaxin Xie, Chenyang Lei, Zhuwen Li, Li Erran Li and Qifeng Chen
- Abstract summary: We present an approach with a differentiable flow-to-depth layer for video depth estimation.
The model consists of a flow-to-depth layer, a camera pose refinement module, and a depth fusion network.
Our approach outperforms state-of-the-art depth estimation methods and has reasonable cross-dataset generalization capability.
- Score: 65.24533384679657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth from a monocular video can enable billions of devices and robots with a
single camera to see the world in 3D. In this paper, we present an approach
with a differentiable flow-to-depth layer for video depth estimation. The model
consists of a flow-to-depth layer, a camera pose refinement module, and a depth
fusion network. Given optical flow and camera pose, our flow-to-depth layer
generates depth proposals and the corresponding confidence maps by explicitly
solving an epipolar geometry optimization problem. Our flow-to-depth layer is
differentiable, and thus we can refine camera poses by maximizing the
aggregated confidence in the camera pose refinement module. Our depth fusion
network can utilize depth proposals and their confidence maps inferred from
different adjacent frames to produce the final depth map. Furthermore, the
depth fusion network can additionally take the depth proposals generated by
other methods to improve the results further. The experiments on three public
datasets show that our approach outperforms state-of-the-art depth estimation
methods, and has reasonable cross dataset generalization capability: our model
trained on KITTI still performs well on the unseen Waymo dataset.
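As an illustration of the flow-to-depth idea, the sketch below triangulates a per-pixel depth proposal and a residual-based confidence from optical flow and a relative camera pose. It is a minimal NumPy approximation under assumed conventions (shared intrinsics K, pose (R, t) mapping frame-1 coordinates into frame 2); the paper's layer solves the epipolar optimization differentiably inside the network, and its exact confidence definition is not reproduced here.

```python
import numpy as np

def flow_to_depth_proposals(flow, K, R, t):
    """Per-pixel depth proposals and confidences from flow and relative pose.

    Minimal sketch: each pixel and its flow-displaced match define two viewing
    rays; the depth that best satisfies the two-view geometry is found by
    linear least squares. The exp(-residual) confidence is an illustrative
    choice, not the paper's formulation.

    flow : (H, W, 2) optical flow from frame 1 to frame 2
    K    : (3, 3)    shared camera intrinsics
    R, t : relative pose mapping frame-1 coordinates into frame 2
    """
    H, W, _ = flow.shape
    K_inv = np.linalg.inv(K)

    # Pixel grid in frame 1 and its flow-displaced match in frame 2.
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    p1 = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    p2 = p1 + np.concatenate([flow.reshape(-1, 2),
                              np.zeros((H * W, 1))], axis=1)

    # Back-project both pixels to viewing rays in their own camera frames.
    r1 = p1 @ K_inv.T                                     # (N, 3)
    r2 = p2 @ K_inv.T                                     # (N, 3)

    # Triangulate: solve  R r1 * d1 + t ≈ r2 * d2  for (d1, d2) per pixel.
    A = np.stack([r1 @ R.T, -r2], axis=-1)                # (N, 3, 2)
    b = np.broadcast_to(-t.reshape(1, 3), (A.shape[0], 3))
    AtA = np.einsum('nij,nik->njk', A, A) + 1e-9 * np.eye(2)  # mild ridge
    Atb = np.einsum('nij,ni->nj', A, b)
    d = np.linalg.solve(AtA, Atb[..., None])[..., 0]      # (N, 2)

    # Triangulation residual as a simple confidence proxy; low-parallax
    # pixels are ill-conditioned and yield unreliable proposals.
    residual = np.linalg.norm(np.einsum('nij,nj->ni', A, d) - b, axis=-1)
    confidence = np.exp(-residual)

    return d[:, 0].reshape(H, W), confidence.reshape(H, W)
```

Because every step is a closed-form linear solve, the same computation can be expressed with differentiable tensor operations, which is what allows the paper to back-propagate through the layer and refine camera poses by maximizing the aggregated confidence.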
Related papers
- GEDepth: Ground Embedding for Monocular Depth Estimation [4.95394574147086]
This paper proposes a novel ground embedding module to decouple camera parameters from pictorial cues.
A ground attention mechanism is designed in the module to optimally combine ground depth with residual depth.
Experiments reveal that our approach achieves state-of-the-art results on popular benchmarks.
arXiv Detail & Related papers (2023-09-18T17:56:06Z) - Self-Supervised Learning based Depth Estimation from Monocular Images [0.0]
The goal of Monocular Depth Estimation is to predict the depth map, given a 2D monocular RGB image as input.
We plan to incorporate intrinsic camera parameters during training and apply weather augmentations to further generalize our model.
arXiv Detail & Related papers (2023-04-14T07:14:08Z) - Multi-Camera Collaborative Depth Prediction via Consistent Structure Estimation [75.99435808648784]
We propose a novel multi-camera collaborative depth prediction method.
It does not require large overlapping areas while maintaining structure consistency between cameras.
Experimental results on DDAD and NuScenes datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2022-10-05T03:44:34Z) - SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z) - Deep Two-View Structure-from-Motion Revisited [83.93809929963969]
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
We propose to revisit the problem of deep two-view SfM by leveraging the well-posedness of the classic pipeline.
Our method consists of 1) an optical flow estimation network that predicts dense correspondences between two frames; 2) a normalized pose estimation module that computes relative camera poses from the 2D optical flow correspondences; and 3) a scale-invariant depth estimation network that leverages epipolar geometry to reduce the search space, refine the dense correspondences, and estimate relative depth maps.
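For reference, a classical (non-learned) stand-in for the pose step of such a pipeline can be written with OpenCV's essential-matrix routines applied to correspondences sampled from dense flow; this is a generic two-view recipe under assumed inputs, not the paper's normalized pose module.

```python
import cv2
import numpy as np

def relative_pose_from_correspondences(pts1, pts2, K):
    """Estimate the relative camera pose (R, t) from matched 2D points.

    pts1, pts2 : (N, 2) float arrays of corresponding pixels in frames 1 and 2
                 (e.g. a pixel grid and the same grid displaced by optical flow)
    K          : (3, 3) camera intrinsics
    """
    # Robustly fit the essential matrix, then keep the (R, t) decomposition
    # that places the triangulated points in front of both cameras.
    E, inlier_mask = cv2.findEssentialMat(pts1, pts2, K,
                                          method=cv2.RANSAC,
                                          prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inlier_mask)
    return R, t  # t has unit norm: two-view translation is scale-ambiguous
```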
arXiv Detail & Related papers (2021-04-01T15:31:20Z) - Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z) - Self-Attention Dense Depth Estimation Network for Unrectified Video Sequences [6.821598757786515]
LiDAR and radar sensors are the hardware solutions for real-time depth estimation.
Deep learning based self-supervised depth estimation methods have shown promising results.
We propose a self-attention based depth and ego-motion network for unrectified images.
arXiv Detail & Related papers (2020-05-28T21:53:53Z) - Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [92.84498980104424]
We put three different types of depth estimation into a common framework.
Our method produces a time series of depth maps.
It can be applied to monocular videos only or be combined with different types of sparse depth patterns.
arXiv Detail & Related papers (2020-01-08T16:50:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.