Consistent Video Depth Estimation
- URL: http://arxiv.org/abs/2004.15021v2
- Date: Wed, 26 Aug 2020 20:11:33 GMT
- Title: Consistent Video Depth Estimation
- Authors: Xuan Luo, Jia-Bin Huang, Richard Szeliski, Kevin Matzen, Johannes Kopf
- Abstract summary: We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video.
We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video.
Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion.
- Score: 57.712779457632024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an algorithm for reconstructing dense, geometrically consistent
depth for all pixels in a monocular video. We leverage a conventional
structure-from-motion reconstruction to establish geometric constraints on
pixels in the video. Unlike the ad-hoc priors in classical reconstruction, we
use a learning-based prior, i.e., a convolutional neural network trained for
single-image depth estimation. At test time, we fine-tune this network to
satisfy the geometric constraints of a particular input video, while retaining
its ability to synthesize plausible depth details in parts of the video that
are less constrained. We show through quantitative validation that our method
achieves higher accuracy and a higher degree of geometric consistency than
previous monocular reconstruction methods. Visually, our results appear more
stable. Our algorithm is able to handle challenging hand-held captured input
videos with a moderate degree of dynamic motion. The improved quality of the
reconstruction enables several applications, such as scene reconstruction and
advanced video-based visual effects.
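The core test-time optimization can be illustrated with a short sketch. The snippet below is a minimal illustration rather than the authors' implementation: `depth_net`, the `camera` model, and the per-pair correspondences and relative poses are placeholder interfaces standing in for whatever the structure-from-motion stage supplies, and the two loss terms are simplified geometric consistency terms (depth agreement and reprojection error).

```python
# Minimal sketch (not the authors' code): fine-tune a pretrained single-image
# depth network so that its per-frame predictions become geometrically
# consistent with SfM-derived correspondences and relative poses.
import torch

def finetune_on_video(depth_net, frame_pairs, camera, steps=20, lr=1e-4):
    """frame_pairs yields (img_i, img_j, xy_i, xy_j, pose_ij) tuples, where
    xy_i/xy_j are matched pixel coordinates and pose_ij maps frame i's
    camera space into frame j's (all hypothetical interfaces)."""
    opt = torch.optim.Adam(depth_net.parameters(), lr=lr)
    for _ in range(steps):
        for img_i, img_j, xy_i, xy_j, pose_ij in frame_pairs:
            d_i = depth_net(img_i)                      # (1, 1, H, W)
            d_j = depth_net(img_j)                      # (1, 1, H, W)

            # Lift matched pixels of frame i to 3D, move them into frame j.
            pts_i = camera.unproject(xy_i, d_i)         # (N, 3) points in frame i
            pts_j = pose_ij.transform(pts_i)            # (N, 3) points in frame j
            uv_pred, z_pred = camera.project(pts_j)     # (N, 2) pixels, (N,) depths

            # Depth consistency: the reprojected depth should agree with the
            # depth predicted for frame j at the matched pixel.
            z_j = camera.sample(d_j, xy_j)              # (N,)
            loss_depth = (1.0 / z_pred - 1.0 / z_j).abs().mean()

            # Spatial consistency: the reprojection should land on the match.
            loss_spatial = (uv_pred - xy_j).norm(dim=-1).mean()

            loss = loss_depth + loss_spatial
            opt.zero_grad()
            loss.backward()
            opt.step()
    return depth_net
```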
Related papers
- DoubleTake: Geometry Guided Depth Estimation [17.464549832122714]
Estimating depth from a sequence of posed RGB images is a fundamental computer vision task.
We introduce a reconstruction that combines volume features with a hint of the prior geometry, rendered as a depth map from the current camera location.
We demonstrate that our method can run at interactive speeds and achieves state-of-the-art estimates of depth and 3D scene reconstruction in both offline and incremental evaluation scenarios.
arXiv Detail & Related papers (2024-06-26T14:29:05Z)
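As a rough, hypothetical illustration of DoubleTake's geometry hint, the sketch below renders previously reconstructed geometry into a depth map from the current camera and concatenates it (plus a validity mask) with cost-volume features before decoding; `render_depth_hint`, `cost_volume_feats`, and `depth_decoder` are made-up placeholders, not the paper's modules.

```python
# Illustration only: fuse multi-view cost-volume features with a "hint" of
# prior geometry, rendered as a depth map from the current camera pose.
# All components are hypothetical placeholders.
import torch

def hinted_depth(cost_volume_feats, prior_geometry, pose, intrinsics,
                 render_depth_hint, depth_decoder):
    hint = render_depth_hint(prior_geometry, pose, intrinsics)   # (B, 1, H, W)
    valid = (hint > 0).float()            # 1 where prior geometry was rendered
    fused = torch.cat([cost_volume_feats, hint, valid], dim=1)   # channel concat
    return depth_decoder(fused)                                  # (B, 1, H, W)
```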
- Learning Temporally Consistent Video Depth from Video Diffusion Priors [57.929828486615605]
This work addresses the challenge of video depth estimation.
We reformulate the prediction task into a conditional generation problem.
This allows us to leverage the prior knowledge embedded in existing video generation models.
arXiv Detail & Related papers (2024-06-03T16:20:24Z)
- AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation [51.143540967290114]
We propose a method that unlocks a wide range of previously-infeasible geometric augmentations for unsupervised depth computation and estimation.
This is achieved by reversing, or "undo"-ing, the geometric transformations applied to the coordinates of the output depth, warping the depth map back to the original reference frame.
arXiv Detail & Related papers (2023-10-15T05:15:45Z)
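A minimal sketch of the "undo" idea from AugUndo, under assumed interfaces: the input image is augmented (here a resized crop plus a horizontal flip), depth is predicted on the augmented view, and the geometric transform is then inverted on the predicted depth so the unsupervised loss is evaluated in the original reference frame. `depth_model` and `photometric_loss` are placeholders for the model and the unsupervised objective.

```python
# Minimal sketch of "undo"-ing a geometric augmentation: predict depth on an
# augmented image, then invert the transform on the depth map so the loss is
# computed in the original reference frame.
import torch
import torch.nn.functional as F

def augmented_depth_loss(depth_model, image, photometric_loss):
    b, _, h, w = image.shape
    # Example augmentation: center-ish crop, resize back up, horizontal flip.
    top, left, ch, cw = h // 8, w // 8, 3 * h // 4, 3 * w // 4
    aug = image[:, :, top:top + ch, left:left + cw]
    aug = F.interpolate(aug, size=(h, w), mode='bilinear', align_corners=False)
    aug = torch.flip(aug, dims=[-1])

    depth_aug = depth_model(aug)                       # (B, 1, H, W), augmented frame

    # Undo the augmentation on the depth map: unflip, shrink to the crop size,
    # and paste into a full-resolution canvas at the original crop location.
    depth_undone = torch.flip(depth_aug, dims=[-1])
    depth_undone = F.interpolate(depth_undone, size=(ch, cw),
                                 mode='bilinear', align_corners=False)
    canvas = torch.zeros(b, 1, h, w, device=image.device)
    canvas[:, :, top:top + ch, left:left + cw] = depth_undone
    valid = torch.zeros_like(canvas)
    valid[:, :, top:top + ch, left:left + cw] = 1.0

    # Loss is evaluated against the original (unaugmented) image, masked to
    # pixels that the augmented view actually covered.
    return photometric_loss(canvas, image, valid)
```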
- Edge-aware Consistent Stereo Video Depth Estimation [3.611754783778107]
We propose a consistent method for dense video depth estimation.
Unlike existing monocular methods, ours operates on stereo videos.
We show that our edge-aware stereo video model can accurately estimate dense depth maps.
arXiv Detail & Related papers (2023-05-04T08:30:04Z)
- Accurate Human Body Reconstruction for Volumetric Video [0.9134661726886928]
We introduce and optimize deep learning-based multi-view stereo networks for depth map estimation in the context of professional volumetric video reconstruction.
We show that our method can generate high levels of geometric detail for reconstructed human bodies.
arXiv Detail & Related papers (2022-02-26T11:37:08Z)
- Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth [90.33296913575818]
In some video-based scenarios, such as video depth estimation and 3D scene reconstruction from a video, the unknown scale and shift residing in per-frame predictions may cause depth inconsistency.
We propose a locally weighted linear regression method to recover the scale and shift with very sparse anchor points.
Our method can boost the performance of existing state-of-the-art approaches by up to 50% over several zero-shot benchmarks.
arXiv Detail & Related papers (2022-02-03T08:52:54Z)
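A minimal, unvectorized sketch of locally weighted linear regression for scale and shift recovery, under illustrative assumptions (Gaussian distance weights, integer anchor pixel coordinates); this is not the paper's exact formulation.

```python
# Unvectorized sketch of locally weighted linear regression: at each pixel,
# fit a (scale, shift) pair mapping the predicted depth onto sparse metric
# anchors, weighting anchors by their distance to the pixel.
import numpy as np

def local_scale_shift(pred_depth, anchors_xy, anchors_z, sigma=50.0):
    """pred_depth: (H, W) float array; anchors_xy: (N, 2) integer pixel coords
    (x, y); anchors_z: (N,) metric depths at those pixels."""
    h, w = pred_depth.shape
    d_anchor = pred_depth[anchors_xy[:, 1], anchors_xy[:, 0]]     # (N,)
    A = np.stack([d_anchor, np.ones_like(d_anchor)], axis=1)      # (N, 2)

    aligned = np.empty_like(pred_depth)
    for y in range(h):
        for x in range(w):
            # Gaussian weights: nearby anchors dominate the local fit.
            dist2 = (anchors_xy[:, 0] - x) ** 2 + (anchors_xy[:, 1] - y) ** 2
            wgt = np.exp(-dist2 / (2.0 * sigma ** 2)) + 1e-8
            sw = np.sqrt(wgt)
            # Weighted least squares for the local (scale, shift).
            scale, shift = np.linalg.lstsq(A * sw[:, None], anchors_z * sw,
                                           rcond=None)[0]
            aligned[y, x] = scale * pred_depth[y, x] + shift
    return aligned
```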
- DF-VO: What Should Be Learnt for Visual Odometry? [33.379888882093965]
We design a simple yet robust Visual Odometry system by integrating multi-view geometry and deep learning on Depth and optical Flow.
Comprehensive ablation studies show the effectiveness of the proposed method, and extensive evaluation results show the state-of-the-art performance of our system.
arXiv Detail & Related papers (2021-03-01T11:50:39Z)
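A rough sketch of the hybrid geometry-plus-learning idea: dense optical flow from a network provides 2D-2D correspondences for classical two-view geometry, and CNN depth resolves the translation scale. `flow_net` and `depth_net` are hypothetical; the OpenCV routines are standard.

```python
# Sketch: flow correspondences -> essential matrix -> relative pose; CNN depth
# fixes the unknown scale of the translation. Networks are placeholders.
import cv2
import numpy as np

def relative_pose(img1, img2, flow_net, depth_net, K, grid_step=8):
    flow = flow_net(img1, img2)                      # (H, W, 2) forward flow (placeholder)
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h:grid_step, 0:w:grid_step]
    pts1 = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float32)
    pts2 = (pts1 + flow[ys, xs].reshape(-1, 2)).astype(np.float32)

    # Essential matrix from flow correspondences (pose up to scale).
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

    # Recover the translation scale by comparing triangulated depths to CNN depth.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    z_tri = X[2] / X[3]
    z_cnn = depth_net(img1)[pts1[:, 1].astype(int), pts1[:, 0].astype(int)]
    good = (z_tri > 0) & np.isfinite(z_tri)
    scale = np.median(z_cnn[good] / z_tri[good])
    return R, t * scale
```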
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
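The sketch below illustrates only the flavor of the first component: a coarse, learnable per-frame grid of correction factors is smoothly upsampled and applied to the depth map, so optimizing the grid corrects low-frequency, large-scale misalignment without disturbing fine detail. Bilinear upsampling and the grid size are stand-ins for the paper's spline parameterization.

```python
# Illustration only: a coarse per-frame grid of log-scale corrections is
# smoothly upsampled and applied multiplicatively to the depth map.
import torch
import torch.nn.functional as F

def deform_depth(depth, coarse_log_scale):
    # depth: (B, 1, H, W); coarse_log_scale: (B, 1, gh, gw), e.g. gh = gw = 4.
    scale = torch.exp(F.interpolate(coarse_log_scale, size=depth.shape[-2:],
                                    mode='bilinear', align_corners=True))
    return depth * scale

# Usage idea: make the coarse grid a learnable parameter per frame and optimize
# it under geometric consistency losses, e.g.:
# grid = torch.zeros(1, 1, 4, 4, requires_grad=True)
```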
- Learning non-rigid surface reconstruction from spatio-temporal image patches [0.0]
We present a method to reconstruct a dense spatio-temporal depth map of a deformable object from a video sequence.
The estimation of depth is performed locally on spatio-temporal patches of the video, and the full depth video of the entire shape is recovered by combining them together.
We tested our method on both synthetic and Kinect data and experimentally observed that the reconstruction error is significantly lower than that of other approaches, such as conventional non-rigid structure from motion.
arXiv Detail & Related papers (2020-06-18T20:25:15Z)
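A minimal sketch of the combination step, ignoring any per-patch alignment the paper may perform: depth is predicted on overlapping spatio-temporal patches by a placeholder local estimator, and overlapping predictions are averaged to recover the full depth video.

```python
# Minimal sketch of stitching local patch-wise depth predictions back into a
# full depth video by averaging overlapping regions. `predict_patch_depth` is
# a hypothetical local estimator operating on a spatio-temporal patch.
import numpy as np

def merge_patch_depths(video, predict_patch_depth, patch=(8, 64, 64), stride=(4, 32, 32)):
    # video: (T, H, W) grayscale frames; output: (T, H, W) depth.
    T, H, W = video.shape
    acc = np.zeros((T, H, W), dtype=np.float64)
    cnt = np.zeros((T, H, W), dtype=np.float64)
    pt, ph, pw = patch
    st, sh, sw = stride
    for t0 in range(0, max(T - pt, 0) + 1, st):
        for y0 in range(0, max(H - ph, 0) + 1, sh):
            for x0 in range(0, max(W - pw, 0) + 1, sw):
                block = video[t0:t0 + pt, y0:y0 + ph, x0:x0 + pw]
                d = predict_patch_depth(block)   # local depth, same shape as block
                acc[t0:t0 + pt, y0:y0 + ph, x0:x0 + pw] += d
                cnt[t0:t0 + pt, y0:y0 + ph, x0:x0 + pw] += 1.0
    return acc / np.maximum(cnt, 1.0)
```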
- Depth Completion Using a View-constrained Deep Prior [73.21559000917554]
Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images.
This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting.
We extend the concept of the DIP to depth images. Given color images and noisy, incomplete target depth maps, we reconstruct a restored depth map by using the CNN structure as a prior.
arXiv Detail & Related papers (2020-01-21T21:56:01Z)
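A minimal sketch of a deep-image-prior style depth fit, keeping only the data term: a randomly initialized CNN, driven by a fixed noise input, is optimized to match the noisy, incomplete target depth where measurements exist, so the network structure itself regularizes the missing pixels. The architecture and names are illustrative; the paper additionally uses view constraints from the color images.

```python
# DIP-style depth completion sketch: fit a small random-weight CNN to the
# observed depth pixels only; unobserved pixels are filled in by the prior
# that the network architecture imposes.
import torch
import torch.nn as nn

def dip_depth_completion(target_depth, valid_mask, steps=2000, lr=1e-3):
    # target_depth, valid_mask: (1, 1, H, W) tensors; mask is 1 where depth is observed.
    net = nn.Sequential(                       # small hypothetical encoder/decoder
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 1, 3, padding=1),
    )
    z = torch.randn(1, 32, *target_depth.shape[-2:])   # fixed random input
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        pred = net(z)
        # Fit only the observed pixels; the rest are filled in by the prior.
        loss = ((pred - target_depth) ** 2 * valid_mask).sum() / valid_mask.sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net(z).detach()
```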