Consistent Video Depth Estimation
- URL: http://arxiv.org/abs/2004.15021v2
- Date: Wed, 26 Aug 2020 20:11:33 GMT
- Title: Consistent Video Depth Estimation
- Authors: Xuan Luo, Jia-Bin Huang, Richard Szeliski, Kevin Matzen, Johannes Kopf
- Abstract summary: We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video.
We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video.
Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion.
- Score: 57.712779457632024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an algorithm for reconstructing dense, geometrically consistent
depth for all pixels in a monocular video. We leverage a conventional
structure-from-motion reconstruction to establish geometric constraints on
pixels in the video. Unlike the ad-hoc priors in classical reconstruction, we
use a learning-based prior, i.e., a convolutional neural network trained for
single-image depth estimation. At test time, we fine-tune this network to
satisfy the geometric constraints of a particular input video, while retaining
its ability to synthesize plausible depth details in parts of the video that
are less constrained. We show through quantitative validation that our method
achieves higher accuracy and a higher degree of geometric consistency than
previous monocular reconstruction methods. Visually, our results appear more
stable. Our algorithm is able to handle challenging hand-held captured input
videos with a moderate degree of dynamic motion. The improved quality of the
reconstruction enables several applications, such as scene reconstruction and
advanced video-based visual effects.
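The test-time fine-tuning described in the abstract optimizes the depth network against geometric constraints derived from structure-from-motion. The two constraint terms can be pictured with a minimal NumPy sketch: a spatial loss (reproject a pixel from frame i into frame j using its predicted depth and the relative camera pose, and compare with the flow-matched pixel) and a disparity loss (compare inverse depths). This is a simplified single-pixel illustration under assumed names and conventions, not the authors' implementation.

```python
import numpy as np

def reproject(p, depth, K, K_inv, R, t):
    """Lift pixel p = (x, y) in frame i to 3D using its predicted depth,
    then project into frame j via the relative pose (R, t).
    Returns the projected pixel and its depth in frame j."""
    x_i = depth * (K_inv @ np.array([p[0], p[1], 1.0]))  # 3D point in frame i camera coords
    x_j = R @ x_i + t                                    # transform into frame j
    uv = K @ x_j
    return uv[:2] / uv[2], x_j[2]

def consistency_losses(p_i, p_j_flow, d_i, d_j, K, R, t):
    """Spatial loss: distance between the geometrically reprojected pixel
    and the flow-matched pixel p_j_flow.
    Disparity loss: difference of inverse depths in frame j."""
    K_inv = np.linalg.inv(K)
    p_proj, z_proj = reproject(p_i, d_i, K, K_inv, R, t)
    l_spatial = np.linalg.norm(p_proj - p_j_flow)
    l_disparity = abs(1.0 / z_proj - 1.0 / d_j)
    return l_spatial, l_disparity
```

In the paper these losses are accumulated over sampled frame pairs and back-propagated into the depth network's weights; the sketch above only evaluates the per-pixel terms.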
Related papers
- Video Depth Anything: Consistent Depth Estimation for Super-Long Videos [60.857723250653976]
We propose Video Depth Anything for high-quality, consistent depth estimation in super-long videos.
Our model is trained on a joint dataset of video depth and unlabeled images, similar to Depth Anything V2.
Our approach sets a new state-of-the-art in zero-shot video depth estimation.
arXiv Detail & Related papers (2025-01-21T18:53:30Z)
- Align3R: Aligned Monocular Depth Estimation for Dynamic Videos [50.28715151619659]
We propose a novel video-depth estimation method called Align3R to estimate temporally consistent depth maps for a dynamic video.
Our key idea is to utilize the recent DUSt3R model to align estimated monocular depth maps of different timesteps.
Experiments demonstrate that Align3R estimates consistent video depth and camera poses for a monocular video, outperforming baseline methods.
arXiv Detail & Related papers (2024-12-04T07:09:59Z)
- DoubleTake: Geometry Guided Depth Estimation [17.464549832122714]
Estimating depth from a sequence of posed RGB images is a fundamental computer vision task.
We introduce a reconstruction which combines volume features with a hint of the prior geometry, rendered as a depth map from the current camera location.
We demonstrate that our method can run at interactive speeds, producing state-of-the-art estimates of depth and 3D scene reconstruction in both offline and incremental evaluation scenarios.
arXiv Detail & Related papers (2024-06-26T14:29:05Z)
- Edge-aware Consistent Stereo Video Depth Estimation [3.611754783778107]
We propose a consistent method for dense video depth estimation.
Unlike existing monocular methods, ours operates on stereo videos.
We show that our edge-aware stereo video model can accurately estimate the dense depth maps.
arXiv Detail & Related papers (2023-05-04T08:30:04Z)
- Accurate Human Body Reconstruction for Volumetric Video [0.9134661726886928]
We introduce and optimize deep learning-based multi-view stereo networks for depth map estimation in the context of professional volumetric video reconstruction.
We show that our method can generate high levels of geometric detail for reconstructed human bodies.
arXiv Detail & Related papers (2022-02-26T11:37:08Z)
- Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth [90.33296913575818]
In some video-based scenarios such as video depth estimation and 3D scene reconstruction from a video, the unknown scale and shift residing in per-frame prediction may cause the depth inconsistency.
We propose a locally weighted linear regression method to recover the scale and shift with very sparse anchor points.
Our method can boost the performance of existing state-of-the-art approaches by up to 50% across several zero-shot benchmarks.
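The per-frame scale-and-shift recovery this paper describes can be sketched as a least-squares fit at sparse anchor points: find (s, t) such that s * predicted_depth + t matches the anchors, then apply the fit to the whole map. The sketch below is a global fit with an optional weight vector standing in for the paper's local weighting; the function name and signature are assumptions for illustration.

```python
import numpy as np

def recover_scale_shift(pred, anchor_idx, anchor_depth, weights=None):
    """Fit depth = s * pred + t at sparse anchor points by (weighted)
    least squares, then apply the fit to the full prediction.
    `weights` is a simplified stand-in for the paper's local weighting."""
    d = pred.ravel()[anchor_idx]
    z = np.asarray(anchor_depth, dtype=float)
    w = np.ones_like(z) if weights is None else np.asarray(weights, dtype=float)
    # Weighted design matrix for the linear model z ~ s * d + t.
    A = np.stack([d, np.ones_like(d)], axis=1) * np.sqrt(w)[:, None]
    b = z * np.sqrt(w)
    s, t = np.linalg.lstsq(A, b, rcond=None)[0]
    return s * pred + t, (s, t)
```

With only a handful of metric anchors (e.g. from sparse SfM points), the recovered (s, t) removes the per-frame ambiguity of affine-invariant depth predictions.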
arXiv Detail & Related papers (2022-02-03T08:52:54Z)
- DF-VO: What Should Be Learnt for Visual Odometry? [33.379888882093965]
We design a simple yet robust Visual Odometry system by integrating multi-view geometry with deep learning on depth and optical flow.
Comprehensive ablation studies show the effectiveness of the proposed method, and extensive evaluation results show the state-of-the-art performance of our system.
arXiv Detail & Related papers (2021-03-01T11:50:39Z)
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
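The "flexible deformation splines for low-frequency large-scale alignment" can be pictured in one dimension: approximate noisy per-frame corrections with a coarse set of control points, so only the smooth, low-frequency component survives. The sketch below uses piecewise-linear hat-function bases fit by least squares; it is a simplified 1-D stand-in for the paper's deformation splines, and all names are assumptions.

```python
import numpy as np

def lowfreq_scale_spline(raw_scale, num_ctrl=5):
    """Fit a coarse piecewise-linear 'spline' to per-frame scale
    corrections: solve for control-point values by least squares,
    then interpolate them back to every frame."""
    n = len(raw_scale)
    knots = np.linspace(0, n - 1, num_ctrl)
    # Hat-function basis: column k is the linear-interpolation weight
    # of control point k at each frame index.
    B = np.stack([np.interp(np.arange(n), knots, np.eye(num_ctrl)[k])
                  for k in range(num_ctrl)], axis=1)
    ctrl = np.linalg.lstsq(B, np.asarray(raw_scale, dtype=float), rcond=None)[0]
    return B @ ctrl
```

Because the basis has few degrees of freedom, high-frequency noise in the raw corrections is filtered out while large-scale drift is preserved; the paper's geometry-aware depth filtering then handles the fine details.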
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
- Depth Completion Using a View-constrained Deep Prior [73.21559000917554]
Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images.
This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting.
We extend the concept of the DIP to depth images. Given color images and noisy and incomplete target depth maps, we reconstruct a depth map restored by virtue of using the CNN network structure as a prior.
arXiv Detail & Related papers (2020-01-21T21:56:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.