Neural Video Depth Stabilizer
- URL: http://arxiv.org/abs/2307.08695v2
- Date: Thu, 10 Aug 2023 09:36:06 GMT
- Title: Neural Video Depth Stabilizer
- Authors: Yiran Wang, Min Shi, Jiaqi Li, Zihao Huang, Zhiguo Cao, Jianming
Zhang, Ke Xian, Guosheng Lin
- Abstract summary: Video depth estimation aims to infer temporally consistent depth.
Some methods achieve temporal consistency by finetuning a single-image depth model during test time using geometry and re-projection constraints.
We propose a plug-and-play framework that stabilizes inconsistent depth estimations and can be applied to different single-image depth models without extra effort.
- Score: 74.04508918791637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video depth estimation aims to infer temporally consistent depth. Some
methods achieve temporal consistency by finetuning a single-image depth model
during test time using geometry and re-projection constraints, which is
inefficient and not robust. An alternative approach is to learn how to enforce
temporal consistency from data, but this requires well-designed models and
sufficient video depth data. To address these challenges, we propose a
plug-and-play framework called Neural Video Depth Stabilizer (NVDS) that
stabilizes inconsistent depth estimations and can be applied to different
single-image depth models without extra effort. We also introduce a large-scale
dataset, Video Depth in the Wild (VDW), which consists of 14,203 videos with
over two million frames, making it the largest natural-scene video depth
dataset to our knowledge. We evaluate our method on the VDW dataset as well as
two public benchmarks and demonstrate significant improvements in consistency,
accuracy, and efficiency compared to previous approaches. Our work serves as a
solid baseline and provides a data foundation for learning-based video depth
models. We will release our dataset and code for future research.
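To make the plug-and-play idea concrete, here is a minimal sketch of the interface such a stabilizer exposes: per-frame depth maps from any single-image model go in, temporally smoothed maps come out. Note this is an illustrative surrogate only; the closed-form scale/shift alignment and neighbor averaging below are assumptions for the sketch, whereas NVDS itself learns the stabilization with a cross-frame network.

```python
import numpy as np

def align_scale_shift(src, ref):
    """Least-squares scale/shift fit mapping src depth onto ref.
    A toy stand-in for learned cross-frame alignment (hypothetical,
    not the NVDS architecture)."""
    a = np.stack([src.ravel(), np.ones(src.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(a, ref.ravel(), rcond=None)
    return s * src + t

def stabilize(depths, radius=2):
    """Plug-and-play interface: accepts per-frame depth maps produced
    by ANY single-image depth model and returns smoothed maps, here by
    averaging neighbors after aligning them to the current frame."""
    depths = [np.asarray(d, dtype=np.float64) for d in depths]
    out = []
    for i, d in enumerate(depths):
        lo, hi = max(0, i - radius), min(len(depths), i + radius + 1)
        aligned = [align_scale_shift(depths[j], d) for j in range(lo, hi)]
        out.append(np.mean(aligned, axis=0))
    return out
```

Because the stabilizer only consumes depth maps, swapping the underlying single-image model requires no retraining of the downstream component, which is the property the abstract emphasizes.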
Related papers
- Learning Temporally Consistent Video Depth from Video Diffusion Priors [57.929828486615605]
This work addresses the challenge of video depth estimation.
We reformulate the prediction task into a conditional generation problem.
This allows us to leverage the prior knowledge embedded in existing video generation models.
arXiv Detail & Related papers (2024-06-03T16:20:24Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Edge-aware Consistent Stereo Video Depth Estimation [3.611754783778107]
We propose a consistent method for dense video depth estimation.
Unlike existing monocular methods, ours operates on stereo videos.
We show that our edge-aware stereo video model can accurately estimate the dense depth maps.
arXiv Detail & Related papers (2023-05-04T08:30:04Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption to train networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover the 3D scene shape.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- Globally Consistent Video Depth and Pose Estimation with Efficient Test-Time Training [15.46056322267856]
We present GCVD, a globally consistent method for learning-based video structure from motion (SfM).
GCVD integrates a compact pose graph into the CNN-based optimization and achieves global consistency through an effective selection mechanism.
Experimental results show that GCVD outperforms the state-of-the-art methods on both depth and pose estimation.
arXiv Detail & Related papers (2022-08-04T15:12:03Z)
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.