DeepVideoMVS: Multi-View Stereo on Video with Recurrent Spatio-Temporal
Fusion
- URL: http://arxiv.org/abs/2012.02177v1
- Date: Thu, 3 Dec 2020 18:54:03 GMT
- Title: DeepVideoMVS: Multi-View Stereo on Video with Recurrent Spatio-Temporal
Fusion
- Authors: Arda Düzçeker, Silvano Galliani, Christoph Vogel, Pablo
Speciale, Mihai Dusmanu, Marc Pollefeys
- Abstract summary: We propose an online multi-view depth prediction approach on posed video streams.
The scene geometry information computed in the previous time steps is propagated to the current time step.
We outperform the existing state-of-the-art multi-view stereo methods on most of the evaluated metrics.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an online multi-view depth prediction approach on posed video
streams, where the scene geometry information computed in the previous time
steps is propagated to the current time step in an efficient and geometrically
plausible way. The backbone of our approach is a real-time capable, lightweight
encoder-decoder that relies on cost volumes computed from pairs of images. We
extend it by placing a ConvLSTM cell at the bottleneck layer, which compresses
an arbitrary amount of past information in its states. The novelty lies in
propagating the hidden state of the cell by accounting for the viewpoint
changes between time steps. At a given time step, we warp the previous hidden
state into the current camera plane using the previous depth prediction. Our
extension brings only a small overhead of computation time and memory
consumption, while improving the depth predictions significantly. As a result,
we outperform the existing state-of-the-art multi-view stereo methods on most
of the evaluated metrics in hundreds of indoor scenes while maintaining a
real-time performance. Code available:
https://github.com/ardaduz/deep-video-mvs
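The core novelty described above is geometrically warping the previous ConvLSTM hidden state into the current camera plane using the previous depth prediction. The sketch below illustrates that step with NumPy: back-project each hidden-state cell with the previous depth, transform it by the relative camera pose, re-project into the current view, and splat. The function name, the nearest-neighbour splatting, and the zero-filling of disoccluded cells are illustrative assumptions, not the paper's exact implementation (the authors' code is at the linked repository).

```python
import numpy as np

def warp_hidden_state(h_prev, d_prev, K, T_prev_to_cur):
    """Forward-warp the previous hidden state into the current view.

    h_prev        : (C, H, W) previous ConvLSTM hidden state
    d_prev        : (H, W)    previous depth prediction (hidden-state resolution)
    K             : (3, 3)    camera intrinsics at hidden-state resolution
    T_prev_to_cur : (4, 4)    relative pose from previous to current camera
    """
    C, H, W = h_prev.shape
    # Pixel grid of the previous frame in homogeneous coordinates.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    # Back-project with the previous depth: X = d * K^-1 [u, v, 1]^T.
    pts = np.linalg.inv(K) @ pix * d_prev.reshape(1, -1)
    # Move the 3D points into the current camera frame.
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    pts_cur = (T_prev_to_cur @ pts_h)[:3]
    # Project into the current image plane.
    proj = K @ pts_cur
    z = proj[2]
    u2 = np.round(proj[0] / np.maximum(z, 1e-6)).astype(int)
    v2 = np.round(proj[1] / np.maximum(z, 1e-6)).astype(int)
    inside = (z > 1e-6) & (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H)
    # Nearest-neighbour splat; cells receiving no data stay zero,
    # i.e. the recurrent state is effectively reset there.
    h_warp = np.zeros_like(h_prev)
    src = h_prev.reshape(C, -1)
    h_warp[:, v2[inside], u2[inside]] = src[:, inside]
    return h_warp
```

With an identity relative pose the warp is a no-op, which makes the geometry easy to sanity-check; a real system would warp with the estimated pose between consecutive video frames before feeding the state back into the ConvLSTM cell.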
Related papers
- DoubleTake: Geometry Guided Depth Estimation [17.464549832122714]
Estimating depth from a sequence of posed RGB images is a fundamental computer vision task.
We introduce a reconstruction which combines volume features with a hint of the prior geometry, rendered as a depth map from the current camera location.
We demonstrate that our method runs at interactive speeds and yields state-of-the-art estimates of depth and 3D scene geometry in both offline and incremental evaluation scenarios.
arXiv Detail & Related papers (2024-06-26T14:29:05Z) - Learning Temporally Consistent Video Depth from Video Diffusion Priors [57.929828486615605]
This work addresses the challenge of video depth estimation.
We reformulate the prediction task into a conditional generation problem.
This allows us to leverage the prior knowledge embedded in existing video generation models.
arXiv Detail & Related papers (2024-06-03T16:20:24Z) - Viewport Prediction for Volumetric Video Streaming by Exploring Video Saliency and Trajectory Information [45.31198546289057]
This paper proposes a novel approach named Saliency and Trajectory Viewport Prediction (STVP).
It aims to improve the precision of viewport prediction in volumetric video streaming.
In particular, we introduce a novel sampling method, Uniform Random Sampling (URS), to reduce computational complexity.
arXiv Detail & Related papers (2023-11-28T03:45:29Z) - MAMo: Leveraging Memory and Attention for Monocular Video Depth
Estimation [53.90194273249202]
We propose MAMo, a novel memory and attention frame-work for monocular video depth estimation.
In MAMo, we augment the model with memory, which aids depth prediction as the model streams through the video.
We show that MAMo consistently improves monocular depth estimation networks and sets new state-of-the-art (SOTA) accuracy.
arXiv Detail & Related papers (2023-07-26T17:55:32Z) - Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a single pair of original and processed videos rather than on a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z) - CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth [83.77839773394106]
We present a lightweight, tightly-coupled deep depth network and visual-inertial odometry system.
We provide the network with previously marginalized sparse features from VIO to increase the accuracy of initial depth prediction.
We show that it can run in real-time with single-thread execution while utilizing GPU acceleration only for the network and code Jacobian.
arXiv Detail & Related papers (2020-12-18T09:42:54Z) - Multiple Instance-Based Video Anomaly Detection using Deep Temporal
Encoding-Decoding [5.255783459833821]
We propose a weakly supervised deep temporal encoding-decoding solution for anomaly detection in surveillance videos.
The proposed approach uses both abnormal and normal video clips during the training phase.
The results show that the proposed method performs similar to or better than the state-of-the-art solutions for anomaly detection in video surveillance applications.
arXiv Detail & Related papers (2020-07-03T08:22:42Z) - Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [92.84498980104424]
We put three different types of depth estimation into a common framework.
Our method produces a time series of depth maps.
It can be applied to monocular videos only or be combined with different types of sparse depth patterns.
arXiv Detail & Related papers (2020-01-08T16:50:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.