Temporally Consistent Online Depth Estimation Using Point-Based Fusion
- URL: http://arxiv.org/abs/2304.07435v3
- Date: Sat, 5 Aug 2023 00:01:32 GMT
- Title: Temporally Consistent Online Depth Estimation Using Point-Based Fusion
- Authors: Numair Khan, Eric Penner, Douglas Lanman, and Lei Xiao
- Abstract summary: We aim to estimate temporally consistent depth maps of video streams in an online setting.
This is a difficult problem as future frames are not available and the method must choose between enforcing consistency and correcting errors from previous estimations.
We propose to address these challenges by using a global point cloud that is dynamically updated each frame, along with a learned fusion approach in image space.
- Score: 6.5514240555359455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth estimation is an important step in many computer vision problems such
as 3D reconstruction, novel view synthesis, and computational photography. Most
existing work focuses on depth estimation from single frames. When applied to
videos, the result lacks temporal consistency, showing flickering and swimming
artifacts. In this paper we aim to estimate temporally consistent depth maps of
video streams in an online setting. This is a difficult problem as future
frames are not available and the method must choose between enforcing
consistency and correcting errors from previous estimations. The presence of
dynamic objects further complicates the problem. We propose to address these
challenges by using a global point cloud that is dynamically updated each
frame, along with a learned fusion approach in image space. Our approach
encourages consistency while simultaneously allowing updates to handle errors
and dynamic objects. Qualitative and quantitative results show that our method
achieves state-of-the-art quality for consistent video depth estimation.
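The core idea in the abstract, maintaining a global point cloud that is reprojected into each new frame and fused with the per-frame estimate in image space, can be sketched roughly as follows. This is a minimal illustration, not the paper's method: all function names are hypothetical, camera intrinsics and poses are assumed known, and the hand-tuned blend in `fuse` only stands in for the learned fusion network the paper proposes.

```python
import numpy as np

def backproject(depth, K):
    """Back-project a depth map into 3D camera-space points (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def render_depth(points, K, h, w):
    """Z-buffer the global point cloud into the current view to obtain a
    temporally consistent depth proposal; pixels no point hits become holes."""
    z = points[:, 2]
    valid = z > 1e-6
    u = np.round(points[valid, 0] / z[valid] * K[0, 0] + K[0, 2]).astype(int)
    v = np.round(points[valid, 1] / z[valid] * K[1, 1] + K[1, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.full((h, w), np.inf)
    # Keep the nearest point per pixel (unbuffered scatter-min).
    np.minimum.at(depth, (v[inside], u[inside]), z[valid][inside])
    depth[np.isinf(depth)] = 0.0  # 0 marks holes with no projected point
    return depth

def fuse(rendered, current, alpha=0.8):
    """Image-space fusion: favor the rendered (consistent) depth where it
    exists, blend in the per-frame estimate so errors and dynamic objects can
    still be corrected, and fall back to the current estimate in holes."""
    hole = rendered <= 0
    fused = alpha * rendered + (1 - alpha) * current
    fused[hole] = current[hole]
    return fused
```

In a full pipeline, the fused depth would be back-projected again to update the global point cloud each frame; the fixed `alpha` here is exactly what a learned, per-pixel fusion weight would replace.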
Related papers
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Edge-aware Consistent Stereo Video Depth Estimation [3.611754783778107]
We propose a consistent method for dense video depth estimation.
Unlike existing monocular methods, ours operates on stereo videos.
We show that our edge-aware stereo video model can accurately estimate the dense depth maps.
arXiv Detail & Related papers (2023-05-04T08:30:04Z)
- IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment [58.8330387551499]
We formulate the problem as estimation of point-wise trajectories (i.e., smooth curves).
We propose IDEA-Net, an end-to-end deep learning framework, which disentangles the problem under the assistance of the explicitly learned temporal consistency.
We demonstrate the effectiveness of our method on various point cloud sequences and observe large improvement over state-of-the-art methods both quantitatively and visually.
arXiv Detail & Related papers (2022-03-22T10:14:08Z)
- Temporally Consistent Online Depth Estimation in Dynamic Scenes [17.186528244457055]
Temporally consistent depth estimation is crucial for real-time applications such as augmented reality.
We present a technique to produce temporally consistent depth estimates in dynamic scenes in an online setting.
Our network augments current per-frame stereo networks with novel motion and fusion networks.
arXiv Detail & Related papers (2021-11-17T19:00:51Z)
- Consistent Depth of Moving Objects in Video [52.72092264848864]
We present a method to estimate depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera.
We formulate this objective in a new test-time training framework where a depth-prediction CNN is trained in tandem with an auxiliary scene-flow prediction over the entire input video.
We demonstrate accurate and temporally coherent results on a variety of challenging videos containing diverse moving objects (pets, people, cars) as well as camera motion.
arXiv Detail & Related papers (2021-08-02T20:53:18Z)
- Depth-Aware Multi-Grid Deep Homography Estimation with Contextual Correlation [38.95610086309832]
Homography estimation is an important task in computer vision, with applications such as image stitching, video stabilization, and camera calibration.
Traditional homography estimation methods depend on the quantity and distribution of feature points, leading to poor robustness in textureless scenes.
We propose a contextual correlation layer, which captures long-range correlation on feature maps and can be flexibly integrated into a learning framework.
We equip our network with depth perception capability by introducing a novel depth-aware shape-preserved loss.
arXiv Detail & Related papers (2021-07-06T10:33:12Z)
- Deep Dual Consecutive Network for Human Pose Estimation [44.41818683253614]
We propose a novel multi-frame human pose estimation framework, leveraging abundant temporal cues between video frames to facilitate keypoint detection.
Our method ranks No. 1 in the Multi-frame Person Pose Estimation Challenge on the large-scale benchmark datasets PoseTrack 2017 and PoseTrack 2018.
arXiv Detail & Related papers (2021-03-12T13:11:27Z)
- Consistent Video Depth Estimation [57.712779457632024]
We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video.
We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video.
Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion.
arXiv Detail & Related papers (2020-04-30T17:59:26Z)
- Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z)
- Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.