DynamicStereo: Consistent Dynamic Depth from Stereo Videos
- URL: http://arxiv.org/abs/2305.02296v1
- Date: Wed, 3 May 2023 17:40:49 GMT
- Title: DynamicStereo: Consistent Dynamic Depth from Stereo Videos
- Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova,
Andrea Vedaldi, Christian Rupprecht
- Abstract summary: We propose DynamicStereo to estimate disparity for stereo videos.
The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions.
We also introduce Dynamic Replica, a new benchmark dataset containing synthetic videos of people and animals in scanned environments.
- Score: 91.1804971397608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of reconstructing a dynamic scene observed from a
stereo camera. Most existing methods for depth from stereo treat different
stereo frames independently, leading to temporally inconsistent depth
predictions. Temporal consistency is especially important for immersive AR or
VR scenarios, where flickering greatly diminishes the user experience. We
propose DynamicStereo, a novel transformer-based architecture to estimate
disparity for stereo videos. The network learns to pool information from
neighboring frames to improve the temporal consistency of its predictions. Our
architecture is designed to process stereo videos efficiently through divided
attention layers. We also introduce Dynamic Replica, a new benchmark dataset
containing synthetic videos of people and animals in scanned environments,
which provides complementary training and evaluation data for dynamic stereo
closer to real applications than existing datasets. Training with this dataset
further improves the quality of predictions of our proposed DynamicStereo as
well as prior methods. Finally, it acts as a benchmark for consistent stereo
methods.
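
To make the divided-attention idea concrete, here is a minimal sketch of how spatial and temporal attention can be factorized over a (batch, time, tokens, channels) feature tensor. The module structure and shapes are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of divided space-time attention; the real
# DynamicStereo architecture differs in detail.
import torch
import torch.nn as nn

class DividedAttentionBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, n, c = x.shape
        # Spatial attention: tokens attend within their own frame.
        s = self.norm_s(x).reshape(b * t, n, c)
        x = x + self.spatial_attn(s, s, s)[0].reshape(b, t, n, c)
        # Temporal attention: each token attends to itself across frames,
        # which is how information is pooled from neighboring frames.
        u = self.norm_t(x).permute(0, 2, 1, 3).reshape(b * n, t, c)
        u = self.temporal_attn(u, u, u)[0]
        return x + u.reshape(b, n, t, c).permute(0, 2, 1, 3)
```

Factorizing this way replaces one joint attention over all T·N space-time tokens (quadratic in T·N) with two much cheaper passes, which is what makes processing whole stereo videos tractable.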
Related papers
- Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos [101.48581851337703]
We present BTimer, the first motion-aware feed-forward model for real-time reconstruction and novel view synthesis of dynamic scenes.
Our approach reconstructs the full scene in a 3D Gaussian Splatting representation at a given target ('bullet') timestamp by aggregating information from all the context frames.
Given a casual monocular dynamic video, BTimer reconstructs a bullet-time scene within 150ms while reaching state-of-the-art performance on both static and dynamic scene datasets.
arXiv Detail & Related papers (2024-12-04T18:15:06Z)
- Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation [83.841877607646]
We introduce Helvipad, a real-world dataset for omnidirectional stereo depth estimation.
The dataset includes accurate depth and disparity labels obtained by projecting 3D point clouds onto equirectangular images; the projection is sketched after this entry.
We benchmark leading stereo depth estimation models for both standard and omnidirectional images.
arXiv Detail & Related papers (2024-11-27T13:34:41Z)
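
The label-generation step described above, projecting 3D points onto an equirectangular image, follows standard spherical projection. A sketch, with the camera convention (x right, y down, z forward) and image size as assumptions:

```python
import numpy as np

def project_equirectangular(points: np.ndarray, width: int, height: int):
    """Project Nx3 camera-frame points (x right, y down, z forward)
    onto an equirectangular image; returns pixel coords and range."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)                 # distance to each point
    lon = np.arctan2(x, z)                               # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(y / np.maximum(rng, 1e-9), -1.0, 1.0))  # latitude
    u = (lon / (2.0 * np.pi) + 0.5) * width              # column coordinate
    v = (lat / np.pi + 0.5) * height                     # row coordinate
    return u, v, rng
```

Real label generation would also need to resolve collisions (keep the nearest point per pixel) and densify sparse lidar returns; both are omitted here.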
- Match Stereo Videos via Bidirectional Alignment [15.876953256378224]
Recent learning-based methods often focus on optimizing performance for independent stereo pairs, leading to temporal inconsistencies in videos.
We introduce a novel video processing framework, BiDAStereo, and a plugin stabilizer network, BiDAStabilizer, compatible with general image-based methods; the alignment idea is sketched after this entry.
We present a realistic synthetic dataset and benchmark focused on natural scenes, along with a real-world dataset captured by a stereo camera in diverse urban scenes for qualitative evaluation.
arXiv Detail & Related papers (2024-09-30T13:37:29Z)
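
A generic sketch of the bidirectional idea behind this line of work: features from the previous and next frames are warped into the current frame with optical flow before being fused. The flow inputs and the simple averaging are placeholders, not the papers' learned modules:

```python
import torch
import torch.nn.functional as F

def flow_warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp features (B,C,H,W) with optical flow (B,2,H,W)."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]            # sample locations in x
    grid_y = ys.unsqueeze(0) + flow[:, 1]            # sample locations in y
    # Normalize to [-1, 1] as required by grid_sample.
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(feat, grid, align_corners=True)

def align_neighbors(prev_f, cur_f, next_f, flow_to_prev, flow_to_next):
    """Bring both temporal neighbors into the current frame, then average.

    flow_to_prev / flow_to_next map current-frame pixels to the neighbors;
    plain averaging stands in for a learned fusion module (an assumption).
    """
    warped_prev = flow_warp(prev_f, flow_to_prev)    # backward direction
    warped_next = flow_warp(next_f, flow_to_next)    # forward direction
    return (warped_prev + cur_f + warped_next) / 3.0
```

In practice the flows would come from an estimator with occlusion checks, and the fusion would be learned rather than a fixed average.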
- Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching [17.344430840048094]
Recent learning-based methods prioritize optimal performance on a single stereo pair, resulting in temporal inconsistencies.
We develop a bidirectional alignment mechanism for adjacent frames as a fundamental operation.
Unlike existing methods, we model this task as local matching and global aggregation.
arXiv Detail & Related papers (2024-03-16T01:38:28Z)
- Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality [65.70936336240554]
Real-time stereo matching is a cornerstone algorithm for many Extended Reality (XR) applications, such as indoor 3D understanding, video pass-through, and mixed-reality games.
One of the major difficulties is the lack of high-quality indoor video stereo training datasets captured by head-mounted VR/AR glasses.
We introduce a novel video stereo synthetic dataset that comprises renderings of various indoor scenes and realistic camera motion captured by a 6-DoF moving VR/AR head-mounted display (HMD).
This facilitates the evaluation of existing approaches and promotes further research on indoor augmented reality scenarios.
arXiv Detail & Related papers (2023-09-08T07:53:58Z)
- MEStereo-Du2CNN: A Novel Dual Channel CNN for Learning Robust Depth Estimates from Multi-exposure Stereo Images for HDR 3D Applications [0.22940141855172028]
We develop a novel deep architecture for multi-exposure stereo depth estimation.
For the stereo depth estimation component of our architecture, a mono-to-stereo transfer learning approach is deployed.
In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods.
arXiv Detail & Related papers (2022-06-21T13:23:22Z)
- Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames (unprojection is sketched after this entry) and then render them into free-viewpoint videos via neural feature rendering.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
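
Generating a point cloud from an RGBD frame is standard pinhole unprojection; a minimal sketch, assuming a 3x3 intrinsics matrix K and depth in meters:

```python
import numpy as np

def rgbd_to_point_cloud(depth: np.ndarray, rgb: np.ndarray, K: np.ndarray):
    """Unproject an HxW depth map to an Nx6 colored point cloud.

    K is the 3x3 pinhole intrinsics matrix; zero-depth pixels are dropped.
    """
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    vs, us = np.mgrid[0:h, 0:w]                     # pixel row/column grids
    z = depth.reshape(-1)
    valid = z > 0
    x = (us.reshape(-1) - cx) / fx * z              # pinhole: x = (u - cx) / fx * z
    y = (vs.reshape(-1) - cy) / fy * z
    xyz = np.stack((x, y, z), axis=1)[valid]
    colors = rgb.reshape(-1, 3)[valid]
    return np.concatenate((xyz, colors), axis=1)    # N x (x, y, z, r, g, b)
```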
- Temporally Consistent Online Depth Estimation in Dynamic Scenes [17.186528244457055]
Temporally consistent depth estimation is crucial for real-time applications such as augmented reality.
We present a technique to produce temporally consistent depth estimates in dynamic scenes in an online setting.
Our network augments current per-frame stereo networks with novel motion and fusion networks; a generic fusion step is sketched after this entry.
arXiv Detail & Related papers (2021-11-17T19:00:51Z)
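
As a generic illustration of the motion-plus-fusion idea (not the paper's networks): warp the previous depth estimate into the current frame using the estimated motion, then blend it with the per-frame prediction wherever the warp is trusted:

```python
import torch

def temporal_depth_fusion(cur_depth: torch.Tensor,
                          warped_prev_depth: torch.Tensor,
                          valid_mask: torch.Tensor,
                          alpha: float = 0.7) -> torch.Tensor:
    """Blend current per-frame depth with motion-compensated previous depth.

    valid_mask marks pixels where the warp is trusted (e.g. it passed a
    forward-backward flow check); the constant alpha is a hand-picked
    stand-in for a learned fusion network.
    """
    fused = alpha * cur_depth + (1.0 - alpha) * warped_prev_depth
    return torch.where(valid_mask.bool(), fused, cur_depth)
```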
- Self-Supervised Depth Completion for Active Stereo [55.79929735390945]
Active stereo systems are widely used in the robotics industry due to their low cost and high-quality depth maps.
These depth sensors suffer from stereo artefacts and do not provide dense depth estimates.
We present the first self-supervised depth completion method for active stereo systems that predicts accurate dense depth maps.
arXiv Detail & Related papers (2021-10-07T07:33:52Z)