DynamicStereo: Consistent Dynamic Depth from Stereo Videos
- URL: http://arxiv.org/abs/2305.02296v1
- Date: Wed, 3 May 2023 17:40:49 GMT
- Title: DynamicStereo: Consistent Dynamic Depth from Stereo Videos
- Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova,
Andrea Vedaldi, Christian Rupprecht
- Abstract summary: We propose DynamicStereo to estimate disparity for stereo videos.
The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions.
We also introduce Dynamic Replica, a new benchmark dataset containing synthetic videos of people and animals in scanned environments.
- Score: 91.1804971397608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of reconstructing a dynamic scene observed from a
stereo camera. Most existing methods for depth from stereo treat different
stereo frames independently, leading to temporally inconsistent depth
predictions. Temporal consistency is especially important for immersive AR or
VR scenarios, where flickering greatly diminishes the user experience. We
propose DynamicStereo, a novel transformer-based architecture to estimate
disparity for stereo videos. The network learns to pool information from
neighboring frames to improve the temporal consistency of its predictions. Our
architecture is designed to process stereo videos efficiently through divided
attention layers. We also introduce Dynamic Replica, a new benchmark dataset
containing synthetic videos of people and animals in scanned environments,
which provides complementary training and evaluation data for dynamic stereo
closer to real applications than existing datasets. Training with this dataset
further improves the quality of predictions of our proposed DynamicStereo as
well as prior methods. Finally, it acts as a benchmark for consistent stereo
methods.
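
To make the divided-attention idea concrete, here is a minimal sketch of how spatial and temporal attention can be factorized over a (batch, time, tokens, channels) feature tensor. The module structure and shapes are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of divided space-time attention; the real
# DynamicStereo architecture differs in detail.
import torch
import torch.nn as nn

class DividedAttentionBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, n, c = x.shape
        # Spatial attention: tokens attend within their own frame.
        s = self.norm_s(x).reshape(b * t, n, c)
        x = x + self.spatial_attn(s, s, s)[0].reshape(b, t, n, c)
        # Temporal attention: each token attends to itself across frames,
        # which is how information is pooled from neighboring frames.
        u = self.norm_t(x).permute(0, 2, 1, 3).reshape(b * n, t, c)
        u = self.temporal_attn(u, u, u)[0]
        return x + u.reshape(b, n, t, c).permute(0, 2, 1, 3)
```

Factorizing this way replaces one joint attention over all T·N space-time tokens (quadratic in T·N) with two much cheaper passes, which is what makes processing whole stereo videos tractable.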
Related papers
- Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos [101.48581851337703]
We present BTimer, the first motion-aware feed-forward model for real-time reconstruction and novel view synthesis of dynamic scenes.
Our approach reconstructs the full scene in a 3D Gaussian Splatting representation at a given target ('bullet') timestamp by aggregating information from all the context frames.
Given a casual monocular dynamic video, BTimer reconstructs a bullet-time scene within 150ms while reaching state-of-the-art performance on both static and dynamic scene datasets.
arXiv Detail & Related papers (2024-12-04T18:15:06Z)
- Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation [83.841877607646]
We introduce Helvipad, a real-world dataset for omnidirectional stereo depth estimation.
The dataset includes accurate depth and disparity labels obtained by projecting 3D point clouds onto equirectangular images; the projection is sketched after this entry.
We benchmark leading stereo depth estimation models for both standard and omnidirectional images.
arXiv Detail & Related papers (2024-11-27T13:34:41Z)
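
The label-generation step described above, projecting 3D points onto an equirectangular image, follows standard spherical projection. A sketch, with the camera convention (x right, y down, z forward) and image size as assumptions:

```python
import numpy as np

def project_equirectangular(points: np.ndarray, width: int, height: int):
    """Project Nx3 camera-frame points (x right, y down, z forward)
    onto an equirectangular image; returns pixel coords and range."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)                 # distance to each point
    lon = np.arctan2(x, z)                               # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(y / np.maximum(rng, 1e-9), -1.0, 1.0))  # latitude
    u = (lon / (2.0 * np.pi) + 0.5) * width              # column coordinate
    v = (lat / np.pi + 0.5) * height                     # row coordinate
    return u, v, rng
```

Real label generation would also need to resolve collisions (keep the nearest point per pixel) and densify sparse lidar returns; both are omitted here.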
- Match Stereo Videos via Bidirectional Alignment [15.876953256378224]
Recent learning-based methods often focus on optimizing performance for independent stereo pairs, leading to temporal inconsistencies in videos.
We introduce a novel video processing framework, BiDAStereo, and a plugin stabilizer network, BiDAStabilizer, compatible with general image-based methods; the alignment idea is sketched after this entry.
We present a realistic synthetic dataset and benchmark focused on natural scenes, along with a real-world dataset captured by a stereo camera in diverse urban scenes for qualitative evaluation.
arXiv Detail & Related papers (2024-09-30T13:37:29Z)
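
A generic sketch of the bidirectional idea behind this line of work: features from the previous and next frames are warped into the current frame with optical flow before being fused. The flow inputs and the simple averaging are placeholders, not the papers' learned modules:

```python
import torch
import torch.nn.functional as F

def flow_warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp features (B,C,H,W) with optical flow (B,2,H,W)."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]            # sample locations in x
    grid_y = ys.unsqueeze(0) + flow[:, 1]            # sample locations in y
    # Normalize to [-1, 1] as required by grid_sample.
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(feat, grid, align_corners=True)

def align_neighbors(prev_f, cur_f, next_f, flow_to_prev, flow_to_next):
    """Bring both temporal neighbors into the current frame, then average.

    flow_to_prev / flow_to_next map current-frame pixels to the neighbors;
    plain averaging stands in for a learned fusion module (an assumption).
    """
    warped_prev = flow_warp(prev_f, flow_to_prev)    # backward direction
    warped_next = flow_warp(next_f, flow_to_next)    # forward direction
    return (warped_prev + cur_f + warped_next) / 3.0
```

In practice the flows would come from an estimator with occlusion checks, and the fusion would be learned rather than a fixed average.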
- Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching [17.344430840048094]
Recent learning-based methods prioritize optimal performance on a single stereo pair, resulting in temporal inconsistencies.
We develop a bidirectional alignment mechanism for adjacent frames as a fundamental operation.
Unlike existing methods, we model this task as local matching and global aggregation.
arXiv Detail & Related papers (2024-03-16T01:38:28Z)
- Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality [65.70936336240554]
Real-time stereo matching is a cornerstone algorithm for many Extended Reality (XR) applications, such as indoor 3D understanding, video pass-through, and mixed-reality games.
One of the major difficulties is the lack of high-quality indoor video stereo training datasets captured by head-mounted VR/AR glasses.
We introduce a novel video stereo synthetic dataset that comprises renderings of various indoor scenes and realistic camera motion captured by a 6-DoF moving VR/AR head-mounted display (HMD).
This facilitates the evaluation of existing approaches and promotes further research on indoor augmented reality scenarios.
arXiv Detail & Related papers (2023-09-08T07:53:58Z)
- MEStereo-Du2CNN: A Novel Dual Channel CNN for Learning Robust Depth Estimates from Multi-exposure Stereo Images for HDR 3D Applications [0.22940141855172028]
We develop a novel deep architecture for multi-exposure stereo depth estimation.
For the stereo depth estimation component of our architecture, a mono-to-stereo transfer learning approach is deployed.
In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods.
arXiv Detail & Related papers (2022-06-21T13:23:22Z)
- Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames (unprojection is sketched after this entry) and then render them into free-viewpoint videos via neural feature rendering.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z)
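
Generating a point cloud from an RGBD frame is standard pinhole unprojection; a minimal sketch, assuming a 3x3 intrinsics matrix K and depth in meters:

```python
import numpy as np

def rgbd_to_point_cloud(depth: np.ndarray, rgb: np.ndarray, K: np.ndarray):
    """Unproject an HxW depth map to an Nx6 colored point cloud.

    K is the 3x3 pinhole intrinsics matrix; zero-depth pixels are dropped.
    """
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    vs, us = np.mgrid[0:h, 0:w]                     # pixel row/column grids
    z = depth.reshape(-1)
    valid = z > 0
    x = (us.reshape(-1) - cx) / fx * z              # pinhole: x = (u - cx) / fx * z
    y = (vs.reshape(-1) - cy) / fy * z
    xyz = np.stack((x, y, z), axis=1)[valid]
    colors = rgb.reshape(-1, 3)[valid]
    return np.concatenate((xyz, colors), axis=1)    # N x (x, y, z, r, g, b)
```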
- Temporally Consistent Online Depth Estimation in Dynamic Scenes [17.186528244457055]
Temporally consistent depth estimation is crucial for real-time applications such as augmented reality.
We present a technique to produce temporally consistent depth estimates in dynamic scenes in an online setting.
Our network augments current per-frame stereo networks with novel motion and fusion networks; a generic fusion step is sketched after this entry.
arXiv Detail & Related papers (2021-11-17T19:00:51Z)
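
As a generic illustration of the motion-plus-fusion idea (not the paper's networks): warp the previous depth estimate into the current frame using the estimated motion, then blend it with the per-frame prediction wherever the warp is trusted:

```python
import torch

def temporal_depth_fusion(cur_depth: torch.Tensor,
                          warped_prev_depth: torch.Tensor,
                          valid_mask: torch.Tensor,
                          alpha: float = 0.7) -> torch.Tensor:
    """Blend current per-frame depth with motion-compensated previous depth.

    valid_mask marks pixels where the warp is trusted (e.g. it passed a
    forward-backward flow check); the constant alpha is a hand-picked
    stand-in for a learned fusion network.
    """
    fused = alpha * cur_depth + (1.0 - alpha) * warped_prev_depth
    return torch.where(valid_mask.bool(), fused, cur_depth)
```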
- Self-Supervised Depth Completion for Active Stereo [55.79929735390945]
Active stereo systems are widely used in the robotics industry due to their low cost and high-quality depth maps.
These depth sensors suffer from stereo artefacts and do not provide dense depth estimates.
We present the first self-supervised depth completion method for active stereo systems that predicts accurate dense depth maps.
arXiv Detail & Related papers (2021-10-07T07:33:52Z)