DynamicStereo: Consistent Dynamic Depth from Stereo Videos
- URL: http://arxiv.org/abs/2305.02296v1
- Date: Wed, 3 May 2023 17:40:49 GMT
- Title: DynamicStereo: Consistent Dynamic Depth from Stereo Videos
- Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova,
Andrea Vedaldi, Christian Rupprecht
- Abstract summary: We propose DynamicStereo to estimate disparity for stereo videos.
The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions.
We also introduce Dynamic Replica, a new benchmark dataset containing synthetic videos of people and animals in scanned environments.
- Score: 91.1804971397608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of reconstructing a dynamic scene observed from a
stereo camera. Most existing methods for depth from stereo treat different
stereo frames independently, leading to temporally inconsistent depth
predictions. Temporal consistency is especially important for immersive AR or
VR scenarios, where flickering greatly diminishes the user experience. We
propose DynamicStereo, a novel transformer-based architecture to estimate
disparity for stereo videos. The network learns to pool information from
neighboring frames to improve the temporal consistency of its predictions. Our
architecture is designed to process stereo videos efficiently through divided
attention layers. We also introduce Dynamic Replica, a new benchmark dataset
containing synthetic videos of people and animals in scanned environments,
which provides complementary training and evaluation data for dynamic stereo
closer to real applications than existing datasets. Training with this dataset
further improves the quality of predictions of our proposed DynamicStereo as
well as prior methods. Finally, it acts as a benchmark for consistent stereo
methods.
Related papers
- Match Stereo Videos via Bidirectional Alignment [15.876953256378224]
Recent learning-based methods often focus on optimizing performance for independent stereo pairs, leading to temporal inconsistencies in videos.
We introduce a novel video processing framework, BiDAStereo, and a plugin stabilizer network, BiDAStabilizer, compatible with general image-based methods.
We present a realistic synthetic dataset and benchmark focused on natural scenes, along with a real-world dataset captured by a stereo camera in diverse urban scenes for qualitative evaluation.
arXiv Detail & Related papers (2024-09-30T13:37:29Z) - Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching [17.344430840048094]
Recent learning-based methods prioritize optimal performance on a single stereo pair, resulting in temporal inconsistencies.
We develop a bidirectional alignment mechanism for adjacent frames as a fundamental operation.
Unlike the existing methods, we model this task as local matching and global aggregation.
arXiv Detail & Related papers (2024-03-16T01:38:28Z) - 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z) - DynaMoN: Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields [71.94156412354054]
We propose Dynamic Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields (DynaMoN)
DynaMoN handles dynamic content for initial camera pose estimation and statics-focused ray sampling for fast and accurate novel-view synthesis.
We extensively evaluate our approach on two real-world dynamic datasets, the TUM RGB-D dataset and the BONN RGB-D Dynamic dataset.
arXiv Detail & Related papers (2023-09-16T08:46:59Z) - Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended
Reality [65.70936336240554]
Real-time Stereo Matching is a cornerstone algorithm for many Extended Reality (XR) applications, such as indoor 3D understanding, video pass-through, and mixed-reality games.
One of the major difficulties is the lack of high-quality indoor video stereo training datasets captured by head-mounted VR/AR glasses.
We introduce a novel video stereo synthetic dataset that comprises renderings of various indoor scenes and realistic camera motion captured by a 6-DoF moving VR/AR head-mounted display (HMD).
This facilitates the evaluation of existing approaches and promotes further research on indoor augmented reality scenarios.
arXiv Detail & Related papers (2023-09-08T07:53:58Z) - MEStereo-Du2CNN: A Novel Dual Channel CNN for Learning Robust Depth
Estimates from Multi-exposure Stereo Images for HDR 3D Applications [0.22940141855172028]
We develop a novel deep architecture for multi-exposure stereo depth estimation.
For the stereo depth estimation component of our architecture, a mono-to-stereo transfer learning approach is deployed.
In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods.
arXiv Detail & Related papers (2022-06-21T13:23:22Z) - Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via a neural feature.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z) - Temporally Consistent Online Depth Estimation in Dynamic Scenes [17.186528244457055]
Temporally consistent depth estimation is crucial for real-time applications such as augmented reality.
We present a technique to produce temporally consistent depth estimates in dynamic scenes in an online setting.
Our network augments current per-frame stereo networks with novel motion and fusion networks.
arXiv Detail & Related papers (2021-11-17T19:00:51Z) - Self-Supervised Depth Completion for Active Stereo [55.79929735390945]
Active stereo systems are widely used in the robotics industry due to their low cost and high quality depth maps.
These depth sensors suffer from stereo artefacts and do not provide dense depth estimates.
We present the first self-supervised depth completion method for active stereo systems that predicts accurate dense depth maps.
arXiv Detail & Related papers (2021-10-07T07:33:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.