Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality
- URL: http://arxiv.org/abs/2309.04183v1
- Date: Fri, 8 Sep 2023 07:53:58 GMT
- Title: Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality
- Authors: Ziang Cheng, Jiayu Yang, Hongdong Li
- Abstract summary: Real-time Stereo Matching is a cornerstone algorithm for many Extended Reality (XR) applications, such as indoor 3D understanding, video pass-through, and mixed-reality games.
One of the major difficulties is the lack of high-quality indoor video stereo training datasets captured by head-mounted VR/AR glasses.
We introduce a novel video stereo synthetic dataset that comprises renderings of various indoor scenes and realistic camera motion captured by a 6-DoF moving VR/AR head-mounted display (HMD).
This facilitates the evaluation of existing approaches and promotes further research on indoor augmented reality scenarios.
- Score: 65.70936336240554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time Stereo Matching is a cornerstone algorithm for many Extended
Reality (XR) applications, such as indoor 3D understanding, video pass-through,
and mixed-reality games. Despite significant advancements in deep stereo
methods, achieving real-time depth inference with high accuracy on a low-power
device remains a major challenge. One of the major difficulties is the lack of
high-quality indoor video stereo training datasets captured by head-mounted
VR/AR glasses. To address this issue, we introduce a novel video stereo
synthetic dataset that comprises photorealistic renderings of various indoor
scenes and realistic camera motion captured by a 6-DoF moving VR/AR
head-mounted display (HMD). This facilitates the evaluation of existing
approaches and promotes further research on indoor augmented reality scenarios.
Our newly proposed dataset also enables us to develop a new video-based stereo
matching approach tailored for XR applications, which achieves real-time
inference at an impressive 134 fps on a standard desktop computer, or 30 fps on a
battery-powered HMD. Our key insight is that disparity and contextual
information are highly correlated and redundant between consecutive stereo
frames. By unrolling an iterative cost aggregation in time (i.e. in the
temporal dimension), we are able to distribute and reuse the aggregated
features over time. This approach leads to a substantial reduction in
computation without sacrificing accuracy. We conducted extensive evaluations
and comparisons and demonstrated that our method achieves superior performance
compared to the current state-of-the-art, making it a strong contender for
real-time stereo matching in VR/AR applications.
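The core idea, reusing aggregated information across consecutive frames so each new frame needs only a few refinement steps, can be illustrated with a minimal toy sketch. This is an illustration of temporal warm-starting in general, not the paper's actual network; all function and variable names are invented for this example.

```python
import numpy as np

def refine_disparity(cost_volume, disp_init, iters):
    """Toy iterative refinement: each step nudges the disparity map toward
    the per-pixel argmin of the cost volume (a crude stand-in for learned
    iterative cost aggregation)."""
    disp = disp_init.astype(np.float64)
    target = cost_volume.argmin(axis=-1).astype(np.float64)
    for _ in range(iters):
        disp += 0.5 * (target - disp)  # one update/aggregation step
    return disp

# H x W x D cost volume for a toy scene whose true disparity is 7 everywhere.
H, W, D = 4, 4, 16
cost = np.ones((H, W, D))
cost[..., 7] = 0.0

# Frame t: cold start from zeros needs many iterations to converge.
cold = refine_disparity(cost, np.zeros((H, W)), iters=10)

# Frame t+1: warm-starting from the previous frame's result converges with
# far fewer iterations -- the "distribute and reuse over time" idea.
warm = refine_disparity(cost, cold, iters=2)
```

Because consecutive frames are highly redundant, the warm-started estimate is already near the solution, so per-frame compute drops sharply without a loss in accuracy.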
Related papers
- Match Stereo Videos via Bidirectional Alignment [15.876953256378224]
Recent learning-based methods often focus on optimizing performance for independent stereo pairs, leading to temporal inconsistencies in videos.
We introduce a novel video processing framework, BiDAStereo, and a plugin stabilizer network, BiDAStabilizer, compatible with general image-based methods.
We present a realistic synthetic dataset and benchmark focused on natural scenes, along with a real-world dataset captured by a stereo camera in diverse urban scenes for qualitative evaluation.
arXiv Detail & Related papers (2024-09-30T13:37:29Z)
- Event-based Stereo Depth Estimation: A Survey [12.711235562366898]
Stereopsis has widespread appeal in robotics as it is the predominant way by which living beings perceive depth to navigate our 3D world.
Event cameras are novel bio-inspired sensors that detect per-pixel brightness changes asynchronously, with very high temporal resolution and high dynamic range.
The high temporal precision also benefits stereo matching, making disparity (depth) estimation a popular research area for event cameras ever since its inception.
arXiv Detail & Related papers (2024-09-26T09:43:50Z)
- 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z)
- Video Frame Interpolation with Stereo Event and Intensity Camera [40.07341828127157]
We propose a novel Stereo Event-based VFI network (SE-VFI-Net) to generate high-quality intermediate frames.
We exploit the fused features accomplishing accurate optical flow and disparity estimation.
Our proposed SE-VFI-Net outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-07-17T04:02:00Z)
- DynamicStereo: Consistent Dynamic Depth from Stereo Videos [91.1804971397608]
We propose DynamicStereo to estimate disparity for stereo videos.
The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions.
We also introduce Dynamic Replica, a new benchmark dataset containing synthetic videos of people and animals in scanned environments.
arXiv Detail & Related papers (2023-05-03T17:40:49Z)
- Deep Parametric 3D Filters for Joint Video Denoising and Illumination Enhancement in Video Super Resolution [96.89588203312451]
This paper presents a new parametric representation called Deep Parametric 3D Filters (DP3DF)
DP3DF incorporates local information to enable simultaneous denoising, illumination enhancement, and SR efficiently in a single encoder-and-decoder network.
Also, a dynamic residual frame is jointly learned with the DP3DF via a shared backbone to further boost the SR quality.
arXiv Detail & Related papers (2022-07-05T03:57:25Z)
- Fast Online Video Super-Resolution with Deformable Attention Pyramid [172.16491820970646]
Video super-resolution (VSR) has many applications that pose strict causal, real-time, and latency constraints, including video streaming and TV.
We propose a recurrent VSR architecture based on a deformable attention pyramid (DAP)
arXiv Detail & Related papers (2022-02-03T17:49:04Z)
- SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
arXiv Detail & Related papers (2021-04-08T16:15:46Z)
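The SMD-Nets abstract's point, that a bimodal output representation yields sharp disparities near discontinuities, can be sketched as follows. This is a simplified illustration of picking the dominant mode of a per-pixel two-mode mixture, not SMD-Nets' actual API; the array names are invented for this example.

```python
import numpy as np

def sharp_disparity(pi, mu1, mu2):
    """Select the dominant mode of a per-pixel bimodal mixture.
    Near a depth edge the two modes straddle the discontinuity; taking
    a mode (rather than the mixture mean) avoids the smeared in-between
    values a unimodal regressor would produce."""
    return np.where(pi >= 0.5, mu1, mu2)

# Toy 1-D scan line crossing a depth edge:
# foreground disparity 20 px, background disparity 5 px.
mu1 = np.full(8, 20.0)  # foreground mode
mu2 = np.full(8, 5.0)   # background mode
pi = np.array([0.9, 0.9, 0.9, 0.6, 0.4, 0.1, 0.1, 0.1])  # mode weights

disp = sharp_disparity(pi, mu1, mu2)
```

The estimate jumps cleanly from 20 to 5 at the edge, whereas the mixture mean `pi * mu1 + (1 - pi) * mu2` would blur across it.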
This list is automatically generated from the titles and abstracts of the papers in this site.