Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality
- URL: http://arxiv.org/abs/2309.04183v1
- Date: Fri, 8 Sep 2023 07:53:58 GMT
- Title: Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality
- Authors: Ziang Cheng, Jiayu Yang, Hongdong Li
- Abstract summary: Real-time Stereo Matching is a cornerstone algorithm for many Extended Reality (XR) applications, such as indoor 3D understanding, video pass-through, and mixed-reality games.
One of the major difficulties is the lack of high-quality indoor video stereo training datasets captured by head-mounted VR/AR glasses.
We introduce a novel video stereo synthetic dataset that comprises renderings of various indoor scenes and realistic camera motion captured by a 6-DoF moving VR/AR head-mounted display (HMD).
This facilitates the evaluation of existing approaches and promotes further research on indoor augmented reality scenarios.
- Score: 65.70936336240554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time Stereo Matching is a cornerstone algorithm for many Extended
Reality (XR) applications, such as indoor 3D understanding, video pass-through,
and mixed-reality games. Despite significant advancements in deep stereo
methods, achieving real-time depth inference with high accuracy on a low-power
device remains a major challenge. One of the major difficulties is the lack of
high-quality indoor video stereo training datasets captured by head-mounted
VR/AR glasses. To address this issue, we introduce a novel video stereo
synthetic dataset that comprises photorealistic renderings of various indoor
scenes and realistic camera motion captured by a 6-DoF moving VR/AR
head-mounted display (HMD). This facilitates the evaluation of existing
approaches and promotes further research on indoor augmented reality scenarios.
Our newly proposed dataset also enables us to develop a new video-based stereo
matching approach tailored for XR applications, which achieves real-time
inference at an impressive 134 fps on a standard desktop computer, or 30 fps on a
battery-powered HMD. Our key insight is that disparity and contextual
information are highly correlated and redundant between consecutive stereo
frames. By unrolling an iterative cost aggregation in time (i.e. in the
temporal dimension), we are able to distribute and reuse the aggregated
features over time. This approach leads to a substantial reduction in
computation without sacrificing accuracy. We conducted extensive evaluations
and comparisons and demonstrated that our method achieves superior performance
compared to the current state-of-the-art, making it a strong contender for
real-time stereo matching in VR/AR applications.
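The core idea, reusing aggregated information across consecutive frames so each new frame needs only a few refinement steps, can be illustrated with a minimal toy sketch. This is an illustration of temporal warm-starting in general, not the paper's actual network; all function and variable names are invented for this example.

```python
import numpy as np

def refine_disparity(cost_volume, disp_init, iters):
    """Toy iterative refinement: each step nudges the disparity map toward
    the per-pixel argmin of the cost volume (a crude stand-in for learned
    iterative cost aggregation)."""
    disp = disp_init.astype(np.float64)
    target = cost_volume.argmin(axis=-1).astype(np.float64)
    for _ in range(iters):
        disp += 0.5 * (target - disp)  # one update/aggregation step
    return disp

# H x W x D cost volume for a toy scene whose true disparity is 7 everywhere.
H, W, D = 4, 4, 16
cost = np.ones((H, W, D))
cost[..., 7] = 0.0

# Frame t: cold start from zeros needs many iterations to converge.
cold = refine_disparity(cost, np.zeros((H, W)), iters=10)

# Frame t+1: warm-starting from the previous frame's result converges with
# far fewer iterations -- the "distribute and reuse over time" idea.
warm = refine_disparity(cost, cold, iters=2)
```

Because consecutive frames are highly redundant, the warm-started estimate is already near the solution, so per-frame compute drops sharply without a loss in accuracy.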
Related papers
- Match Stereo Videos via Bidirectional Alignment [15.876953256378224]
Recent learning-based methods often focus on optimizing performance for independent stereo pairs, leading to temporal inconsistencies in videos.
We introduce a novel video processing framework, BiDAStereo, and a plugin stabilizer network, BiDAStabilizer, compatible with general image-based methods.
We present a realistic synthetic dataset and benchmark focused on natural scenes, along with a real-world dataset captured by a stereo camera in diverse urban scenes for qualitative evaluation.
arXiv Detail & Related papers (2024-09-30T13:37:29Z)
- Event-based Stereo Depth Estimation: A Survey [12.711235562366898]
Stereopsis has widespread appeal in robotics as it is the predominant way by which living beings perceive depth to navigate our 3D world.
Event cameras are novel bio-inspired sensors that detect per-pixel brightness changes asynchronously, with very high temporal resolution and high dynamic range.
The high temporal precision also benefits stereo matching, making disparity (depth) estimation a popular research area for event cameras ever since its inception.
arXiv Detail & Related papers (2024-09-26T09:43:50Z)
- 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z)
- Video Frame Interpolation with Stereo Event and Intensity Camera [40.07341828127157]
We propose a novel Stereo Event-based VFI network (SE-VFI-Net) to generate high-quality intermediate frames.
We exploit the fused features accomplishing accurate optical flow and disparity estimation.
Our proposed SE-VFI-Net outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-07-17T04:02:00Z)
- DynamicStereo: Consistent Dynamic Depth from Stereo Videos [91.1804971397608]
We propose DynamicStereo to estimate disparity for stereo videos.
The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions.
We also introduce Dynamic Replica, a new benchmark dataset containing synthetic videos of people and animals in scanned environments.
arXiv Detail & Related papers (2023-05-03T17:40:49Z)
- Deep Parametric 3D Filters for Joint Video Denoising and Illumination Enhancement in Video Super Resolution [96.89588203312451]
This paper presents a new parametric representation called Deep Parametric 3D Filters (DP3DF)
DP3DF incorporates local information to enable simultaneous denoising, illumination enhancement, and SR efficiently in a single encoder-and-decoder network.
Also, a dynamic residual frame is jointly learned with the DP3DF via a shared backbone to further boost the SR quality.
arXiv Detail & Related papers (2022-07-05T03:57:25Z)
- Fast Online Video Super-Resolution with Deformable Attention Pyramid [172.16491820970646]
Video super-resolution (VSR) has many applications that pose strict causal, real-time, and latency constraints, including video streaming and TV.
We propose a recurrent VSR architecture based on a deformable attention pyramid (DAP)
arXiv Detail & Related papers (2022-02-03T17:49:04Z)
- SMD-Nets: Stereo Mixture Density Networks [68.56947049719936]
We propose Stereo Mixture Density Networks (SMD-Nets), a simple yet effective learning framework compatible with a wide class of 2D and 3D architectures.
Specifically, we exploit bimodal mixture densities as output representation and show that this allows for sharp and precise disparity estimates near discontinuities.
We carry out comprehensive experiments on a new high-resolution and highly realistic synthetic stereo dataset, consisting of stereo pairs at 8Mpx resolution, as well as on real-world stereo datasets.
arXiv Detail & Related papers (2021-04-08T16:15:46Z)
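The SMD-Nets abstract's point, that a bimodal output representation yields sharp disparities near discontinuities, can be sketched as follows. This is a simplified illustration of picking the dominant mode of a per-pixel two-mode mixture, not SMD-Nets' actual API; the array names are invented for this example.

```python
import numpy as np

def sharp_disparity(pi, mu1, mu2):
    """Select the dominant mode of a per-pixel bimodal mixture.
    Near a depth edge the two modes straddle the discontinuity; taking
    a mode (rather than the mixture mean) avoids the smeared in-between
    values a unimodal regressor would produce."""
    return np.where(pi >= 0.5, mu1, mu2)

# Toy 1-D scan line crossing a depth edge:
# foreground disparity 20 px, background disparity 5 px.
mu1 = np.full(8, 20.0)  # foreground mode
mu2 = np.full(8, 5.0)   # background mode
pi = np.array([0.9, 0.9, 0.9, 0.6, 0.4, 0.1, 0.1, 0.1])  # mode weights

disp = sharp_disparity(pi, mu1, mu2)
```

The estimate jumps cleanly from 20 to 5 at the edge, whereas the mixture mean `pi * mu1 + (1 - pi) * mu2` would blur across it.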
This list is automatically generated from the titles and abstracts of the papers in this site.