MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images
- URL: http://arxiv.org/abs/2008.06534v1
- Date: Fri, 14 Aug 2020 18:33:05 GMT
- Title: MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images
- Authors: Benjamin Attal, Selena Ling, Aaron Gokaslan, Christian Richardt, and James Tompkin
- Abstract summary: We introduce a method to convert stereo 360° (omnidirectional stereo) imagery into a layered, multi-sphere image representation for 6DoF rendering.
This significantly improves comfort for the viewer, and can be inferred and rendered in real time on modern GPU hardware.
- Score: 26.899767088485184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a method to convert stereo 360° (omnidirectional stereo)
imagery into a layered, multi-sphere image representation for six
degree-of-freedom (6DoF) rendering. Stereo 360° imagery can be captured
from multi-camera systems for virtual reality (VR), but lacks motion parallax
and correct-in-all-directions disparity cues. Together, these can quickly lead
to VR sickness when viewing content. One solution is to try and generate a
format suitable for 6DoF rendering, such as by estimating depth. However, this
raises questions as to how to handle disoccluded regions in dynamic scenes. Our
approach is to simultaneously learn depth and disocclusions via a multi-sphere
image representation, which can be rendered with correct 6DoF disparity and
motion parallax in VR. This significantly improves comfort for the viewer, and
can be inferred and rendered in real time on modern GPU hardware. Together,
these move towards making VR video a more comfortable immersive medium.
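The abstract describes rendering a multi-sphere image (MSI) with correct 6DoF disparity and motion parallax; the sketch below illustrates the core idea of compositing concentric RGBA sphere layers along each viewing ray from a translated head position. It is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`ray_sphere`, `composite_msi`), the per-layer RGBA lookup functions, and the example radii are hypothetical, and a real-time renderer would instead rasterize the spheres as textured meshes on the GPU.

```python
import numpy as np

def ray_sphere(origin, direction, radius):
    """Smallest positive t with |origin + t*direction| = radius.
    Spheres are centred at the capture point (the world origin), and the
    viewer is assumed to stay inside every sphere."""
    b = 2.0 * np.dot(origin, direction)
    c = np.dot(origin, origin) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    t = (-b + np.sqrt(disc)) / 2.0  # far root: the exit point seen from inside
    return t if t > 0.0 else None

def composite_msi(origin, direction, layers, radii):
    """'Over'-composite MSI layers front to back along one viewing ray.

    origin    : (3,) viewer position after head translation, in metres
    direction : (3,) unit viewing direction
    layers    : list of lookups layer(azimuth, elevation) -> (r, g, b, a),
                ordered from the innermost (nearest) to the outermost sphere
    radii     : matching list of sphere radii in metres
    """
    color = np.zeros(3)
    transmittance = 1.0
    for layer, radius in zip(layers, radii):
        t = ray_sphere(origin, direction, radius)
        if t is None:
            continue
        p = origin + t * direction
        azimuth = np.arctan2(p[1], p[0])
        elevation = np.arcsin(np.clip(p[2] / radius, -1.0, 1.0))
        r, g, b, a = layer(azimuth, elevation)
        color += transmittance * a * np.array([r, g, b])
        transmittance *= 1.0 - a
    return color

# Example: two hypothetical layers, a half-transparent near sphere and an opaque far one.
near = lambda az, el: (1.0, 0.0, 0.0, 0.5)
far = lambda az, el: (0.0, 0.0, 1.0, 1.0)
print(composite_msi(np.array([0.05, 0.0, 0.0]),      # 5 cm head translation
                    np.array([1.0, 0.0, 0.0]),
                    [near, far], [1.0, 10.0]))       # -> [0.5, 0.0, 0.5]
```

Because every ray from the translated viewpoint hits each sphere at slightly different angular coordinates, nearer layers shift more than farther ones, which is exactly the motion-parallax cue that plain stereo 360° imagery lacks.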
Related papers
- 6Img-to-3D: Few-Image Large-Scale Outdoor Driving Scene Reconstruction [44.99833362998488]
We introduce 6Img-to-3D, an efficient, scalable transformer-based encoder-renderer method for single-shot image-to-3D reconstruction.
Our method outputs a 3D-consistent parameterized triplane from only six outward-facing input images for large-scale, unbounded outdoor driving scenarios.
arXiv Detail & Related papers (2024-04-18T17:58:16Z)
- Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via Transformer-Based 360 Image Outpainting [33.95741744421632]
We propose a transformer-based 360 image outpainting framework called Dream360.
It can generate diverse, high-fidelity, and high-resolution panoramas from user-selected viewports.
Our Dream360 achieves significantly lower Frechet Inception Distance (FID) scores and better visual fidelity than existing methods.
arXiv Detail & Related papers (2024-01-19T09:01:20Z)
- Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality [65.70936336240554]
Real-time Stereo Matching is a cornerstone algorithm for many Extended Reality (XR) applications, such as indoor 3D understanding, video pass-through, and mixed-reality games.
One of the major difficulties is the lack of high-quality indoor video stereo training datasets captured by head-mounted VR/AR glasses.
We introduce a novel video stereo synthetic dataset that comprises renderings of various indoor scenes and realistic camera motion captured by a 6-DoF moving VR/AR head-mounted display (HMD).
This facilitates the evaluation of existing approaches and promotes further research on indoor augmented reality scenarios.
arXiv Detail & Related papers (2023-09-08T07:53:58Z)
- Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image [59.18564636990079]
We study the problem of synthesizing a long-term dynamic video from only a single image.
Existing methods either hallucinate inconsistent perpetual views or struggle with long camera trajectories.
We present Make-It-4D, a novel method that can generate a consistent long-term dynamic video from a single image.
arXiv Detail & Related papers (2023-08-20T12:53:50Z)
- Immersive Neural Graphics Primitives [13.48024951446282]
We present and evaluate a NeRF-based framework that is capable of rendering scenes in immersive VR.
Our approach can yield a frame rate of 30 frames per second with a resolution of 1280x720 pixels per eye.
arXiv Detail & Related papers (2022-11-24T09:33:38Z)
- 3D Moments from Near-Duplicate Photos [67.15199743223332]
3D Moments is a new computational photography effect.
We produce a video that smoothly interpolates the scene motion from the first photo to the second.
Our system produces photorealistic space-time videos with motion parallax and scene dynamics.
arXiv Detail & Related papers (2022-05-12T17:56:18Z)
- Deep 3D Mask Volume for View Synthesis of Dynamic Scenes [49.45028543279115]
We introduce a multi-view video dataset, captured with a custom 10-camera rig at 120 FPS.
The dataset contains 96 high-quality scenes showing various visual effects and human interactions in outdoor scenes.
We develop a new algorithm, Deep 3D Mask Volume, which enables temporally-stable view extrapolation from binocular videos of dynamic scenes, captured by static cameras.
arXiv Detail & Related papers (2021-08-30T17:55:28Z)
- Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality [68.18446501943585]
Social presence will fuel the next generation of communication systems driven by digital humans in virtual reality (VR).
The best 3D video-realistic VR avatars that minimize the uncanny effect rely on person-specific (PS) models.
This paper makes progress in overcoming these limitations by proposing an end-to-end multi-identity architecture.
arXiv Detail & Related papers (2021-04-10T15:48:53Z)
- Learning to compose 6-DoF omnidirectional videos using multi-sphere images [16.423725132964776]
We propose a system that uses a 3D ConvNet to generate a multi-sphere image representation that can be experienced in 6-DoF VR (a shape-level sketch of such a network appears after this list).
The system utilizes conventional omnidirectional VR camera footage directly, without the need for a depth map or segmentation mask.
A ground-truth generation approach for high-quality, artifact-free 6-DoF content is proposed and can be used by the research and development community.
arXiv Detail & Related papers (2021-03-10T03:09:55Z)
- Neural Radiance Flow for 4D View Synthesis and Video Processing [59.9116932930108]
We present a method to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene.
arXiv Detail & Related papers (2020-12-17T17:54:32Z)
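As a companion to the "Learning to compose 6-DoF omnidirectional videos using multi-sphere images" entry above, here is a purely shape-level sketch of the kind of 3D ConvNet that could map a sphere-sweep feature volume to per-layer MSI alphas. The class name `TinyMSINet`, the channel counts, and the random input are illustrative assumptions, not the architecture of that paper or of MatryODShka.

```python
import torch
import torch.nn as nn

class TinyMSINet(nn.Module):
    """Hypothetical sketch: sphere-sweep volume in, per-layer MSI alpha out.

    Input : (B, C, D, H, W) -- C image channels resampled onto D candidate
            sphere radii, over an H x W equirectangular grid
    Output: (B, 1, D, H, W) -- an alpha value for each of the D MSI layers
    """
    def __init__(self, in_channels=3, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(hidden, 1, kernel_size=3, padding=1),
        )

    def forward(self, sweep_volume):
        return torch.sigmoid(self.net(sweep_volume))

# Usage: 32 sphere layers over a 128 x 256 equirectangular grid.
sweep = torch.randn(1, 3, 32, 128, 256)
alphas = TinyMSINet()(sweep)   # shape: (1, 1, 32, 128, 256), values in (0, 1)
```

Predicting all layers in one forward pass is what lets MSI-based methods infer the representation once per frame and then render it cheaply for any nearby viewpoint.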
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.