MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere
Images
- URL: http://arxiv.org/abs/2008.06534v1
- Date: Fri, 14 Aug 2020 18:33:05 GMT
- Title: MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere
Images
- Authors: Benjamin Attal, Selena Ling, Aaron Gokaslan, Christian Richardt, and
James Tompkin
- Abstract summary: We introduce a method to convert stereo 360° (omnidirectional stereo) imagery into a layered, multi-sphere image representation for 6DoF rendering.
This significantly improves comfort for the viewer, and can be inferred and rendered in real time on modern GPU hardware.
- Score: 26.899767088485184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a method to convert stereo 360° (omnidirectional stereo)
imagery into a layered, multi-sphere image representation for six
degree-of-freedom (6DoF) rendering. Stereo 360° imagery can be captured
from multi-camera systems for virtual reality (VR), but lacks motion parallax
and correct-in-all-directions disparity cues. Together, these can quickly lead
to VR sickness when viewing content. One solution is to try and generate a
format suitable for 6DoF rendering, such as by estimating depth. However, this
raises questions as to how to handle disoccluded regions in dynamic scenes. Our
approach is to simultaneously learn depth and disocclusions via a multi-sphere
image representation, which can be rendered with correct 6DoF disparity and
motion parallax in VR. This significantly improves comfort for the viewer, and
can be inferred and rendered in real time on modern GPU hardware. Together,
these move towards making VR video a more comfortable immersive medium.
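For intuition, the sketch below shows one way a multi-sphere image (MSI) could be rendered for a novel viewpoint: each concentric sphere stores an RGBA texture, a viewing ray is intersected with every sphere, and the samples are alpha-composited from the outermost layer inward. This is a minimal illustrative sketch, not the paper's GPU implementation; the layer radii, equirectangular lookup, and per-ray loop are assumptions made here for clarity.

```python
# Hedged sketch of multi-sphere image (MSI) rendering for a translated eye
# position. Layer radii, texture layout, and sampling are illustrative
# assumptions, not the authors' exact implementation.
import numpy as np

def ray_sphere_t(origin, direction, radius):
    """Distance t where origin + t*direction exits a sphere of the given
    radius centred at the MSI origin (NaN if the ray misses it)."""
    b = 2.0 * np.dot(origin, direction)
    c = np.dot(origin, origin) - radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return np.nan
    t = (-b + np.sqrt(disc)) / 2.0  # far root; positive when the eye is inside
    return t if t > 0.0 else np.nan

def sample_equirect(texture, point):
    """Nearest-neighbour lookup of an equirectangular RGBA texture (H, W, 4)
    in the direction of a 3D point on the sphere."""
    h, w, _ = texture.shape
    x, y, z = point / np.linalg.norm(point)
    u = (np.arctan2(x, -z) / (2.0 * np.pi) + 0.5) * (w - 1)
    v = (np.arccos(np.clip(y, -1.0, 1.0)) / np.pi) * (h - 1)
    return texture[int(round(v)), int(round(u))]

def render_msi_ray(layers, radii, eye, direction):
    """Composite RGBA sphere layers back-to-front (far to near) along one ray."""
    direction = direction / np.linalg.norm(direction)
    color = np.zeros(3)
    for texture, radius in sorted(zip(layers, radii), key=lambda lr: -lr[1]):
        t = ray_sphere_t(eye, direction, radius)
        if np.isnan(t):
            continue
        rgba = sample_equirect(texture, eye + t * direction)
        color = rgba[:3] * rgba[3] + color * (1.0 - rgba[3])  # "over" compositing
    return color
```

In practice such a renderer is evaluated per pixel on the GPU, for example by drawing the sphere layers as textured meshes back to front with alpha blending, which is what makes real-time 6DoF playback feasible.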
Related papers
- CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models [9.622857933809067]
CAP4D is an approach that uses a morphable multi-view diffusion model to reconstruct photoreal 4D portrait avatars from any number of reference images.
Our approach demonstrates state-of-the-art performance for single-, few-, and multi-image 4D portrait avatar reconstruction.
arXiv Detail & Related papers (2024-12-16T18:58:51Z)
- From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos [71.22810401256234]
Three-dimensional (3D) understanding of objects and scenes plays a key role in humans' ability to interact with the world.
Large-scale synthetic and object-centric 3D datasets have been shown to be effective in training models with 3D understanding of objects.
We introduce 360-1M, a 360 video dataset, and a process for efficiently finding corresponding frames from diverse viewpoints at scale.
arXiv Detail & Related papers (2024-12-10T18:59:44Z)
- Splatter-360: Generalizable 360° Gaussian Splatting for Wide-baseline Panoramic Images [52.48351378615057]
Splatter-360 is a novel end-to-end generalizable 3DGS framework to handle wide-baseline panoramic images.
We introduce a 3D-aware bi-projection encoder to mitigate the distortions inherent in panoramic images.
This enables robust 3D-aware feature representations and real-time rendering capabilities.
arXiv Detail & Related papers (2024-12-09T06:58:31Z)
- Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via Transformer-Based 360 Image Outpainting [33.95741744421632]
We propose a transformer-based 360 image outpainting framework called Dream360.
It can generate diverse, high-fidelity, and high-resolution panoramas from user-selected viewports.
Our Dream360 achieves significantly lower Fréchet Inception Distance (FID) scores and better visual fidelity than existing methods.
arXiv Detail & Related papers (2024-01-19T09:01:20Z)
- Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality [65.70936336240554]
Real-time Stereo Matching is a cornerstone algorithm for many Extended Reality (XR) applications, such as indoor 3D understanding, video pass-through, and mixed-reality games.
One of the major difficulties is the lack of high-quality indoor video stereo training datasets captured by head-mounted VR/AR glasses.
We introduce a novel video stereo synthetic dataset that comprises renderings of various indoor scenes and realistic camera motion captured by a 6-DoF moving VR/AR head-mounted display (HMD).
This facilitates the evaluation of existing approaches and promotes further research on indoor augmented reality scenarios.
arXiv Detail & Related papers (2023-09-08T07:53:58Z)
- Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image [59.18564636990079]
We study the problem of synthesizing a long-term dynamic video from only a single image.
Existing methods either hallucinate inconsistent perpetual views or struggle with long camera trajectories.
We present Make-It-4D, a novel method that can generate a consistent long-term dynamic video from a single image.
arXiv Detail & Related papers (2023-08-20T12:53:50Z)
- Deep 3D Mask Volume for View Synthesis of Dynamic Scenes [49.45028543279115]
We introduce a multi-view video dataset, captured with a custom 10-camera rig at 120 FPS.
The dataset contains 96 high-quality scenes showing various visual effects and human interactions in outdoor scenes.
We develop a new algorithm, Deep 3D Mask Volume, which enables temporally-stable view extrapolation from binocular videos of dynamic scenes, captured by static cameras.
arXiv Detail & Related papers (2021-08-30T17:55:28Z)
- Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality [68.18446501943585]
Social presence will fuel the next generation of communication systems driven by digital humans in virtual reality (VR).
The best 3D video-realistic VR avatars that minimize the uncanny effect rely on person-specific (PS) models.
This paper makes progress in overcoming these limitations by proposing an end-to-end multi-identity architecture.
arXiv Detail & Related papers (2021-04-10T15:48:53Z)
- Learning to compose 6-DoF omnidirectional videos using multi-sphere images [16.423725132964776]
We propose a system that uses a 3D ConvNet to generate a multi-sphere image representation that can be experienced in 6-DoF VR (see the illustrative sketch after this list).
The system utilizes conventional omnidirectional VR camera footage directly without the need for a depth map or segmentation mask.
A ground truth generation approach for high-quality, artifact-free 6-DoF content is proposed and can be used by the research and development community.
arXiv Detail & Related papers (2021-03-10T03:09:55Z)
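Both the entry above and MatryODShka itself predict multi-sphere image content with a convolutional network. As a rough, purely illustrative sketch (the architecture, channel counts, and input packing below are assumptions for illustration, not either paper's actual model), such a predictor could take a sphere-sweep volume built from the two omnidirectional-stereo eye views and output one alpha map per sphere layer:

```python
# Hedged sketch: a small ConvNet that maps a sphere-sweep volume to per-layer
# MSI alphas. All shapes and layer choices here are illustrative assumptions.
import torch
import torch.nn as nn

class MSIAlphaNet(nn.Module):
    """Predict per-layer alpha maps for an MSI from a sphere-sweep volume.

    Input:  (B, 2 * num_layers * 3, H, W) -- e.g. both ODS eye views
            reprojected onto each of `num_layers` candidate spheres.
    Output: (B, num_layers, H, W) alphas in [0, 1], one map per sphere layer.
    """
    def __init__(self, num_layers: int = 32):
        super().__init__()
        in_ch = 2 * num_layers * 3
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),    nn.ReLU(inplace=True),
            nn.Conv2d(64, num_layers, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, sweep_volume: torch.Tensor) -> torch.Tensor:
        return self.net(sweep_volume)

# Usage with hypothetical shapes: a 32-layer MSI at 256x512 resolution.
model = MSIAlphaNet(num_layers=32)
alphas = model(torch.randn(1, 2 * 32 * 3, 256, 512))
print(alphas.shape)  # torch.Size([1, 32, 256, 512])
```

Per-layer color can then be taken from the reprojected input views (or predicted jointly), after which a compositing renderer like the earlier sketch applies unchanged.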
This list is automatically generated from the titles and abstracts of the papers in this site.