MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere
Images
- URL: http://arxiv.org/abs/2008.06534v1
- Date: Fri, 14 Aug 2020 18:33:05 GMT
- Title: MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere
Images
- Authors: Benjamin Attal, Selena Ling, Aaron Gokaslan, Christian Richardt, and
James Tompkin
- Abstract summary: We introduce a method to convert stereo 360° (omnidirectional stereo) imagery into a layered, multi-sphere image representation for 6DoF rendering.
This significantly improves comfort for the viewer, and can be inferred and rendered in real time on modern GPU hardware.
- Score: 26.899767088485184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a method to convert stereo 360° (omnidirectional stereo)
imagery into a layered, multi-sphere image representation for six
degree-of-freedom (6DoF) rendering. Stereo 360° imagery can be captured
from multi-camera systems for virtual reality (VR), but lacks motion parallax
and correct-in-all-directions disparity cues. Together, these can quickly lead
to VR sickness when viewing content. One solution is to try and generate a
format suitable for 6DoF rendering, such as by estimating depth. However, this
raises questions as to how to handle disoccluded regions in dynamic scenes. Our
approach is to simultaneously learn depth and disocclusions via a multi-sphere
image representation, which can be rendered with correct 6DoF disparity and
motion parallax in VR. This significantly improves comfort for the viewer, and
can be inferred and rendered in real time on modern GPU hardware. Together,
these move towards making VR video a more comfortable immersive medium.
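For intuition, the sketch below shows one way a multi-sphere image (MSI) could be rendered for a novel viewpoint: each concentric sphere stores an RGBA texture, a viewing ray is intersected with every sphere, and the samples are alpha-composited from the outermost layer inward. This is a minimal illustrative sketch, not the paper's GPU implementation; the layer radii, equirectangular lookup, and per-ray loop are assumptions made here for clarity.

```python
# Hedged sketch of multi-sphere image (MSI) rendering for a translated eye
# position. Layer radii, texture layout, and sampling are illustrative
# assumptions, not the authors' exact implementation.
import numpy as np

def ray_sphere_t(origin, direction, radius):
    """Distance t where origin + t*direction exits a sphere of the given
    radius centred at the MSI origin (NaN if the ray misses it)."""
    b = 2.0 * np.dot(origin, direction)
    c = np.dot(origin, origin) - radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return np.nan
    t = (-b + np.sqrt(disc)) / 2.0  # far root; positive when the eye is inside
    return t if t > 0.0 else np.nan

def sample_equirect(texture, point):
    """Nearest-neighbour lookup of an equirectangular RGBA texture (H, W, 4)
    in the direction of a 3D point on the sphere."""
    h, w, _ = texture.shape
    x, y, z = point / np.linalg.norm(point)
    u = (np.arctan2(x, -z) / (2.0 * np.pi) + 0.5) * (w - 1)
    v = (np.arccos(np.clip(y, -1.0, 1.0)) / np.pi) * (h - 1)
    return texture[int(round(v)), int(round(u))]

def render_msi_ray(layers, radii, eye, direction):
    """Composite RGBA sphere layers back-to-front (far to near) along one ray."""
    direction = direction / np.linalg.norm(direction)
    color = np.zeros(3)
    for texture, radius in sorted(zip(layers, radii), key=lambda lr: -lr[1]):
        t = ray_sphere_t(eye, direction, radius)
        if np.isnan(t):
            continue
        rgba = sample_equirect(texture, eye + t * direction)
        color = rgba[:3] * rgba[3] + color * (1.0 - rgba[3])  # "over" compositing
    return color
```

In practice such a renderer is evaluated per pixel on the GPU, for example by drawing the sphere layers as textured meshes back to front with alpha blending, which is what makes real-time 6DoF playback feasible.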
Related papers
- CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models [9.622857933809067]
CAP4D is an approach that uses a morphable multi-view diffusion model to reconstruct photoreal 4D portrait avatars from any number of reference images.
Our approach demonstrates state-of-the-art performance for single-, few-, and multi-image 4D portrait avatar reconstruction.
arXiv Detail & Related papers (2024-12-16T18:58:51Z)
- From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos [71.22810401256234]
Three-dimensional (3D) understanding of objects and scenes plays a key role in humans' ability to interact with the world.
Large-scale synthetic and object-centric 3D datasets have been shown to be effective in training models with 3D understanding of objects.
We introduce 360-1M, a 360 video dataset, and a process for efficiently finding corresponding frames from diverse viewpoints at scale.
arXiv Detail & Related papers (2024-12-10T18:59:44Z)
- Splatter-360: Generalizable 360° Gaussian Splatting for Wide-baseline Panoramic Images [52.48351378615057]
Splatter-360 is a novel end-to-end generalizable 3DGS framework to handle wide-baseline panoramic images.
We introduce a 3D-aware bi-projection encoder to mitigate the distortions inherent in panoramic images.
This enables robust 3D-aware feature representations and real-time rendering capabilities.
arXiv Detail & Related papers (2024-12-09T06:58:31Z)
- Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via Transformer-Based 360 Image Outpainting [33.95741744421632]
We propose a transformer-based 360 image outpainting framework called Dream360.
It can generate diverse, high-fidelity, and high-resolution panoramas from user-selected viewports.
Our Dream360 achieves significantly lower Fréchet Inception Distance (FID) scores and better visual fidelity than existing methods.
arXiv Detail & Related papers (2024-01-19T09:01:20Z)
- Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality [65.70936336240554]
Real-time Stereo Matching is a cornerstone algorithm for many Extended Reality (XR) applications, such as indoor 3D understanding, video pass-through, and mixed-reality games.
One of the major difficulties is the lack of high-quality indoor video stereo training datasets captured by head-mounted VR/AR glasses.
We introduce a novel video stereo synthetic dataset that comprises renderings of various indoor scenes and realistic camera motion captured by a 6-DoF moving VR/AR head-mounted display (HMD).
This facilitates the evaluation of existing approaches and promotes further research on indoor augmented reality scenarios.
arXiv Detail & Related papers (2023-09-08T07:53:58Z)
- Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image [59.18564636990079]
We study the problem of synthesizing a long-term dynamic video from only a single image.
Existing methods either hallucinate inconsistent perpetual views or struggle with long camera trajectories.
We present Make-It-4D, a novel method that can generate a consistent long-term dynamic video from a single image.
arXiv Detail & Related papers (2023-08-20T12:53:50Z)
- Deep 3D Mask Volume for View Synthesis of Dynamic Scenes [49.45028543279115]
We introduce a multi-view video dataset, captured with a custom 10-camera rig at 120 FPS.
The dataset contains 96 high-quality scenes showing various visual effects and human interactions in outdoor scenes.
We develop a new algorithm, Deep 3D Mask Volume, which enables temporally-stable view extrapolation from binocular videos of dynamic scenes, captured by static cameras.
arXiv Detail & Related papers (2021-08-30T17:55:28Z)
- Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality [68.18446501943585]
Social presence will fuel the next generation of communication systems driven by digital humans in virtual reality (VR).
The best 3D video-realistic VR avatars that minimize the uncanny effect rely on person-specific (PS) models.
This paper makes progress in overcoming these limitations by proposing an end-to-end multi-identity architecture.
arXiv Detail & Related papers (2021-04-10T15:48:53Z)
- Learning to compose 6-DoF omnidirectional videos using multi-sphere images [16.423725132964776]
We propose a system that uses a 3D ConvNet to generate a multi-sphere image representation that can be experienced in 6-DoF VR (see the illustrative sketch after this list).
The system utilizes conventional omnidirectional VR camera footage directly without the need for a depth map or segmentation mask.
A ground truth generation approach for high-quality, artifact-free 6-DoF content is proposed and can be used by the research and development community.
arXiv Detail & Related papers (2021-03-10T03:09:55Z)
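Both the entry above and MatryODShka itself predict multi-sphere image content with a convolutional network. As a rough, purely illustrative sketch (the architecture, channel counts, and input packing below are assumptions for illustration, not either paper's actual model), such a predictor could take a sphere-sweep volume built from the two omnidirectional-stereo eye views and output one alpha map per sphere layer:

```python
# Hedged sketch: a small ConvNet that maps a sphere-sweep volume to per-layer
# MSI alphas. All shapes and layer choices here are illustrative assumptions.
import torch
import torch.nn as nn

class MSIAlphaNet(nn.Module):
    """Predict per-layer alpha maps for an MSI from a sphere-sweep volume.

    Input:  (B, 2 * num_layers * 3, H, W) -- e.g. both ODS eye views
            reprojected onto each of `num_layers` candidate spheres.
    Output: (B, num_layers, H, W) alphas in [0, 1], one map per sphere layer.
    """
    def __init__(self, num_layers: int = 32):
        super().__init__()
        in_ch = 2 * num_layers * 3
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),    nn.ReLU(inplace=True),
            nn.Conv2d(64, num_layers, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, sweep_volume: torch.Tensor) -> torch.Tensor:
        return self.net(sweep_volume)

# Usage with hypothetical shapes: a 32-layer MSI at 256x512 resolution.
model = MSIAlphaNet(num_layers=32)
alphas = model(torch.randn(1, 2 * 32 * 3, 256, 512))
print(alphas.shape)  # torch.Size([1, 32, 256, 512])
```

Per-layer color can then be taken from the reprojected input views (or predicted jointly), after which a compositing renderer like the earlier sketch applies unchanged.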
This list is automatically generated from the titles and abstracts of the papers in this site.