Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via
Transformer-Based 360 Image Outpainting
- URL: http://arxiv.org/abs/2401.10564v1
- Date: Fri, 19 Jan 2024 09:01:20 GMT
- Title: Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via
Transformer-Based 360 Image Outpainting
- Authors: Hao Ai, Zidong Cao, Haonan Lu, Chen Chen, Jian Ma, Pengyuan Zhou,
Tae-Kyun Kim, Pan Hui, and Lin Wang
- Abstract summary: We propose a transformer-based 360 image outpainting framework called Dream360.
It can generate diverse, high-fidelity, and high-resolution panoramas from user-selected viewports.
Our Dream360 achieves significantly lower Fréchet Inception Distance (FID) scores and better visual fidelity than existing methods.
- Score: 33.95741744421632
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: 360 images, with a field-of-view (FoV) of 180°x360°, provide immersive and
realistic environments for emerging virtual reality (VR) applications, such as
virtual tourism, where users desire to create diverse panoramic scenes from a
narrow FoV photo they take from a viewpoint via portable devices. It thus
brings us to a technical challenge: "How to allow the users to freely create
diverse and immersive virtual scenes from a narrow FoV image with a specified
viewport?" To this end, we propose a transformer-based 360 image outpainting
framework called Dream360, which can generate diverse, high-fidelity, and
high-resolution panoramas from user-selected viewports, considering the
spherical properties of 360 images. Compared with existing methods, e.g., [3],
which primarily focus on inputs with rectangular masks and central locations
while overlooking the spherical property of 360 images, our Dream360 offers
higher outpainting flexibility and fidelity based on the spherical
representation. Dream360 comprises two key learning stages: (I) codebook-based
panorama outpainting via Spherical-VQGAN (S-VQGAN), and (II) frequency-aware
refinement with a novel frequency-aware consistency loss. Specifically, S-VQGAN
learns a sphere-specific codebook from spherical harmonic (SH) values,
providing a better representation of spherical data distribution for scene
modeling. The frequency-aware refinement matches the resolution and further
improves the semantic consistency and visual fidelity of the generated results.
Our Dream360 achieves significantly lower Fréchet Inception Distance (FID)
scores and better visual fidelity than existing methods. We also conducted a
user study involving 15 participants to interactively evaluate the quality of
the generated results in VR, demonstrating the flexibility and superiority of
our Dream360 framework.
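
The abstract states that S-VQGAN learns a sphere-specific codebook from spherical harmonic (SH) values. As a rough illustration of what such a spherical signal looks like, here is a minimal Python sketch (not the authors' code) that evaluates a real SH basis on an equirectangular pixel grid; the SH degree, the function name equirect_sh_features, and the idea of concatenating the result with the RGB input are assumptions made only for illustration.

```python
# Minimal sketch (not the authors' code): evaluating real spherical-harmonic (SH)
# basis values on an equirectangular grid, i.e. the kind of sphere-aware signal
# the abstract says S-VQGAN learns its codebook from. The SH degree and the way
# the values would be fed to the encoder are assumptions.
import numpy as np
from scipy.special import sph_harm

def equirect_sh_features(height, width, l_max=2):
    """Return an (height, width, (l_max+1)**2) array of real SH values per pixel."""
    # Equirectangular pixel centers: theta is the polar angle in [0, pi],
    # phi is the azimuth in [0, 2*pi).
    theta = (np.arange(height) + 0.5) / height * np.pi
    phi = (np.arange(width) + 0.5) / width * 2.0 * np.pi
    phi_grid, theta_grid = np.meshgrid(phi, theta)  # shapes (height, width)

    feats = []
    for l in range(l_max + 1):
        for m in range(-l, l + 1):
            # scipy's sph_harm(m, l, azimuth, polar) returns complex values.
            y = sph_harm(abs(m), l, phi_grid, theta_grid)
            # Combine into real-valued SH features (sign conventions vary;
            # exact normalization is not critical for an illustrative map).
            if m < 0:
                feats.append(np.sqrt(2.0) * y.imag)
            elif m == 0:
                feats.append(y.real)
            else:
                feats.append(np.sqrt(2.0) * y.real)
    return np.stack(feats, axis=-1)

# Example: a 9-band SH map for a 256x512 panorama, e.g. to concatenate with
# RGB before the encoder (one plausible, unconfirmed way to use it).
sh_map = equirect_sh_features(256, 512, l_max=2)
print(sh_map.shape)  # (256, 512, 9)
```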
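
The second stage relies on a frequency-aware consistency loss whose exact form is not given in this summary. The sketch below shows one plausible instantiation under that assumption: an L1 term between log-amplitude FFT spectra of the refined and reference panoramas, added to a pixel loss. The weights and the rfft2-based formulation are illustrative choices, not the paper's definition.

```python
# Minimal sketch (an assumption, not the paper's exact loss): a
# "frequency-aware consistency" term that compares 2D FFT amplitude spectra
# of the refined panorama and a reference, on top of a standard pixel loss.
import torch
import torch.nn.functional as F

def frequency_consistency_loss(pred, target, pixel_weight=1.0, freq_weight=0.1):
    """pred, target: (B, C, H, W) tensors in [0, 1]."""
    pixel_term = F.l1_loss(pred, target)

    # Amplitude spectra; log1p compresses the dominant low-frequency energy
    # so mid/high frequencies still contribute to the loss.
    pred_amp = torch.log1p(torch.abs(torch.fft.rfft2(pred, norm="ortho")))
    target_amp = torch.log1p(torch.abs(torch.fft.rfft2(target, norm="ortho")))
    freq_term = F.l1_loss(pred_amp, target_amp)

    return pixel_weight * pixel_term + freq_weight * freq_term

# Usage sketch: refined = refiner(coarse_panorama); loss = frequency_consistency_loss(refined, gt)
```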
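
Results are reported with Fréchet Inception Distance (FID). For readers who want to run the same kind of comparison, here is a minimal sketch using the torchmetrics FrechetInceptionDistance metric; the batch shapes and the normalize=True convention are assumptions about how generated and real panoramas would be fed in, not the paper's evaluation script.

```python
# Minimal FID sketch with torchmetrics (not the paper's evaluation code).
# Assumes generated and real panoramas are float tensors in [0, 1],
# shaped (N, 3, H, W); torchmetrics resizes them internally for Inception.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048, normalize=True)

real_panos = torch.rand(8, 3, 256, 512)       # placeholder real panoramas
generated_panos = torch.rand(8, 3, 256, 512)  # placeholder generated panoramas

fid.update(real_panos, real=True)
fid.update(generated_panos, real=False)
print(float(fid.compute()))  # lower is better
```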
Related papers
- GPAvatar: Generalizable and Precise Head Avatar from Image(s) [71.555405205039]
GPAvatar is a framework that reconstructs 3D head avatars from one or several images in a single forward pass.
The proposed method achieves faithful identity reconstruction, precise expression control, and multi-view consistency.
arXiv Detail & Related papers (2024-01-18T18:56:34Z)
- See360: Novel Panoramic View Interpolation [24.965259708297932]
See360 is a versatile and efficient framework for 360° panoramic view interpolation using latent space viewpoint estimation.
We show that the proposed method is generic enough to achieve real-time rendering of arbitrary views for four datasets.
arXiv Detail & Related papers (2024-01-07T09:17:32Z)
- VR-NeRF: High-Fidelity Virtualized Walkable Spaces [55.51127858816994]
We present an end-to-end system for the high-fidelity capture, model reconstruction, and real-time rendering of walkable spaces in virtual reality using neural radiance fields.
arXiv Detail & Related papers (2023-11-05T02:03:14Z)
- Autoregressive Omni-Aware Outpainting for Open-Vocabulary 360-Degree Image Generation [36.45222068699805]
AOG-Net is proposed for 360-degree image generation by outpainting an incomplete image progressively with NFoV and text guidance, jointly or individually.
A global-local conditioning mechanism is devised to formulate the outpainting guidance in each autoregressive step.
Comprehensive experiments on two commonly used 360-degree image datasets for both indoor and outdoor settings demonstrate the state-of-the-art performance of our proposed method.
arXiv Detail & Related papers (2023-09-07T03:22:59Z)
- NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes [59.15910989235392]
We introduce NeO 360, Neural fields for sparse view synthesis of outdoor scenes.
NeO 360 is a generalizable method that reconstructs 360° scenes from a single or a few posed RGB images.
Our representation combines the best of both voxel-based and bird's-eye-view (BEV) representations.
arXiv Detail & Related papers (2023-08-24T17:59:50Z)
- NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360° Views [77.93662205673297]
In this work, we study the challenging task of lifting a single image to a 3D object.
We demonstrate the ability to generate a plausible 3D object with 360° views that correspond well with a given reference image.
We propose a novel framework, dubbed NeuralLift-360, that utilizes a depth-aware radiance representation.
arXiv Detail & Related papers (2022-11-29T17:59:06Z)
- Immersive Neural Graphics Primitives [13.48024951446282]
We present and evaluate a NeRF-based framework that is capable of rendering scenes in immersive VR.
Our approach can yield a frame rate of 30 frames per second with a resolution of 1280x720 pixels per eye.
arXiv Detail & Related papers (2022-11-24T09:33:38Z)
- MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images [26.899767088485184]
We introduce a method to convert stereo 360° (omnidirectional stereo) imagery into a layered, multi-sphere image representation for 6DoF rendering.
This significantly improves comfort for the viewer, and can be inferred and rendered in real time on modern GPU hardware.
arXiv Detail & Related papers (2020-08-14T18:33:05Z)
- Perceptual Quality Assessment of Omnidirectional Images as Moving Camera Videos [49.217528156417906]
Two types of VR viewing conditions are crucial in determining the viewing behaviors of users and the perceived quality of the panorama.
We first transform an omnidirectional image to several video representations using different user viewing behaviors under different viewing conditions.
We then leverage advanced 2D full-reference video quality models to compute the perceived quality.
arXiv Detail & Related papers (2020-05-21T10:03:40Z)