Painting 3D Nature in 2D: View Synthesis of Natural Scenes from a Single
Semantic Mask
- URL: http://arxiv.org/abs/2302.07224v1
- Date: Tue, 14 Feb 2023 17:57:58 GMT
- Title: Painting 3D Nature in 2D: View Synthesis of Natural Scenes from a Single
Semantic Mask
- Authors: Shangzhan Zhang, Sida Peng, Tianrun Chen, Linzhan Mou, Haotong Lin,
Kaicheng Yu, Yiyi Liao, Xiaowei Zhou
- Abstract summary: We introduce a novel approach that takes a single semantic mask as input to synthesize multi-view consistent color images of natural scenes.
Our method outperforms baseline methods and produces photorealistic, multi-view consistent videos of a variety of natural scenes.
- Score: 29.38152100352871
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel approach that takes a single semantic mask as input to
synthesize multi-view consistent color images of natural scenes, trained with a
collection of single images from the Internet. Prior works on 3D-aware image
synthesis either require multi-view supervision or learn category-level
priors for specific classes of objects, which can hardly work for natural
scenes. Our key idea to solve this challenging problem is to use a semantic
field as the intermediate representation, which is easier to reconstruct from
an input semantic mask and then translate to a radiance field with the
assistance of off-the-shelf semantic image synthesis models. Experiments show
that our method outperforms baseline methods and produces photorealistic,
multi-view consistent videos of a variety of natural scenes.
Related papers
- Cafca: High-quality Novel View Synthesis of Expressive Faces from Casual Few-shot Captures [33.463245327698]
We present a novel volumetric prior on human faces that allows for high-fidelity expressive face modeling.
We leverage a 3D Morphable Face Model to synthesize a large training set, rendering each identity with different expressions.
We then train a conditional Neural Radiance Field prior on this synthetic dataset and, at inference time, fine-tune the model on a very sparse set of real images of a single subject.
arXiv Detail & Related papers (2024-10-01T12:24:50Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- Zero-Shot Text-Guided Object Generation with Dream Fields [111.06026544180398]
We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects.
Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision.
In experiments, Dream Fields produce realistic, multi-view consistent object geometry and color from a variety of natural language captions.
arXiv Detail & Related papers (2021-12-02T17:53:55Z)
- Realistic Image Synthesis with Configurable 3D Scene Layouts [59.872657806747576]
We propose a novel approach to realistic-looking image synthesis based on a 3D scene layout.
Our approach takes a 3D scene with semantic class labels as input and trains a 3D scene painting network.
With the trained painting network, realistic-looking images for the input 3D scene can be rendered and manipulated.
arXiv Detail & Related papers (2021-08-23T09:44:56Z)
- Diversifying Semantic Image Synthesis and Editing via Class- and Layer-wise VAEs [8.528384027684192]
We propose a class- and layer-wise extension to the variational autoencoder framework that allows flexible control over each object class at the local to global levels.
We demonstrate that our method generates images that are both plausible and more diverse compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-06-25T04:12:05Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
- Semantic View Synthesis [56.47999473206778]
We tackle a new problem of semantic view synthesis -- generating free-viewpoint rendering of a synthesized scene using a semantic label map as input.
First, we focus on synthesizing the color and depth of the visible surface of the 3D scene.
We then use the synthesized color and depth to impose explicit constraints on the multiple-plane image (MPI) representation prediction process.
arXiv Detail & Related papers (2020-08-24T17:59:46Z)
- 3D Photography using Context-aware Layered Depth Inpainting [50.66235795163143]
We propose a method for converting a single RGB-D input image into a 3D photo.
A learning-based inpainting model synthesizes new local color-and-depth content into the occluded region.
The resulting 3D photos can be efficiently rendered with motion parallax.
arXiv Detail & Related papers (2020-04-09T17:59:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.