SinGRAV: Learning a Generative Radiance Volume from a Single Natural
Scene
- URL: http://arxiv.org/abs/2210.01202v1
- Date: Mon, 3 Oct 2022 19:38:14 GMT
- Title: SinGRAV: Learning a Generative Radiance Volume from a Single Natural
Scene
- Authors: Yujie Wang, Xuelin Chen, Baoquan Chen
- Abstract summary: We present a 3D generative model for general natural scenes. Lacking the necessary volumes of 3D data characterizing the target scene, we propose to learn from a single scene.
We exploit a multi-scale convolutional network, which possesses a spatial locality bias by nature, to learn from the statistics of local regions at multiple scales within a single scene.
- Score: 42.24260323525382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a 3D generative model for general natural scenes. Lacking
the necessary volumes of 3D data characterizing the target scene, we propose to
learn from a single scene. Our key insight is that a natural scene often
contains multiple constituents whose geometry, texture, and spatial
arrangements follow some clear patterns, but still exhibit rich variations over
different regions within the same scene. This suggests localizing the learning
of a generative model on substantial local regions. Hence, we exploit a
multi-scale convolutional network, which possesses a spatial locality bias by
nature, to learn from the statistics of local regions at multiple scales within
a single scene. In contrast to existing methods, our learning setup bypasses
the need to collect data from many homogeneous 3D scenes for learning common
features. We coin our method SinGRAV, for learning a Generative RAdiance Volume
from a Single natural scene. We demonstrate the ability of SinGRAV in
generating plausible and diverse variations from a single scene, the merits of
SinGRAV over state-of-the-art generative neural scene methods, as well as the
versatility of SinGRAV by its use in a variety of applications, spanning 3D
scene editing, composition, and animation. Code and data will be released to
facilitate further research.
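For readers skimming this page, the following is a minimal PyTorch sketch of the kind of multi-scale, spatially local generator the abstract describes: a pyramid of small 3D convolutional generators that refines an upsampled radiance volume (per-voxel density plus colour) with fresh noise at every scale. All names, layer sizes and the residual-refinement scheme are illustrative assumptions, not the released SinGRAV code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VolumeGenerator(nn.Module):
    """One scale of a SinGAN-style pyramid. Small 3D convolutions give the
    spatial locality bias: each output voxel depends only on a local
    neighbourhood of the 4-channel radiance volume (density + RGB)."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(4, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(channels, 4, 3, padding=1))

    def forward(self, coarse, noise):
        # Residual refinement of the upsampled coarser volume, driven by fresh noise.
        return coarse + self.net(coarse + noise)

def generate(generators, base_res=8, scale=2.0):
    """Run the pyramid coarse-to-fine, starting from pure noise at the coarsest scale."""
    vol = torch.zeros(1, 4, base_res, base_res, base_res)
    for i, g in enumerate(generators):
        if i > 0:
            vol = F.interpolate(vol, scale_factor=scale, mode="trilinear", align_corners=False)
        vol = g(vol, torch.randn_like(vol))
    return vol  # (1, 4, D, H, W): per-voxel density + colour, ready for volume rendering

generators = nn.ModuleList([VolumeGenerator() for _ in range(4)])
volume = generate(generators)  # resolution grows 8 -> 16 -> 32 -> 64
```

Training is omitted here; a per-scale adversarial loss on local regions of the single input scene, in the spirit of 2D single-image GANs, is one plausible setup, but that detail is an assumption rather than something stated in the abstract.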
Related papers
- Learning 3D Scene Analogies with Neural Contextual Scene Maps [17.545689536966265]
We propose teaching machines to identify relational commonalities in 3D spaces.
Instead of focusing on point-wise or object-wise representations, we introduce 3D scene analogies.
arXiv Detail & Related papers (2025-03-20T06:49:33Z)
- Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement [32.335953514942474]
This paper proposes to jointly learn the scene representation along with a 3D dense feature field and a 2D feature extractor.
We learn the underlying geometry of the scene with an implicit field through volumetric rendering and design our feature field to leverage intermediate geometric information encoded in the implicit field.
Visual localization is then achieved by aligning the image-based features and the rendered volumetric features.
arXiv Detail & Related papers (2024-06-12T17:51:53Z)
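To make the feature-alignment idea in the entry above concrete, here is a heavily simplified, hypothetical PyTorch sketch: a randomly initialised MLP stands in for the learned 3D feature field, random tensors stand in for the 2D extractor's features and the per-pixel rays, and only a translation offset of the camera is refined (the actual method also handles rotation and learns all components jointly). Nothing below is taken from the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Stand-ins for learned components (assumed shapes, not from the paper).
feature_field = torch.nn.Sequential(                   # frozen 3D feature field: point -> feature
    torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16))
for p in feature_field.parameters():
    p.requires_grad_(False)

target_features = torch.randn(1000, 16)                # 2D-extractor features at sampled pixels
rays_o = torch.zeros(1000, 3)                          # per-pixel ray origins
rays_d = F.normalize(torch.randn(1000, 3), dim=-1)     # per-pixel ray directions
depths = torch.rand(1000, 1) * 4.0                     # coarse depth guesses along each ray

# Pose refinement: optimise a translation offset so that features sampled from the
# field at the back-projected points match the image features.
t = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([t], lr=1e-2)
for step in range(200):
    pts = rays_o + depths * rays_d + t                 # points under the current pose guess
    rendered = feature_field(pts)                      # "rendered" volumetric features
    loss = F.mse_loss(rendered, target_features)
    opt.zero_grad()
    loss.backward()
    opt.step()
```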
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
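The CommonScenes entry above describes two branches: a variational auto-encoder that predicts the scene layout from the scene graph, and a diffusion-based branch that generates compatible shapes. The toy PyTorch sketch below illustrates only the layout branch, mapping per-object node features to 7-DoF boxes; graph message passing over edges and the shape-diffusion branch are omitted, and every class name and dimension is an assumption for illustration.

```python
import torch
import torch.nn as nn

class LayoutVAE(nn.Module):
    """Toy layout branch: encode per-object node features from a scene graph,
    sample a latent, and decode one 7-DoF box (size, centre, yaw) per object."""
    def __init__(self, node_dim=32, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(node_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, z_dim)
        self.logvar = nn.Linear(64, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + node_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 7))   # (sx, sy, sz, cx, cy, cz, yaw)

    def forward(self, nodes):                        # nodes: (num_objects, node_dim)
        h = self.enc(nodes)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterisation
        boxes = self.dec(torch.cat([z, nodes], dim=-1))
        return boxes, mu, logvar

nodes = torch.randn(5, 32)                           # embedded node features for 5 objects
boxes, mu, logvar = LayoutVAE()(nodes)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())     # standard VAE KL term
```

Editing the input scene graph (i.e. the node features) and re-sampling z would change the generated layout, which is the kind of manipulation the entry refers to.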
- SGAligner : 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z)
- WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language [31.691159120136064]
We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online captured multi-modal visual data.
We present a novel method, dubbed WildRefer, for this task by fully utilizing the rich appearance information in images and the positional and geometric cues in point clouds.
Our datasets are significant for research on 3D visual grounding in the wild and have great potential to advance the development of autonomous driving and service robots.
arXiv Detail & Related papers (2023-04-12T06:48:26Z)
- SinGRAF: Learning a 3D Generative Radiance Field for a Single Scene [40.705096946588]
We introduce SinGRAF, a 3D-aware generative model that is trained with a few input images of a single scene.
It generates different realizations of this 3D scene that preserve the appearance of the input while varying scene layout.
With several experiments, we demonstrate that the results produced by SinGRAF outperform the closest related works in both quality and diversity by a large margin.
arXiv Detail & Related papers (2022-11-30T18:55:27Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- Zero-Shot Text-Guided Object Generation with Dream Fields [111.06026544180398]
We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects.
Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision.
In experiments, Dream Fields produce realistic, multi-view consistent object geometry and color from a variety of natural language captions.
arXiv Detail & Related papers (2021-12-02T17:53:55Z)
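The Dream Fields entry above combines differentiable rendering with a joint image/text encoder. The sketch below is a stripped-down stand-in under stated assumptions: it optimises a small RGB-plus-density voxel grid (rather than a NeRF MLP) for a single fixed orthographic view so that the rendered image scores highly against a caption under the openai `clip` package; the paper instead samples many random camera poses and adds further regularisers.

```python
import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)

with torch.no_grad():
    tokens = clip.tokenize(["a red rose"]).to(device)
    text_feat = F.normalize(model.encode_text(tokens).float(), dim=-1)

# Toy scene: a learnable (RGB + density) voxel grid, composited along the depth axis.
grid = torch.zeros(4, 64, 64, 64, requires_grad=True)    # (channels, D, H, W)
opt = torch.optim.Adam([grid], lr=5e-2)
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)  # CLIP image stats
std = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)

for step in range(100):
    rgb = torch.sigmoid(grid[:3])                         # (3, D, H, W)
    alpha = 1 - torch.exp(-F.softplus(grid[3:]))          # per-voxel opacity, (1, D, H, W)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1 - alpha], dim=1), dim=1)[:, :-1]
    image = (trans * alpha * rgb).sum(dim=1)              # alpha-composite -> (3, H, W)
    image = F.interpolate(image[None], size=224, mode="bilinear", align_corners=False)
    img_feat = F.normalize(model.encode_image((image - mean) / std).float(), dim=-1)
    loss = -(img_feat * text_feat).sum()                  # maximise CLIP similarity to the caption
    opt.zero_grad()
    loss.backward()
    opt.step()
```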
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning multi-object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
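Finally, the PriSMONet entry above mentions a recurrent encoder that regresses per-object shape, pose and texture latents from a single RGB image. The PyTorch sketch below is a schematic guess at such an encoder (a CNN image summary, one GRU step per object slot, and three small regression heads); the decoder with deep shape priors, the differentiable renderer and all training losses are omitted, and every name and dimension is assumed.

```python
import torch
import torch.nn as nn

class RecurrentObjectEncoder(nn.Module):
    """Regress (shape latent, pose, texture latent) for a fixed number of object
    slots by stepping a GRU once per slot over a global image feature."""
    def __init__(self, num_slots=4, hid=128, z_shape=32, z_tex=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())         # image -> (B, 64)
        self.gru = nn.GRUCell(64, hid)
        self.shape_head = nn.Linear(hid, z_shape)
        self.pose_head = nn.Linear(hid, 7)                  # translation (3) + quaternion (4)
        self.tex_head = nn.Linear(hid, z_tex)
        self.num_slots, self.hid = num_slots, hid

    def forward(self, image):                               # image: (B, 3, H, W)
        feat = self.backbone(image)
        h = feat.new_zeros(image.size(0), self.hid)
        objects = []
        for _ in range(self.num_slots):                     # one recurrent step per object
            h = self.gru(feat, h)
            objects.append((self.shape_head(h), self.pose_head(h), self.tex_head(h)))
        return objects

per_object = RecurrentObjectEncoder()(torch.randn(2, 3, 64, 64))  # 4 (shape, pose, texture) triples
```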