UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative
Neural Feature Fields
- URL: http://arxiv.org/abs/2303.14167v2
- Date: Tue, 28 Mar 2023 02:00:39 GMT
- Title: UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative
Neural Feature Fields
- Authors: Yuanbo Yang, Yifei Yang, Hanlei Guo, Rong Xiong, Yue Wang, Yiyi Liao
- Abstract summary: We propose UrbanGIRAFFE, which uses a coarse 3D panoptic prior to guide a 3D-aware generative model.
Our model is compositional and controllable as it breaks down the scene into stuff, objects, and sky.
With proper loss functions, our approach facilitates photorealistic 3D-aware image synthesis with diverse controllability.
- Score: 22.180286908121946
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating photorealistic images with controllable camera pose and scene
contents is essential for many applications including AR/VR and simulation.
Despite the fact that rapid progress has been made in 3D-aware generative
models, most existing methods focus on object-centric images and are not
applicable to generating urban scenes for free camera viewpoint control and
scene editing. To address this challenging task, we propose UrbanGIRAFFE, which
uses a coarse 3D panoptic prior, including the layout distribution of
uncountable stuff and countable objects, to guide a 3D-aware generative model.
Our model is compositional and controllable as it breaks down the scene into
stuff, objects, and sky. Using the stuff prior in the form of semantic voxel grids,
we build a conditioned stuff generator that effectively incorporates the coarse
semantic and geometry information. The object layout prior further allows us to
learn an object generator from cluttered scenes. With proper loss functions,
our approach facilitates photorealistic 3D-aware image synthesis with diverse
controllability, including large camera movement, stuff editing, and object
manipulation. We validate the effectiveness of our model on both synthetic and
real-world datasets, including the challenging KITTI-360 dataset.
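To make the compositional design concrete, below is a minimal NumPy sketch, not the authors' code, of how a coarse semantic voxel grid (stuff prior) and a set of object bounding boxes (object layout prior) could condition separate stuff, object, and sky branches whose outputs are composed per 3D point. All names, shapes, and toy generators are illustrative assumptions, not UrbanGIRAFFE's actual implementation.
```python
# Illustrative sketch only: a coarse 3D panoptic prior conditioning a
# compositional renderer that splits the scene into stuff, objects, and sky.
import numpy as np

rng = np.random.default_rng(0)

# Stuff prior: a coarse semantic voxel grid (e.g. road / building / vegetation ids).
semantic_voxels = rng.integers(0, 4, size=(16, 16, 16))

# Object layout prior: per-object 3D bounding boxes (center xyz + size xyz).
object_boxes = np.array([[2.0, 0.0, 5.0, 1.5, 1.5, 3.0]])

def stuff_field(x):
    """Toy conditioned stuff generator: look up the semantic class of the
    voxel containing x and emit a (density, feature) pair."""
    idx = np.clip((x + 8).astype(int), 0, 15)   # map world coords to voxel ids
    cls = semantic_voxels[tuple(idx)]
    return (1.0 if cls > 0 else 0.0), np.full(8, float(cls))

def object_field(x, box):
    """Toy object generator: only emits density inside the object's box."""
    center, size = box[:3], box[3:]
    inside = np.all(np.abs(x - center) <= size / 2)
    return (5.0, np.ones(8)) if inside else (0.0, np.zeros(8))

def sky_field(d):
    """Toy sky branch: a feature that depends only on the view direction."""
    return np.concatenate([d, np.zeros(5)])

def composite(x, d):
    """Density-weighted composition of stuff and object fields at point x."""
    fields = [stuff_field(x)] + [object_field(x, b) for b in object_boxes]
    sigmas = np.array([s for s, _ in fields])
    feats = np.stack([f for _, f in fields])
    sigma = sigmas.sum()
    if sigma == 0.0:                            # nothing hit: fall back to sky
        return 0.0, sky_field(d)
    return sigma, (sigmas[:, None] * feats).sum(0) / sigma

# Query a few points along one camera ray (volume rendering would accumulate these).
origin, direction = np.zeros(3), np.array([0.3, 0.0, 1.0])
for t in np.linspace(0.5, 8.0, 4):
    sigma, feat = composite(origin + t * direction, direction)
    print(f"t={t:.1f}  sigma={sigma:.1f}  feature[:3]={feat[:3]}")
```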
Related papers
- Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting [47.014044892025346]
Architect is a generative framework that creates complex and realistic 3D embodied environments leveraging diffusion-based 2D image inpainting.
The pipeline is further extended into a hierarchical, iterative inpainting process that continuously places large furniture and small objects to enrich the scene.
arXiv Detail & Related papers (2024-11-14T22:15:48Z)
- 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting [100.94916668527544]
Existing methods focus solely on either 2D editing of individual objects or 3D editing of the global scene.
We propose 3DitScene, a novel and unified scene editing framework.
It enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects.
arXiv Detail & Related papers (2024-05-28T17:59:01Z)
- Zero-Shot Multi-Object Scene Completion [59.325611678171974]
We present a 3D scene completion method that recovers the complete geometry of multiple unseen objects in complex scenes from a single RGB-D image.
Our method outperforms the current state-of-the-art on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-03-21T17:59:59Z)
- Urban Scene Diffusion through Semantic Occupancy Map [49.20779809250597]
UrbanDiffusion is a 3D diffusion model conditioned on a Bird's-Eye View (BEV) map.
Our model learns the data distribution of scene-level structures within a latent space.
After training on real-world driving datasets, our model can generate a wide range of diverse urban scenes.
arXiv Detail & Related papers (2024-03-18T11:54:35Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model to extract objects of interest, and leverages a text-to-image diffusion model to lift each object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- CC3D: Layout-Conditioned Generation of Compositional 3D Scenes [49.281006972028194]
We introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts.
Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality.
arXiv Detail & Related papers (2023-03-21T17:59:02Z)
- DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning from only 2D images with global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including the challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z)
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [45.21191307444531]
Deep generative models allow for photorealistic image synthesis at high resolutions.
But for many applications, this is not enough: content creation also needs to be controllable.
Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis.
arXiv Detail & Related papers (2020-11-24T14:14:15Z)
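GIRAFFE's compositional hypothesis rests on a simple composition operator: per-entity densities are summed and per-entity features are averaged with density weights. The toy NumPy sketch below illustrates that operator; the entity count, feature size, and values are illustrative and not taken from the paper's implementation.
```python
# Toy illustration of density-weighted feature composition across N entities.
import numpy as np

def compose(sigmas: np.ndarray, feats: np.ndarray):
    """sigmas: (N,) per-entity densities at a point; feats: (N, C) features."""
    sigma = sigmas.sum()
    if sigma == 0.0:
        return 0.0, np.zeros(feats.shape[1])
    return sigma, (sigmas[:, None] * feats).sum(axis=0) / sigma

# Two entities (e.g. background and one object) evaluated at the same 3D point.
sigma, feat = compose(np.array([0.2, 1.8]), np.array([[1.0, 0.0], [0.0, 1.0]]))
print(sigma, feat)   # 2.0 [0.1 0.9]
```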
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.