DORSal: Diffusion for Object-centric Representations of Scenes et al.
- URL: http://arxiv.org/abs/2306.08068v3
- Date: Fri, 3 May 2024 00:37:12 GMT
- Title: DORSal: Diffusion for Object-centric Representations of Scenes et al.
- Authors: Allan Jabri, Sjoerd van Steenkiste, Emiel Hoogeboom, Mehdi S. M. Sajjadi, Thomas Kipf
- Abstract summary: Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes.
We propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on frozen object-centric slot-based representations of scenes.
- Score: 28.181157214966493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of input images, and controllable scene generation that supports editing are now possible. However, training jointly on a large number of scenes typically compromises rendering quality when compared to single-scene optimized models such as NeRFs. In this paper, we leverage recent progress in diffusion models to equip 3D scene representation learning models with the ability to render high-fidelity novel views, while retaining benefits such as object-level scene editing to a large degree. In particular, we propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on frozen object-centric slot-based representations of scenes. On both complex synthetic multi-object scenes and on the real-world large-scale Street View dataset, we show that DORSal enables scalable neural rendering of 3D scenes with object-level editing and improves upon existing approaches.
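The core mechanism stated in the abstract, a denoising diffusion model for novel views that is conditioned on frozen object-centric slots, can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch, not the authors' implementation: the module names, tensor shapes, the stand-in linear backbone, and the toy cosine noise schedule are all assumptions made here for clarity; in the paper the denoiser is a full video diffusion architecture and the slots come from a pre-trained object-centric scene encoder.

```python
# Minimal sketch (not the authors' code) of conditioning a diffusion denoiser
# on frozen object-centric slot representations. Shapes and modules are
# illustrative assumptions only.
import math
import torch
import torch.nn as nn

class SlotConditionedDenoiser(nn.Module):
    """Predicts the noise added to a target view, cross-attending to per-scene slots."""

    def __init__(self, img_dim=256, slot_dim=64, n_heads=4):
        super().__init__()
        self.to_tokens = nn.Linear(3, img_dim)          # pixels -> tokens (stand-in for a video-diffusion backbone)
        self.slot_proj = nn.Linear(slot_dim, img_dim)   # project slots into the token space
        self.cross_attn = nn.MultiheadAttention(img_dim, n_heads, batch_first=True)
        self.to_noise = nn.Linear(img_dim, 3)           # tokens -> predicted noise per pixel

    def forward(self, noisy_view, slots, t_emb):
        # noisy_view: (B, N, 3) flattened pixels of the noised target view
        # slots:      (B, K, slot_dim) frozen slot-based scene representation
        # t_emb:      (B, 1, img_dim) timestep (and, in practice, camera pose) embedding
        tokens = self.to_tokens(noisy_view) + t_emb
        cond = self.slot_proj(slots)
        attended, _ = self.cross_attn(tokens, cond, cond)  # conditioning via cross-attention on slots
        return self.to_noise(tokens + attended)

# One simplified DDPM-style training step; the slot encoder is kept frozen.
B, N, K, D, img_dim = 2, 64 * 64, 8, 64, 256
denoiser = SlotConditionedDenoiser(img_dim=img_dim, slot_dim=D)
views = torch.rand(B, N, 3)                      # ground-truth target views
with torch.no_grad():                            # "frozen" object-centric representation
    slots = torch.randn(B, K, D)                 # placeholder for frozen_encoder(input_views)
t = torch.randint(0, 1000, (B,))
alpha_bar = torch.cos(t.float() / 1000 * math.pi / 2).view(B, 1, 1) ** 2  # toy noise schedule
noise = torch.randn_like(views)
noisy = alpha_bar.sqrt() * views + (1 - alpha_bar).sqrt() * noise
t_emb = torch.randn(B, 1, img_dim)               # placeholder timestep/pose embedding
loss = ((denoiser(noisy, slots, t_emb) - noise) ** 2).mean()
loss.backward()
```

Because the slots are held fixed, object-level edits in this sketch would amount to modifying or removing individual slot vectors before sampling, which is in the spirit of the object-level scene editing the abstract describes.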
Related papers
- MegaScenes: Scene-Level View Synthesis at Scale [69.21293001231993]
Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications.
We create a large-scale scene-level dataset from Internet photo collections, called MegaScenes, which contains over 100K structure from motion (SfM) reconstructions from around the world.
We analyze failure cases of state-of-the-art NVS methods and significantly improve generation consistency.
arXiv Detail & Related papers (2024-06-17T17:55:55Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including the challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z)
- GAUDI: A Neural Architect for Immersive 3D Scene Generation [67.97817314857917]
GAUDI is a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.
We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets.
arXiv Detail & Related papers (2022-07-27T19:10:32Z)
- Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation [58.16911861917018]
We present a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis.
Our model couples learnt scene-specific feature volumes with a scene-agnostic neural rendering network.
We demonstrate various scene manipulations, including mixing scenes, deforming objects and inserting objects into scenes, while still producing photo-realistic results.
arXiv Detail & Related papers (2022-04-22T17:57:00Z)
- Unconstrained Scene Generation with Locally Conditioned Radiance Fields [24.036609880683585]
We introduce Generative Scene Networks (GSN), which learns to decompose scenes into a collection of local radiance fields.
Our model can be used as a prior to generate new scenes, or to complete a scene given only sparse 2D observations.
arXiv Detail & Related papers (2021-04-01T17:58:26Z)
- Equivariant Neural Rendering [22.95150913645939]
We propose a framework for learning neural scene representations directly from images, without 3D supervision.
Our key insight is that 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene.
Our formulation allows us to infer and render scenes in real time while achieving comparable results to models requiring minutes for inference.
arXiv Detail & Related papers (2020-06-13T12:25:07Z)