Diffusion-based Generation, Optimization, and Planning in 3D Scenes
- URL: http://arxiv.org/abs/2301.06015v1
- Date: Sun, 15 Jan 2023 03:43:45 GMT
- Title: Diffusion-based Generation, Optimization, and Planning in 3D Scenes
- Authors: Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu,
Wei Liang, Song-Chun Zhu
- Abstract summary: We introduce SceneDiffuser, a conditional generative model for 3D scene understanding.
SceneDiffuser is intrinsically scene-aware, physics-based, and goal-oriented.
We show significant improvements compared with previous models.
- Score: 89.63179422011254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SceneDiffuser, a conditional generative model for 3D scene
understanding. SceneDiffuser provides a unified model for solving
scene-conditioned generation, optimization, and planning. In contrast to prior
works, SceneDiffuser is intrinsically scene-aware, physics-based, and
goal-oriented. With an iterative sampling strategy, SceneDiffuser jointly
formulates the scene-aware generation, physics-based optimization, and
goal-oriented planning via a diffusion-based denoising process in a fully
differentiable fashion. Such a design alleviates the discrepancies among
different modules and the posterior collapse of previous scene-conditioned
generative models. We evaluate SceneDiffuser with various 3D scene
understanding tasks, including human pose and motion generation, dexterous
grasp generation, path planning for 3D navigation, and motion planning for
robot arms. The results show significant improvements compared with previous
models, demonstrating the tremendous potential of SceneDiffuser for the broad
community of 3D scene understanding.
Related papers
- Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation.
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z) - DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling [23.06464506261766]
We present DreamScape, a method for creating highly consistent 3D scenes solely from textual descriptions.
Our approach involves a 3D Gaussian Guide for scene representation, consisting of semantic primitives (objects) and their spatial transformations.
A progressive scale control is tailored during local object generation, ensuring that objects of different sizes and densities adapt to the scene.
arXiv Detail & Related papers (2024-04-14T12:13:07Z) - 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new contents with higher quality by exploiting the natural image prior to 2D diffusion model and the global 3D information of the current scene.
Our approach supports wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z) - SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z) - DORSal: Diffusion for Object-centric Representations of Scenes et al [28.181157214966493]
Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes.
We propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on frozen object-centric slot-based representations of scenes.
arXiv Detail & Related papers (2023-06-13T18:32:35Z) - DiffDreamer: Towards Consistent Unsupervised Single-view Scene
Extrapolation with Conditional Diffusion Models [91.94566873400277]
DiffDreamer is an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory.
We show that image-conditioned diffusion models can effectively perform long-range scene extrapolation while preserving consistency significantly better than prior GAN-based methods.
arXiv Detail & Related papers (2022-11-22T10:06:29Z) - GAUDI: A Neural Architect for Immersive 3D Scene Generation [67.97817314857917]
GAUDI is a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.
We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets.
arXiv Detail & Related papers (2022-07-27T19:10:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.