Diffusion-based Generation, Optimization, and Planning in 3D Scenes
- URL: http://arxiv.org/abs/2301.06015v1
- Date: Sun, 15 Jan 2023 03:43:45 GMT
- Title: Diffusion-based Generation, Optimization, and Planning in 3D Scenes
- Authors: Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu,
Wei Liang, Song-Chun Zhu
- Abstract summary: We introduce SceneDiffuser, a conditional generative model for 3D scene understanding.
SceneDiffuser is intrinsically scene-aware, physics-based, and goal-oriented.
We show significant improvements compared with previous models.
- Score: 89.63179422011254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SceneDiffuser, a conditional generative model for 3D scene
understanding. SceneDiffuser provides a unified model for solving
scene-conditioned generation, optimization, and planning. In contrast to prior
works, SceneDiffuser is intrinsically scene-aware, physics-based, and
goal-oriented. With an iterative sampling strategy, SceneDiffuser jointly
formulates the scene-aware generation, physics-based optimization, and
goal-oriented planning via a diffusion-based denoising process in a fully
differentiable fashion. Such a design alleviates the discrepancies among
different modules and the posterior collapse of previous scene-conditioned
generative models. We evaluate SceneDiffuser with various 3D scene
understanding tasks, including human pose and motion generation, dexterous
grasp generation, path planning for 3D navigation, and motion planning for
robot arms. The results show significant improvements compared with previous
models, demonstrating the tremendous potential of SceneDiffuser for the broad
community of 3D scene understanding.
Related papers
- Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation [54.60804602905519]
We learn an entangled representation, aiming to model layered scene geometry, motion forecasting and novel view synthesis together.
Our approach chooses to disentangle scene geometry from scene motion, via lifting the 2D scene to 3D point clouds.
To model future 3D scene motion, we propose a disentangled two-stage approach that initially forecasts ego-motion and subsequently the residual motion of dynamic objects.
arXiv Detail & Related papers (2024-07-31T08:54:50Z) - Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation.
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z) - 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new contents with higher quality by exploiting the natural image prior to 2D diffusion model and the global 3D information of the current scene.
Our approach supports wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z) - SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z) - DORSal: Diffusion for Object-centric Representations of Scenes et al [28.181157214966493]
Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes.
We propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on frozen object-centric slot-based representations of scenes.
arXiv Detail & Related papers (2023-06-13T18:32:35Z) - GAUDI: A Neural Architect for Immersive 3D Scene Generation [67.97817314857917]
GAUDI is a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.
We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets.
arXiv Detail & Related papers (2022-07-27T19:10:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.