DiffuScene: Denoising Diffusion Models for Generative Indoor Scene
Synthesis
- URL: http://arxiv.org/abs/2303.14207v2
- Date: Tue, 12 Mar 2024 15:40:08 GMT
- Title: DiffuScene: Denoising Diffusion Models for Generative Indoor Scene
Synthesis
- Authors: Jiapeng Tang, Yinyu Nie, Lev Markhasin, Angela Dai, Justus Thies,
Matthias Nießner
- Abstract summary: We present DiffuScene for indoor 3D scene synthesis based on a novel scene configuration denoising diffusion model.
It generates 3D instance properties stored in an unordered object set and retrieves the most similar geometry for each object configuration.
- Score: 44.521452102413534
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present DiffuScene for indoor 3D scene synthesis based on a novel scene
configuration denoising diffusion model. It generates 3D instance properties
stored in an unordered object set and retrieves the most similar geometry for
each object configuration, which is characterized as a concatenation of
different attributes, including location, size, orientation, semantics, and
geometry features. We introduce a diffusion network to synthesize a collection
of 3D indoor objects by denoising a set of unordered object attributes.
The unordered parametrization simplifies the approximation of the joint
distribution. The shape feature diffusion facilitates natural object
placements, including symmetries. Our method enables many downstream
applications, including scene completion, scene arrangement, and
text-conditioned scene synthesis. Experiments on the 3D-FRONT dataset show that
our method can synthesize more physically plausible and diverse indoor scenes
than state-of-the-art methods. Extensive ablation studies verify the
effectiveness of our design choices in scene diffusion models.
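
A minimal sketch of the scene parametrization and sampling procedure described in the abstract, not the authors' implementation: each object is a concatenation of location, size, orientation, semantics, and a shape feature; a set denoiser predicts noise over the unordered object set; and meshes are retrieved by nearest shape feature. The attribute widths, the transformer denoiser, the DDPM sampler, and the retrieval database below are illustrative assumptions.

```python
import torch
import torch.nn as nn

N_OBJ, N_CLASS, D_SHAPE = 12, 20, 32          # assumed set size / class vocab / shape-feature width
D_ATTR = 3 + 3 + 1 + N_CLASS + D_SHAPE        # location + size + orientation (yaw, assumed) + semantics + shape feature

class SetDenoiser(nn.Module):
    """Permutation-equivariant denoiser: a transformer encoder without
    positional encodings, so the object set stays unordered (an assumption
    about the network, not the paper's exact architecture)."""
    def __init__(self, d_attr: int, d_model: int = 128):
        super().__init__()
        self.inp = nn.Linear(d_attr + 1, d_model)   # +1 channel for the timestep
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.out = nn.Linear(d_model, d_attr)

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x_t: (B, N_OBJ, D_ATTR) noisy object set, t: (B,) diffusion step
        t_feat = t.float().view(-1, 1, 1).expand(-1, x_t.shape[1], 1) / 1000.0
        h = self.inp(torch.cat([x_t, t_feat], dim=-1))
        return self.out(self.encoder(h))             # predicted noise, same shape as x_t

@torch.no_grad()
def ddpm_step(model, x_t, t, betas):
    """One reverse (denoising) step of a standard DDPM sampler."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]
    eps = model(x_t, torch.full((x_t.shape[0],), t, dtype=torch.long))
    mean = (x_t - beta_t / torch.sqrt(1.0 - alpha_bar) * eps) / torch.sqrt(alpha_t)
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + torch.sqrt(beta_t) * noise

def retrieve_meshes(scene, shape_db):
    """For each generated object, pick the database entry whose shape feature
    is closest (cosine similarity) to the denoised shape feature."""
    feats = scene[:, -D_SHAPE:]                              # (N_OBJ, D_SHAPE)
    sims = (nn.functional.normalize(feats, dim=-1)
            @ nn.functional.normalize(shape_db, dim=-1).T)
    return sims.argmax(dim=-1)                               # retrieved mesh index per object

# Usage: sample a scene from pure noise, then retrieve geometry.
model = SetDenoiser(D_ATTR)
betas = torch.linspace(1e-4, 0.02, 1000)
x = torch.randn(1, N_OBJ, D_ATTR)
for t in reversed(range(1000)):
    x = ddpm_step(model, x, t, betas)
mesh_ids = retrieve_meshes(x[0], torch.randn(500, D_SHAPE))  # 500 = assumed database size
```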
Related papers
- Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models [57.37244894146089]
We propose Diff2Scene, which leverages frozen representations from text-image generative models, along with salient-aware and geometric-aware masks, for open-vocabulary 3D semantic segmentation and visual grounding tasks.
We show that it outperforms competitive baselines and achieves significant improvements over state-of-the-art methods.
arXiv Detail & Related papers (2024-07-18T16:20:56Z)
- Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation.
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
- DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling [23.06464506261766]
We present DreamScape, a method for creating highly consistent 3D scenes solely from textual descriptions.
Our approach involves a 3D Gaussian Guide for scene representation, consisting of semantic primitives (objects) and their spatial transformations.
A progressive scale control is tailored during local object generation, ensuring that objects of different sizes and densities adapt to the scene.
arXiv Detail & Related papers (2024-04-14T12:13:07Z)
- HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data [42.49031063635004]
We propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data.
Our model is a conditional diffusion model that takes both the 3D hand-object geometric structure and text description as inputs for image synthesis.
We adopt the generated 3D data for learning 6D object pose estimation and show its effectiveness in improving perception systems.
arXiv Detail & Related papers (2024-03-18T17:48:31Z)
- InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior [27.773451301040424]
InstructScene is a novel generative framework that integrates a semantic graph prior and a layout decoder.
We show that the proposed method surpasses existing state-of-the-art approaches by a large margin.
arXiv Detail & Related papers (2024-02-07T10:09:00Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- Generative Novel View Synthesis with 3D-Aware Diffusion Models [96.78397108732233]
We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image.
Our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume.
In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences.
arXiv Detail & Related papers (2023-04-05T17:15:47Z)
- Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strengths of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z)