InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
- URL: http://arxiv.org/abs/2402.04717v1
- Date: Wed, 7 Feb 2024 10:09:00 GMT
- Title: InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
- Authors: Chenguo Lin, Yadong Mu
- Abstract summary: InstructScene is a novel generative framework that integrates a semantic graph prior and a layout decoder.
We show that the proposed method surpasses existing state-of-the-art approaches by a large margin.
- Score: 27.773451301040424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Comprehending natural language instructions is an appealing property for 3D
indoor scene synthesis systems. Existing methods directly model object joint
distributions and express object relations implicitly within a scene, thereby
hindering the controllability of generation. We introduce InstructScene, a
novel generative framework that integrates a semantic graph prior and a layout
decoder to improve controllability and fidelity for 3D scene synthesis. The
proposed semantic graph prior jointly learns scene appearances and layout
distributions, exhibiting versatility across various downstream tasks in a
zero-shot manner. To facilitate benchmarking of text-driven 3D scene
synthesis, we curate a high-quality dataset of scene-instruction pairs with
large language and multimodal models. Extensive experimental results reveal
that the proposed method surpasses existing state-of-the-art approaches by a
large margin. Thorough ablation studies confirm the efficacy of crucial design
components. Project page: https://chenguolin.github.io/projects/InstructScene.
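The two-stage design described in the abstract, an instruction-conditioned semantic graph prior followed by a layout decoder, can be pictured with a minimal Python sketch. The sketch below is purely illustrative: the class and function names, object categories, and random-sampling stubs are assumptions for exposition and do not reflect the paper's actual models, code, or API.
```python
# Hypothetical sketch of an instruction-to-scene pipeline in the spirit of a
# two-stage design (semantic graph prior -> layout decoder). All names here are
# illustrative placeholders; learned models are replaced by random stubs.
import random
from dataclasses import dataclass, field

CATEGORIES = ["bed", "nightstand", "wardrobe", "lamp", "desk"]
RELATIONS = ["left of", "right of", "in front of", "behind", "next to"]

@dataclass
class SceneGraph:
    nodes: list                                  # object categories, one per node
    edges: dict = field(default_factory=dict)    # (i, j) -> spatial relation

@dataclass
class Layout:
    boxes: list                                  # (x, y, z, w, h, d, yaw) per node

def semantic_graph_prior(instruction: str, num_objects: int = 4) -> SceneGraph:
    """Stage 1 (stub): sample a semantic graph conditioned on the instruction.
    A real prior would be a learned discrete graph generator; here we sample randomly."""
    random.seed(hash(instruction) % (2**32))     # deterministic per instruction
    nodes = [random.choice(CATEGORIES) for _ in range(num_objects)]
    edges = {(i, j): random.choice(RELATIONS)
             for i in range(num_objects) for j in range(i + 1, num_objects)}
    return SceneGraph(nodes=nodes, edges=edges)

def layout_decoder(graph: SceneGraph) -> Layout:
    """Stage 2 (stub): decode object poses from the semantic graph.
    A real decoder would predict boxes consistent with the graph's relations."""
    boxes = [(random.uniform(-2, 2), 0.0, random.uniform(-2, 2),   # position
              0.8, 0.9, 0.6, random.choice([0.0, 1.5708]))          # size, yaw
             for _ in graph.nodes]
    return Layout(boxes=boxes)

if __name__ == "__main__":
    graph = semantic_graph_prior("a cozy bedroom with a desk near the window")
    layout = layout_decoder(graph)
    for category, box in zip(graph.nodes, layout.boxes):
        print(f"{category:<12} {box}")
```
The point of the split is that object categories and relations are committed to first (the graph), so the decoder only has to place boxes consistent with an explicit, editable structure rather than an implicit joint distribution.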
Related papers
- InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior [23.536285325566013]
Comprehending natural language instructions is an appealing property for both 2D and 3D layout synthesis systems.
Existing methods implicitly model object joint distributions and express object relations, hindering the controllability of generation.
We introduce InstructLayout, a novel generative framework that integrates a semantic graph prior and a layout decoder.
arXiv Detail & Related papers (2024-07-10T12:13:39Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling [23.06464506261766]
We present DreamScape, a method for creating highly consistent 3D scenes solely from textual descriptions.
Our approach involves a 3D Gaussian Guide for scene representation, consisting of semantic primitives (objects) and their spatial transformations.
A progressive scale control is tailored during local object generation, ensuring that objects of different sizes and densities adapt to the scene.
arXiv Detail & Related papers (2024-04-14T12:13:07Z)
- Planner3D: LLM-enhanced graph prior meets 3D indoor scene explicit regularization [31.52569918586902]
3D scene synthesis has diverse applications across a spectrum of industries such as robotics, films, and video games.
In this paper, we aim at generating realistic and reasonable 3D indoor scenes from scene graphs.
Our method achieves better 3D scene synthesis, especially in terms of scene-level fidelity.
arXiv Detail & Related papers (2024-03-19T15:54:48Z)
- GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs [74.98581417902201]
We propose a novel framework to generate compositional 3D scenes from scene graphs.
By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model.
We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer.
arXiv Detail & Related papers (2023-11-30T18:59:58Z)
- Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding [58.924180772480504]
3D visual grounding involves finding a target object in a 3D scene that corresponds to a given sentence query.
We propose to leverage weakly supervised annotations to learn the 3D visual grounding model.
We design a novel semantic matching model that analyzes the semantic similarity between object proposals and sentences in a coarse-to-fine manner.
arXiv Detail & Related papers (2023-07-18T13:49:49Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- Incremental 3D Semantic Scene Graph Prediction from RGB Sequences [86.77318031029404]
We propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence.
Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network.
The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities.
arXiv Detail & Related papers (2023-05-04T11:32:16Z)
- Compositional 3D Scene Generation using Locally Conditioned Diffusion [49.5784841881488]
We introduce locally conditioned diffusion as an approach to compositional scene diffusion.
We demonstrate a score distillation sampling-based text-to-3D synthesis pipeline that enables compositional 3D scene generation at a higher fidelity than relevant baselines.
arXiv Detail & Related papers (2023-03-21T22:37:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.