InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
- URL: http://arxiv.org/abs/2402.04717v1
- Date: Wed, 7 Feb 2024 10:09:00 GMT
- Title: InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
- Authors: Chenguo Lin, Yadong Mu
- Abstract summary: InstructScene is a novel generative framework that integrates a semantic graph prior and a layout decoder.
We show that the proposed method surpasses existing state-of-the-art approaches by a large margin.
- Score: 27.773451301040424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Comprehending natural language instructions is an appealing property for 3D
indoor scene synthesis systems. Existing methods directly model object joint
distributions and express object relations implicitly within a scene, thereby
hindering the controllability of generation. We introduce InstructScene, a
novel generative framework that integrates a semantic graph prior and a layout
decoder to improve controllability and fidelity for 3D scene synthesis. The
proposed semantic graph prior jointly learns scene appearances and layout
distributions, exhibiting versatility across various downstream tasks in a
zero-shot manner. To facilitate benchmarking of text-driven 3D scene
synthesis, we curate a high-quality dataset of scene-instruction pairs with
large language and multimodal models. Extensive experimental results reveal
that the proposed method surpasses existing state-of-the-art approaches by a
large margin. Thorough ablation studies confirm the efficacy of crucial design
components. Project page: https://chenguolin.github.io/projects/InstructScene.
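
At a high level, the abstract describes a two-stage pipeline: a semantic graph prior first turns the instruction into a scene graph, and a layout decoder then realizes that graph as object placements. Below is a minimal sketch of that flow under stated assumptions; every class and function name is an illustrative stand-in, not the authors' API.

from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    # Semantic graph: object categories as nodes, spatial relations as edges.
    nodes: list[str]                   # e.g. ["bed", "nightstand"]
    edges: list[tuple[int, str, int]]  # e.g. (1, "left of", 0)

@dataclass
class Layout:
    # One (category, position, size, orientation) box per graph node.
    boxes: list[dict] = field(default_factory=list)

def graph_prior(instruction: str) -> SceneGraph:
    # Stage 1 (hypothetical stand-in): the paper learns a generative prior
    # over semantic graphs conditioned on the instruction; here one
    # plausible output is simply hard-coded.
    return SceneGraph(nodes=["bed", "nightstand"], edges=[(1, "left of", 0)])

def layout_decoder(graph: SceneGraph) -> Layout:
    # Stage 2 (hypothetical stand-in): decode the graph into concrete
    # object placements that respect its relations.
    return Layout(boxes=[
        {"category": n, "position": (1.5 * i, 0.0, 0.0),
         "size": (1.0, 0.5, 1.0), "angle": 0.0}
        for i, n in enumerate(graph.nodes)])

def synthesize(instruction: str) -> Layout:
    # Instruction -> semantic graph -> 3D layout.
    return layout_decoder(graph_prior(instruction))

print(synthesize("Place a nightstand to the left of the bed."))

Factoring generation through an explicit graph is what enables the controllability the abstract claims: the intermediate graph can be inspected or edited before the layout is decoded.
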
Related papers
- Functional 3D Scene Synthesis through Human-Scene Optimization [30.910671968876024]
Our approach is based on a simple but effective principle: we condition scene synthesis to generate rooms that are usable by humans.
If this human-centric scene generation is viable, the room layout is functional and leads to a more coherent 3D structure.
arXiv Detail & Related papers (2025-02-05T04:00:24Z)
- OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation [84.32038395034868]
OccScene integrates fine-grained 3D perception and high-quality generation in a unified framework.
OccScene generates new, consistent, and realistic 3D scenes from text prompts alone.
Experiments show that OccScene achieves realistic 3D scene generation in broad indoor and outdoor scenarios.
arXiv Detail & Related papers (2024-12-15T13:26:51Z)
- InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior [23.536285325566013]
Comprehending natural language instructions is an appealing property for both 2D and 3D layout synthesis systems.
Existing methods implicitly model object joint distributions and express object relations within a layout, hindering the controllability of generation.
We introduce InstructLayout, a novel generative framework that integrates a semantic graph prior and a layout decoder.
arXiv Detail & Related papers (2024-07-10T12:13:39Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments demonstrates our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Planner3D: LLM-enhanced graph prior meets 3D indoor scene explicit regularization [31.52569918586902]
3D scene synthesis has diverse applications across a spectrum of industries such as robotics, films, and video games.
In this paper, we aim at generating realistic and reasonable 3D indoor scenes from scene graphs.
Our method achieves better 3D scene synthesis, especially in terms of scene-level fidelity.
arXiv Detail & Related papers (2024-03-19T15:54:48Z)
- GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs [74.98581417902201]
We propose a novel framework to generate compositional 3D scenes from scene graphs.
By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model.
We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer.
arXiv Detail & Related papers (2023-11-30T18:59:58Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
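
A rough sketch of the two-branch idea described above (all names and the toy updates are illustrative assumptions, not CommonScenes' actual code): one branch samples a layout from a latent code, the other iteratively refines a per-object shape code from noise, so resampling the noise or editing the input graph changes the result.

import random

def layout_branch(graph_nodes):
    # Stand-in for the variational auto-encoder branch: sample a latent
    # code and "decode" one bounding box per node.
    z = [random.gauss(0.0, 1.0) for _ in range(8)]
    boxes = {n: {"center": (random.uniform(-2, 2), 0.0, random.uniform(-2, 2)),
                 "size": (1.0, 1.0, 1.0)} for n in graph_nodes}
    return boxes, z

def shape_branch(graph_nodes, steps=4):
    # Stand-in for the shape-generation branch: iteratively refine a
    # per-object code from noise, as a diffusion sampler would.
    shapes = {}
    for n in graph_nodes:
        x = [random.gauss(0.0, 1.0) for _ in range(16)]
        for _ in range(steps):
            x = [0.5 * v for v in x]  # toy "denoising" update
        shapes[n] = x
    return shapes

def generate_scene(graph_nodes):
    boxes, _ = layout_branch(graph_nodes)
    shapes = shape_branch(graph_nodes)
    return {n: {"box": boxes[n], "shape_code": shapes[n]} for n in graph_nodes}

print(list(generate_scene(["sofa", "table"])))
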
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- Incremental 3D Semantic Scene Graph Prediction from RGB Sequences [86.77318031029404]
We propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence.
Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network.
The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities.
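
As a generic illustration of the iterative message passing mentioned above (purely an assumption-laden toy, not the paper's network, which uses learned multi-view and geometric features): each node's feature vector is repeatedly updated from its neighbors' features.

def message_passing(features, edges, iterations=3):
    # features: {node_name: feature vector}; edges: list of (a, b) pairs.
    for _ in range(iterations):
        updated = {n: list(f) for n, f in features.items()}
        for a, b in edges:  # undirected neighbor exchange
            for i in range(len(features[a])):
                updated[a][i] += 0.5 * features[b][i]
                updated[b][i] += 0.5 * features[a][i]
        # renormalize so values stay bounded across iterations
        features = {n: [v / 2.0 for v in f] for n, f in updated.items()}
    return features

feats = {"chair": [1.0, 0.0], "table": [0.0, 1.0]}
print(message_passing(feats, [("chair", "table")]))
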
arXiv Detail & Related papers (2023-05-04T11:32:16Z)
- Compositional 3D Scene Generation using Locally Conditioned Diffusion [49.5784841881488]
We introduce locally conditioned diffusion as an approach to compositional scene diffusion.
We demonstrate a score distillation sampling-based text-to-3D synthesis pipeline that enables compositional 3D scene generation at a higher fidelity than relevant baselines.
arXiv Detail & Related papers (2023-03-21T22:37:16Z)
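
The "locally conditioned" idea in the last entry can be illustrated in isolation: at every denoising step, run the denoiser once per regional prompt and composite the noise predictions with spatial masks, so each region follows its own text condition. The sketch below uses a dummy denoiser and a simplified update rule; only the compositing pattern reflects the technique, not the paper's model.

import numpy as np

def denoiser(x, t, prompt):
    # Dummy stand-in for a text-conditioned noise predictor;
    # a real model would be a neural network.
    rng = np.random.default_rng(abs(hash((prompt, t))) % (2**32))
    return 0.1 * rng.standard_normal(x.shape)

def locally_conditioned_step(x, t, prompts, masks):
    # Composite per-prompt predictions with spatial masks so each
    # region is guided by its own text condition.
    eps = np.zeros_like(x)
    for prompt, mask in zip(prompts, masks):
        eps += mask * denoiser(x, t, prompt)
    return x - eps  # simplified update; real samplers follow a noise schedule

x = np.random.default_rng(0).standard_normal((8, 8))  # toy 2D "scene"
left = np.zeros((8, 8)); left[:, :4] = 1.0            # region masks
masks = [left, 1.0 - left]
for t in reversed(range(10)):
    x = locally_conditioned_step(x, t, ["a chair", "a table"], masks)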