InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
- URL: http://arxiv.org/abs/2402.04717v1
- Date: Wed, 7 Feb 2024 10:09:00 GMT
- Title: InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
- Authors: Chenguo Lin, Yadong Mu
- Abstract summary: InstructScene is a novel generative framework that integrates a semantic graph prior and a layout decoder.
We show that the proposed method surpasses existing state-of-the-art approaches by a large margin.
- Score: 27.773451301040424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Comprehending natural language instructions is an appealing property for 3D
indoor scene synthesis systems. Existing methods directly model object joint
distributions and express object relations implicitly within a scene, thereby
hindering the controllability of generation. We introduce InstructScene, a
novel generative framework that integrates a semantic graph prior and a layout
decoder to improve controllability and fidelity for 3D scene synthesis. The
proposed semantic graph prior jointly learns scene appearances and layout
distributions, exhibiting versatility across various downstream tasks in a
zero-shot manner. To facilitate benchmarking of text-driven 3D scene
synthesis, we curate a high-quality dataset of scene-instruction pairs with
large language and multimodal models. Extensive experimental results reveal
that the proposed method surpasses existing state-of-the-art approaches by a
large margin. Thorough ablation studies confirm the efficacy of crucial design
components. Project page: https://chenguolin.github.io/projects/InstructScene.
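
At a high level, the abstract describes a two-stage pipeline: a semantic graph prior first turns the instruction into a scene graph, and a layout decoder then realizes that graph as object placements. Below is a minimal sketch of that flow under stated assumptions; every class and function name is an illustrative stand-in, not the authors' API.

from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    # Semantic graph: object categories as nodes, spatial relations as edges.
    nodes: list[str]                   # e.g. ["bed", "nightstand"]
    edges: list[tuple[int, str, int]]  # e.g. (1, "left of", 0)

@dataclass
class Layout:
    # One (category, position, size, orientation) box per graph node.
    boxes: list[dict] = field(default_factory=list)

def graph_prior(instruction: str) -> SceneGraph:
    # Stage 1 (hypothetical stand-in): the paper learns a generative prior
    # over semantic graphs conditioned on the instruction; here one
    # plausible output is simply hard-coded.
    return SceneGraph(nodes=["bed", "nightstand"], edges=[(1, "left of", 0)])

def layout_decoder(graph: SceneGraph) -> Layout:
    # Stage 2 (hypothetical stand-in): decode the graph into concrete
    # object placements that respect its relations.
    return Layout(boxes=[
        {"category": n, "position": (1.5 * i, 0.0, 0.0),
         "size": (1.0, 0.5, 1.0), "angle": 0.0}
        for i, n in enumerate(graph.nodes)])

def synthesize(instruction: str) -> Layout:
    # Instruction -> semantic graph -> 3D layout.
    return layout_decoder(graph_prior(instruction))

print(synthesize("Place a nightstand to the left of the bed."))

Factoring generation through an explicit graph is what enables the controllability the abstract claims: the intermediate graph can be inspected or edited before the layout is decoded.
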
Related papers
- Functional 3D Scene Synthesis through Human-Scene Optimization [30.910671968876024]
Our approach is based on a simple but effective principle: we condition scene synthesis to generate rooms that are usable by humans.
If this human-centric scene generation is viable, the room layout is functional and leads to a more coherent 3D structure.
arXiv Detail & Related papers (2025-02-05T04:00:24Z)
- OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation [84.32038395034868]
OccScene integrates fine-grained 3D perception and high-quality generation in a unified framework.
OccScene generates new, consistent, and realistic 3D scenes from text prompts alone.
Experiments show that OccScene achieves realistic 3D scene generation in broad indoor and outdoor scenarios.
arXiv Detail & Related papers (2024-12-15T13:26:51Z)
- InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior [23.536285325566013]
Comprehending natural language instructions is an appealing property for both 2D and 3D layout synthesis systems.
Existing methods implicitly model object joint distributions and express object relations within a layout, hindering the controllability of generation.
We introduce InstructLayout, a novel generative framework that integrates a semantic graph prior and a layout decoder.
arXiv Detail & Related papers (2024-07-10T12:13:39Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments demonstrates our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Planner3D: LLM-enhanced graph prior meets 3D indoor scene explicit regularization [31.52569918586902]
3D scene synthesis has diverse applications across a spectrum of industries such as robotics, films, and video games.
In this paper, we aim at generating realistic and reasonable 3D indoor scenes from scene graphs.
Our method achieves better 3D scene synthesis, especially in terms of scene-level fidelity.
arXiv Detail & Related papers (2024-03-19T15:54:48Z)
- GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs [74.98581417902201]
We propose a novel framework to generate compositional 3D scenes from scene graphs.
By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model.
We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer.
arXiv Detail & Related papers (2023-11-30T18:59:58Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
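
A rough sketch of the two-branch idea described above (all names and the toy updates are illustrative assumptions, not CommonScenes' actual code): one branch samples a layout from a latent code, the other iteratively refines a per-object shape code from noise, so resampling the noise or editing the input graph changes the result.

import random

def layout_branch(graph_nodes):
    # Stand-in for the variational auto-encoder branch: sample a latent
    # code and "decode" one bounding box per node.
    z = [random.gauss(0.0, 1.0) for _ in range(8)]
    boxes = {n: {"center": (random.uniform(-2, 2), 0.0, random.uniform(-2, 2)),
                 "size": (1.0, 1.0, 1.0)} for n in graph_nodes}
    return boxes, z

def shape_branch(graph_nodes, steps=4):
    # Stand-in for the shape-generation branch: iteratively refine a
    # per-object code from noise, as a diffusion sampler would.
    shapes = {}
    for n in graph_nodes:
        x = [random.gauss(0.0, 1.0) for _ in range(16)]
        for _ in range(steps):
            x = [0.5 * v for v in x]  # toy "denoising" update
        shapes[n] = x
    return shapes

def generate_scene(graph_nodes):
    boxes, _ = layout_branch(graph_nodes)
    shapes = shape_branch(graph_nodes)
    return {n: {"box": boxes[n], "shape_code": shapes[n]} for n in graph_nodes}

print(list(generate_scene(["sofa", "table"])))
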
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- Incremental 3D Semantic Scene Graph Prediction from RGB Sequences [86.77318031029404]
We propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence.
Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network.
The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities.
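
As a generic illustration of the iterative message passing mentioned above (purely an assumption-laden toy, not the paper's network, which uses learned multi-view and geometric features): each node's feature vector is repeatedly updated from its neighbors' features.

def message_passing(features, edges, iterations=3):
    # features: {node_name: feature vector}; edges: list of (a, b) pairs.
    for _ in range(iterations):
        updated = {n: list(f) for n, f in features.items()}
        for a, b in edges:  # undirected neighbor exchange
            for i in range(len(features[a])):
                updated[a][i] += 0.5 * features[b][i]
                updated[b][i] += 0.5 * features[a][i]
        # renormalize so values stay bounded across iterations
        features = {n: [v / 2.0 for v in f] for n, f in updated.items()}
    return features

feats = {"chair": [1.0, 0.0], "table": [0.0, 1.0]}
print(message_passing(feats, [("chair", "table")]))
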
arXiv Detail & Related papers (2023-05-04T11:32:16Z)
- Compositional 3D Scene Generation using Locally Conditioned Diffusion [49.5784841881488]
We introduce locally conditioned diffusion as an approach to compositional scene diffusion.
We demonstrate a score distillation sampling-based text-to-3D synthesis pipeline that enables compositional 3D scene generation at a higher fidelity than relevant baselines.
arXiv Detail & Related papers (2023-03-21T22:37:16Z)
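
The "locally conditioned" idea in the last entry can be illustrated in isolation: at every denoising step, run the denoiser once per regional prompt and composite the noise predictions with spatial masks, so each region follows its own text condition. The sketch below uses a dummy denoiser and a simplified update rule; only the compositing pattern reflects the technique, not the paper's model.

import numpy as np

def denoiser(x, t, prompt):
    # Dummy stand-in for a text-conditioned noise predictor;
    # a real model would be a neural network.
    rng = np.random.default_rng(abs(hash((prompt, t))) % (2**32))
    return 0.1 * rng.standard_normal(x.shape)

def locally_conditioned_step(x, t, prompts, masks):
    # Composite per-prompt predictions with spatial masks so each
    # region is guided by its own text condition.
    eps = np.zeros_like(x)
    for prompt, mask in zip(prompts, masks):
        eps += mask * denoiser(x, t, prompt)
    return x - eps  # simplified update; real samplers follow a noise schedule

x = np.random.default_rng(0).standard_normal((8, 8))  # toy 2D "scene"
left = np.zeros((8, 8)); left[:, :4] = 1.0            # region masks
masks = [left, 1.0 - left]
for t in reversed(range(10)):
    x = locally_conditioned_step(x, t, ["a chair", "a table"], masks)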