Related papers: GeoSceneGraph: Geometric Scene Graph Diffusion Model for Text-guided 3D Indoor Scene Synthesis

URL: http://arxiv.org/abs/2511.14884v1
Date: Tue, 18 Nov 2025 20:06:49 GMT
Title: GeoSceneGraph: Geometric Scene Graph Diffusion Model for Text-guided 3D Indoor Scene Synthesis
Authors: Antonio Ruiz, Tao Wu, Andrew Melnik, Qing Cheng, Xuqin Wang, Lu Liu, Yongliang Wang, Yanfeng Zhang, Helge Ritter
Abstract summary: Methods that synthesize indoor 3D scenes from text prompts have wide-ranging applications in film production, interior design, video games, virtual reality, and synthetic data generation for training embodied agents. Existing approaches typically either train generative models from scratch or leverage vision-language models (VLMs). We introduce GeoSceneGraph, a method that synthesizes 3D scenes from text prompts by leveraging the graph structure and geometric symmetries of 3D scenes.
Score: 14.137982018879049
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Methods that synthesize indoor 3D scenes from text prompts have wide-ranging applications in film production, interior design, video games, virtual reality, and synthetic data generation for training embodied agents. Existing approaches typically either train generative models from scratch or leverage vision-language models (VLMs). While VLMs achieve strong performance, particularly for complex or open-ended prompts, smaller task-specific models remain necessary for deployment on resource-constrained devices such as extended reality (XR) glasses or mobile phones. However, many generative approaches that train from scratch overlook the inherent graph structure of indoor scenes, which can limit scene coherence and realism. Conversely, methods that incorporate scene graphs either demand a user-provided semantic graph, which is generally inconvenient and restrictive, or rely on ground-truth relationship annotations, limiting their capacity to capture more varied object interactions. To address these challenges, we introduce GeoSceneGraph, a method that synthesizes 3D scenes from text prompts by leveraging the graph structure and geometric symmetries of 3D scenes, without relying on predefined relationship classes. Despite not using ground-truth relationships, GeoSceneGraph achieves performance comparable to methods that do. Our model is built on equivariant graph neural networks (EGNNs), but existing EGNN approaches are typically limited to low-dimensional conditioning and are not designed to handle complex modalities such as text. We propose a simple and effective strategy for conditioning EGNNs on text features, and we validate our design through ablation studies.
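Illustrative sketch: the abstract says the model is built on equivariant graph neural networks (EGNNs) and conditions them on text features, but it does not describe the conditioning strategy. The NumPy code below is therefore only a minimal sketch of a generic EGNN-style layer (in the general form of the E(n)-equivariant layers of Satorras et al.), not the authors' method. All names (`egnn_layer`, `init_params`, `cond`) are hypothetical, the weights are random and untrained, and feeding a global vector `cond` into the edge network is an assumed stand-in for a text embedding. Because `cond`, the node features, and the squared distances do not change when the scene is rotated or translated, the coordinate update stays equivariant, which the last lines check numerically.

```python
import numpy as np

def mlp(rng, sizes):
    """Random, untrained weights for a small perceptron (illustration only)."""
    return [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def apply_mlp(params, x):
    """Apply the perceptron with tanh between layers and a linear last layer."""
    for k, (w, b) in enumerate(params):
        x = x @ w + b
        if k < len(params) - 1:
            x = np.tanh(x)
    return x

def init_params(rng, d, c, dm=16):
    """d: node feature size, c: conditioning size, dm: message size."""
    return {
        "edge": mlp(rng, [2 * d + 1 + c, dm, dm]),
        "coord": mlp(rng, [dm, dm, 1]),
        "node": mlp(rng, [d + dm, dm, d]),
    }

def egnn_layer(h, x, cond, params):
    """One EGNN-style update on a fully connected graph.

    h:    (n, d) invariant node features (e.g. object class embeddings)
    x:    (n, 3) node coordinates (e.g. object positions)
    cond: (c,)   global invariant vector (ASSUMED stand-in for text features)
    """
    n, d = h.shape
    diff = x[:, None, :] - x[None, :, :]            # (n, n, 3), x_i - x_j
    dist2 = (diff ** 2).sum(-1, keepdims=True)      # (n, n, 1), invariant
    hi = np.broadcast_to(h[:, None, :], (n, n, d))
    hj = np.broadcast_to(h[None, :, :], (n, n, d))
    cc = np.broadcast_to(cond, (n, n, cond.shape[0]))
    mask = 1.0 - np.eye(n)[:, :, None]              # drop self-edges
    # Messages depend only on invariant quantities.
    m = apply_mlp(params["edge"], np.concatenate([hi, hj, dist2, cc], -1)) * mask
    # Coordinates move along relative directions, scaled by an invariant weight.
    w = apply_mlp(params["coord"], m) * mask        # (n, n, 1)
    x_new = x + (diff * w).sum(axis=1) / (n - 1)
    # Residual update of the invariant node features from aggregated messages.
    h_new = h + apply_mlp(params["node"], np.concatenate([h, m.sum(axis=1)], -1))
    return h_new, x_new

# Numerical equivariance check with a random orthogonal map Q and shift t.
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))
x = rng.normal(size=(5, 3))
cond = rng.normal(size=4)
params = init_params(rng, d=8, c=4)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)

h1, x1 = egnn_layer(h, x, cond, params)
h2, x2 = egnn_layer(h, x @ Q.T + t, cond, params)
print(np.allclose(h1, h2), np.allclose(x1 @ Q.T + t, x2))
```

Transforming the input coordinates and then applying the layer gives the same result as applying the layer and then transforming the output, while the node features are unchanged. This is the geometric symmetry that EGNN-based scene models rely on.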

Related papers

SGR3 Model: Scene Graph Retrieval-Reasoning Model in 3D [51.32219731589742]
3D scene graphs provide a structured representation of object entities and their relationships. Existing approaches for 3D scene graph generation typically combine scene reconstruction with graph neural networks (GNNs). In this work, we introduce a Scene Graph Retrieval-Reasoning Model in 3D (SGR3 Model).
arXiv Detail & Related papers (2026-03-04T21:19:54Z)
SceneLinker: Compositional 3D Scene Generation via Semantic Scene Graph from RGB Sequences [12.771171646896468]
We introduce SceneLinker, a framework that generates compositional 3D scenes via semantic scene graph from RGB sequences. Our work enables users to generate consistent 3D spaces from their physical environments via scene graphs, allowing them to create spatial Mixed Reality (MR) content.
arXiv Detail & Related papers (2026-02-03T01:22:07Z)
RoamScene3D: Immersive Text-to-3D Scene Generation via Adaptive Object-aware Roaming [79.81527946524098]
RoamScene3D is a novel framework that bridges the gap between semantic guidance and spatial generation. We employ a vision-language model (VLM) to construct a scene graph that encodes object relations. To mitigate the limitations of static 2D priors, we introduce a Motion-Injected Inpainting model that is fine-tuned on a synthetic panoramic dataset.
arXiv Detail & Related papers (2026-01-27T10:10:55Z)
MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation [14.959772906099039]
MMGDreamer is a dual-branch diffusion model for scene generation that incorporates a novel Mixed-Modality Graph. Visual enhancement module enriches the visual fidelity of text-only nodes by constructing visual representations using text embeddings. Our relation predictor leverages node representations to infer absent relationships between nodes, resulting in more coherent scene layouts.
arXiv Detail & Related papers (2025-02-09T12:23:40Z)
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs [74.98581417902201]
We propose a novel framework to generate compositional 3D scenes from scene graphs. By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model. We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer.
arXiv Detail & Related papers (2023-11-30T18:59:58Z)
CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes. Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes. The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs [85.54212143154986]
Controllable scene synthesis consists of generating 3D information that satisfies underlying specifications. Scene graphs are representations of a scene composed of objects (nodes) and inter-object relationships (edges). We propose the first work that directly generates shapes from a scene graph in an end-to-end manner.
arXiv Detail & Related papers (2021-08-19T17:59:07Z)
Unconditional Scene Graph Generation [72.53624470737712]
We develop a deep auto-regressive model called SceneGraphGen which can learn the probability distribution over labelled and directed graphs. We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes.
arXiv Detail & Related papers (2021-08-12T17:57:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.