Integrating Prior Observations for Incremental 3D Scene Graph Prediction
- URL: http://arxiv.org/abs/2509.11895v1
- Date: Mon, 15 Sep 2025 13:10:34 GMT
- Title: Integrating Prior Observations for Incremental 3D Scene Graph Prediction
- Authors: Marian Renz, Felix Igelbrink, Martin Atzmueller
- Abstract summary: 3D semantic scene graphs (3DSSG) provide compact structured representations of environments by explicitly modeling objects, attributes, and relationships. This paper introduces a novel graph model for incremental 3DSSG prediction that integrates additional, multi-modal information, such as prior observations, directly into the message-passing process. We evaluate our approach on the 3DSSG dataset, showing that GNNs enriched with multi-modal information such as semantic embeddings (e.g., CLIP) and prior observations offer a scalable and generalizable solution for complex, real-world environments.
- Score: 0.866627581195388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D semantic scene graphs (3DSSG) provide compact structured representations of environments by explicitly modeling objects, attributes, and relationships. While 3DSSGs have shown promise in robotics and embodied AI, many existing methods rely mainly on sensor data, not integrating further information from semantically rich environments. Additionally, most methods assume access to complete scene reconstructions, limiting their applicability in real-world, incremental settings. This paper introduces a novel heterogeneous graph model for incremental 3DSSG prediction that integrates additional, multi-modal information, such as prior observations, directly into the message-passing process. Utilizing multiple layers, the model flexibly incorporates global and local scene representations without requiring specialized modules or full scene reconstructions. We evaluate our approach on the 3DSSG dataset, showing that GNNs enriched with multi-modal information such as semantic embeddings (e.g., CLIP) and prior observations offer a scalable and generalizable solution for complex, real-world environments. The full source code of the presented architecture will be made available at https://github.com/m4renz/incremental-scene-graph-prediction.
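The abstract describes feeding prior observations into message passing over a heterogeneous graph. The following is only a minimal sketch of that general idea, not the authors' architecture: node names, relation names (`spatial`, `prior_to_current`), and the scalar `rel_weight` values are all hypothetical, and a scalar per relation stands in for the learned weight matrices a real GNN layer would use.

```python
from collections import defaultdict


def hetero_message_pass(features, edges, rel_weight):
    """One round of mean-aggregation message passing over typed edges.

    features:   {node_id: [float, ...]} feature vector per node
    edges:      {relation_name: [(src_id, dst_id), ...]}
    rel_weight: {relation_name: float}, a hypothetical per-relation scalar
                standing in for a learned weight matrix
    """
    dim = len(next(iter(features.values())))
    # Start from each node's own features (self term).
    updated = {node: list(vec) for node, vec in features.items()}
    for rel, pairs in edges.items():
        # Group incoming source features by destination node.
        incoming = defaultdict(list)
        for src, dst in pairs:
            incoming[dst].append(features[src])
        # Add the weighted mean of the messages for this relation.
        for dst, msgs in incoming.items():
            for i in range(dim):
                mean_i = sum(m[i] for m in msgs) / len(msgs)
                updated[dst][i] += rel_weight[rel] * mean_i
    return updated


# Hypothetical toy scene: two nodes from the current frame, one prior node.
features = {
    "chair_t": [1.0, 0.0],
    "table_t": [0.0, 1.0],
    "chair_prior": [0.5, 0.5],
}
edges = {
    "spatial": [("table_t", "chair_t"), ("chair_t", "table_t")],
    "prior_to_current": [("chair_prior", "chair_t")],
}
rel_weight = {"spatial": 0.5, "prior_to_current": 1.0}

out = hetero_message_pass(features, edges, rel_weight)
print(out["chair_t"])
```

In this toy setup, `chair_t` receives messages through two relation types, so its updated vector mixes its own features, its spatial neighbor, and the prior observation, while `chair_prior` has no incoming edges and stays unchanged.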
Related papers
- SGR3 Model: Scene Graph Retrieval-Reasoning Model in 3D [51.32219731589742]
3D scene graphs provide a structured representation of object entities and their relationships. Existing approaches for 3D scene graph generation typically combine scene reconstruction with graph neural networks (GNNs). In this work, we introduce a Scene Graph Retrieval-Reasoning Model in 3D (SGR3 Model).
arXiv Detail & Related papers (2026-03-04T21:19:54Z)
- MA3DSG: Multi-Agent 3D Scene Graph Generation for Large-Scale Indoor Environments [6.071490877668865]
We introduce the Multi-Agent 3D Scene Graph Generation (MA3DSG) model, the first framework designed to tackle this scalability challenge using multiple agents. We develop a training-free graph alignment algorithm that efficiently merges partial graphs from individual agents into a unified global scene graph.
arXiv Detail & Related papers (2026-02-04T02:39:57Z)
- KeySG: Hierarchical Keyframe-Based 3D Scene Graphs [1.5134439544218246]
KeySG represents 3D scenes as a hierarchical graph consisting of floors, rooms, objects, and functional elements. We leverage VLM to extract scene information, alleviating the need to explicitly model relationship edges between objects. Our approach can process complex and ambiguous queries while mitigating the scalability issues associated with large scene graphs.
arXiv Detail & Related papers (2025-10-01T15:53:27Z)
- Open-Vocabulary Indoor Object Grounding with 3D Hierarchical Scene Graph [0.0]
OVIGo-3DHSG represents an extensive indoor environment over a Hierarchical Scene Graph. The hierarchical representation explicitly models spatial relations across floors, rooms, locations, and objects. Our approach demonstrates efficient scene comprehension and robust object grounding compared to existing methods.
arXiv Detail & Related papers (2025-07-16T10:47:12Z)
- Agentic 3D Scene Generation with Spatially Contextualized VLMs [67.31920821192323]
We introduce a new paradigm that enables vision-language models to generate, understand, and edit complex 3D environments. We develop an agentic 3D scene generation pipeline in which the VLM iteratively reads from and updates the spatial context. Results show that our framework can handle diverse and challenging inputs, achieving a level of generalization not observed in prior work.
arXiv Detail & Related papers (2025-05-26T15:28:17Z)
- GaussianGraph: 3D Gaussian-based Scene Graph Generation for Open-world Scene Understanding [20.578106363482018]
We propose a novel framework that enhances 3DGS-based scene understanding by integrating semantic clustering and scene graph generation. We introduce a "Control-Follow" clustering strategy, which dynamically adapts to scene scale and feature distribution, avoiding feature compression. We enrich scene representation by integrating object attributes and spatial relations extracted from 2D foundation models.
arXiv Detail & Related papers (2025-03-06T02:36:59Z)
- Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive-octree structure is developed that stores semantics and depicts the occupancy of an object adjustably according to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z)
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- Incremental 3D Semantic Scene Graph Prediction from RGB Sequences [86.77318031029404]
We propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence.
Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network.
The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities.
arXiv Detail & Related papers (2023-05-04T11:32:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.