KeySG: Hierarchical Keyframe-Based 3D Scene Graphs
- URL: http://arxiv.org/abs/2510.01049v1
- Date: Wed, 01 Oct 2025 15:53:27 GMT
- Title: KeySG: Hierarchical Keyframe-Based 3D Scene Graphs
- Authors: Abdelrhman Werby, Dennis Rotondi, Fabio Scaparro, Kai O. Arras,
- Abstract summary: KeySG represents 3D scenes as a hierarchical graph consisting of floors, rooms, objects, and functional elements. We leverage VLMs to extract scene information, alleviating the need to explicitly model relationship edges between objects. Our approach can process complex and ambiguous queries while mitigating the scalability issues associated with large scene graphs.
- Score: 1.5134439544218246
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, 3D scene graphs have emerged as a powerful world representation, offering both geometric accuracy and semantic richness. Combining 3D scene graphs with large language models enables robots to reason, plan, and navigate in complex human-centered environments. However, current approaches for constructing 3D scene graphs are semantically limited to a predefined set of relationships, and their serialization in large environments can easily exceed an LLM's context window. We introduce KeySG, a framework that represents 3D scenes as a hierarchical graph consisting of floors, rooms, objects, and functional elements, where nodes are augmented with multi-modal information extracted from keyframes selected to optimize geometric and visual coverage. The keyframes allow us to efficiently leverage VLMs to extract scene information, alleviating the need to explicitly model relationship edges between objects and enabling more general, task-agnostic reasoning and planning. Our approach can process complex and ambiguous queries while mitigating the scalability issues associated with large scene graphs by utilizing a hierarchical retrieval-augmented generation (RAG) pipeline to extract relevant context from the graph. Evaluated across four distinct benchmarks, including 3D object segmentation and complex query retrieval, KeySG outperforms prior approaches on most metrics, demonstrating its superior semantic richness and efficiency.
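The abstract names three mechanisms: a floor/room/object/functional-element hierarchy, keyframes chosen for coverage, and hierarchical retrieval that keeps the LLM context small. The following is a minimal sketch of those general ideas under loudly stated assumptions; it is not KeySG's implementation. Greedy max-coverage stands in for the paper's keyframe-selection criterion, plain strings and token overlap stand in for VLM-extracted descriptions and embedding-based RAG, and every name here (`Node`, `select_keyframes`, `score`, `retrieve`, `FRAMES`, `BUILDING`) is hypothetical.

```python
# Hypothetical sketch of hierarchical scene graphs, coverage-based keyframe
# selection, and coarse-to-fine retrieval. Not KeySG's actual implementation.
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node in a floor -> room -> object -> functional-element hierarchy."""
    level: str        # "floor", "room", "object" or "functional"
    name: str
    description: str  # plain-text stand-in for information a VLM might extract from keyframes
    children: list["Node"] = field(default_factory=list)


def select_keyframes(coverage: dict[str, set[int]], budget: int) -> list[str]:
    """Greedy max-coverage (assumed criterion): repeatedly pick the frame that
    observes the most not-yet-covered elements, up to `budget` frames."""
    covered: set[int] = set()
    chosen: list[str] = []
    remaining = dict(coverage)
    while remaining and len(chosen) < budget:
        best = max(remaining, key=lambda f: len(remaining[f] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining frame adds new coverage
        chosen.append(best)
        covered |= gain
    return chosen


def score(query: str, node: Node) -> int:
    """Count shared lowercase tokens (a crude stand-in for embedding similarity)."""
    query_tokens = set(query.lower().split())
    node_tokens = set(f"{node.name} {node.description}".lower().split())
    return len(query_tokens & node_tokens)


def retrieve(query: str, roots: list[Node], k: int = 1) -> list[Node]:
    """Coarse-to-fine retrieval: keep the top-k nodes at each level and descend
    only into their children, so only a small subgraph becomes LLM context."""
    frontier, context = roots, []
    while frontier:
        top = sorted(frontier, key=lambda n: score(query, n), reverse=True)[:k]
        context.extend(top)
        frontier = [child for node in top for child in node.children]
    return context


# Toy data: frame id -> ids of scene elements (e.g. voxels) visible in that frame.
FRAMES = {"f0": {1, 2, 3}, "f1": {3, 4}, "f2": {4, 5, 6, 7}, "f3": {1, 2}}

BUILDING = [
    Node("floor", "ground floor", "kitchen and living room", [
        Node("room", "kitchen", "kitchen with fridge and stove", [
            Node("object", "fridge", "white fridge next to the stove", [
                Node("functional", "handle", "fridge door handle"),
            ]),
            Node("object", "stove", "gas stove"),
        ]),
        Node("room", "living room", "living room with sofa and tv", [
            Node("object", "sofa", "grey sofa"),
        ]),
    ]),
    Node("floor", "first floor", "bedroom and bathroom", [
        Node("room", "bedroom", "bedroom with bed", [Node("object", "bed", "double bed")]),
    ]),
]

if __name__ == "__main__":
    print(select_keyframes(FRAMES, budget=3))  # ['f2', 'f0']
    hits = retrieve("open the fridge door in the kitchen", BUILDING)
    print([n.name for n in hits])  # ['ground floor', 'kitchen', 'fridge', 'handle']
```

In this sketch, retrieval keeps k nodes per level, so the serialized context grows with k times the depth of the hierarchy rather than with the total number of nodes. That is the kind of scalability benefit the abstract attributes to hierarchical RAG, although the paper's actual pipeline may rank and serialize nodes differently.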
Related papers
- SGR3 Model: Scene Graph Retrieval-Reasoning Model in 3D [51.32219731589742]
3D scene graphs provide a structured representation of object entities and their relationships.
Existing approaches for 3D scene graph generation typically combine scene reconstruction with graph neural networks (GNNs).
In this work, we introduce a Scene Graph Retrieval-Reasoning Model in 3D (SGR3 Model).
Date: 2026-03-04T21:19:54Z
- MA3DSG: Multi-Agent 3D Scene Graph Generation for Large-Scale Indoor Environments [6.071490877668865]
We introduce the Multi-Agent 3D Scene Graph Generation (MA3DSG) model, the first framework designed to tackle the scalability challenge using multiple agents.
We develop a training-free graph alignment algorithm that efficiently merges partial graphs from individual agents into a unified global scene graph.
Date: 2026-02-04T02:39:57Z
- Open-World 3D Scene Graph Generation for Retrieval-Augmented Reasoning [24.17324180628543]
We propose a unified framework for Open-World 3D Scene Graph Generation with Retrieval-Augmented Reasoning.
Our method integrates Vision-Language Models (VLMs) with retrieval-based reasoning to support multimodal exploration and language-guided interaction.
We evaluate our method on the 3DSSG and Replica benchmarks across four tasks (scene question answering, visual grounding, instance retrieval, and task planning), demonstrating robust generalization and superior performance in diverse environments.
Date: 2025-11-08T07:37:29Z
- Open-Vocabulary Indoor Object Grounding with 3D Hierarchical Scene Graph [0.0]
OVIGo-3DHSG represents an extensive indoor environment as a hierarchical scene graph.
The hierarchical representation explicitly models spatial relations across floors, rooms, locations, and objects.
Our approach demonstrates efficient scene comprehension and robust object grounding compared to existing methods.
Date: 2025-07-16T10:47:12Z
- Agentic 3D Scene Generation with Spatially Contextualized VLMs [67.31920821192323]
We introduce a new paradigm that enables vision-language models to generate, understand, and edit complex 3D environments.
We develop an agentic 3D scene generation pipeline in which the VLM iteratively reads from and updates the spatial context.
Results show that our framework can handle diverse and challenging inputs, achieving a level of generalization not observed in prior work.
Date: 2025-05-26T15:28:17Z
- Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive octree structure is developed that stores semantics and represents an object's occupancy adaptively according to its shape.
Date: 2024-11-25T10:14:10Z
- Multiview Scene Graph [7.460438046915524]
A proper scene representation is central to the pursuit of spatial intelligence.
We propose to build Multiview Scene Graphs (MSG) from unposed images.
MSG represents a scene topologically with interconnected place and object nodes.
Date: 2024-10-15T02:04:05Z
- ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning [125.90002884194838]
ConceptGraphs is an open-vocabulary graph-structured representation for 3D scenes.
It is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association.
We demonstrate the utility of this representation through a number of downstream planning tasks.
Date: 2023-09-28T17:53:38Z
- SGAligner: 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a research topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
Date: 2023-04-28T14:39:22Z
- Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing incremental structure expanding (ISE).
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
Date: 2022-09-15T16:26:14Z