Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
- URL: http://arxiv.org/abs/2503.19199v1
- Date: Mon, 24 Mar 2025 22:53:19 GMT
- Title: Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
- Authors: Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang, Ruida Zhang, Xiangyang Ji, Marc Pollefeys, Francis Engelmann,
- Abstract summary: We introduce the task of predicting functional 3D scene graphs for real-world indoor environments from posed RGB-D images.<n>Unlike traditional 3D scene graphs that focus on spatial relationships of objects, functional 3D scene graphs capture objects, interactive elements, and their functional relationships.<n>We evaluate our approach on an extended SceneFun3D dataset and a newly collected dataset, FunGraph3D, both annotated with functional 3D scene graphs.
- Score: 113.91791599146786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the task of predicting functional 3D scene graphs for real-world indoor environments from posed RGB-D images. Unlike traditional 3D scene graphs that focus on spatial relationships of objects, functional 3D scene graphs capture objects, interactive elements, and their functional relationships. Due to the lack of training data, we leverage foundation models, including visual language models (VLMs) and large language models (LLMs), to encode functional knowledge. We evaluate our approach on an extended SceneFun3D dataset and a newly collected dataset, FunGraph3D, both annotated with functional 3D scene graphs. Our method significantly outperforms adapted baselines, including Open3DSG and ConceptGraph, demonstrating its effectiveness in modeling complex scene functionalities. We also demonstrate downstream applications such as 3D question answering and robotic manipulation using functional 3D scene graphs. See our project page at https://openfungraph.github.io
Related papers
- Controllable 3D Outdoor Scene Generation via Scene Graphs [74.40967075159071]
We develop an interactive system that transforms a sparse scene graph into a dense BEV Embedding Map.<n>During inference, users can easily create or modify scene graphs to generate large-scale outdoor scenes.<n> Experimental results show that our approach consistently produces high-quality 3D urban scenes closely aligned with the input scene graphs.
arXiv Detail & Related papers (2025-03-10T10:26:08Z) - 3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding [0.5755004576310334]
A 3D scene graph represents a compact scene model, storing information about the objects and the semantic relationships between them.<n>In this work, we propose a method 3DGraphLLM for constructing a learnable representation of a 3D scene graph.<n>The learnable representation is used as input for LLMs to perform 3D vision-language tasks.
arXiv Detail & Related papers (2024-12-24T14:21:58Z) - Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-labeling [9.440800948514449]
We propose a weakly-supervised 3D scene graph generation method via Visual-Linguistic Assisted Pseudo-labeling.
Our 3D-VLAP exploits the superior ability of current large-scale visual-linguistic models to align the semantics between texts and 2D images.
We design an edge self-attention based graph neural network to generate scene graphs of 3D point cloud scenes.
arXiv Detail & Related papers (2024-04-03T07:30:09Z) - Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships [15.513180297629546]
We present Open3DSG, an alternative approach to learn 3D scene graph prediction in an open world without requiring labeled scene graph data.
We co-embed the features from a 3D scene graph prediction backbone with the feature space of powerful open world 2D vision language foundation models.
arXiv Detail & Related papers (2024-02-19T16:15:03Z) - SGAligner : 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z) - CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World
Point Cloud Data [80.42480679542697]
We propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$2$) to learn the transferable 3D point cloud representation in realistic scenarios.
Specifically, we exploit naturally-existed correspondences in 2D and 3D scenarios, and build well-aligned and instance-based text-image-point proxies from those complex scenarios.
arXiv Detail & Related papers (2023-03-22T09:32:45Z) - Interactive Annotation of 3D Object Geometry using 2D Scribbles [84.51514043814066]
In this paper, we propose an interactive framework for annotating 3D object geometry from point cloud data and RGB imagery.
Our framework targets naive users without artistic or graphics expertise.
arXiv Detail & Related papers (2020-08-24T21:51:29Z) - Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions [94.17683799712397]
We focus on scene graphs, a data structure that organizes the entities of a scene in a graph.
We propose a learned method that regresses a scene graph from the point cloud of a scene.
We show the application of our method in a domain-agnostic retrieval task, where graphs serve as an intermediate representation for 3D-3D and 2D-3D matching.
arXiv Detail & Related papers (2020-04-08T12:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.