Related papers: Incremental 3D Semantic Scene Graph Prediction from RGB Sequences

Incremental 3D Semantic Scene Graph Prediction from RGB Sequences

URL: http://arxiv.org/abs/2305.02743v2
Date: Sat, 6 May 2023 20:15:08 GMT
Title: Incremental 3D Semantic Scene Graph Prediction from RGB Sequences
Authors: Shun-Cheng Wu, Keisuke Tateno, Nassir Navab, Federico Tombari
Abstract summary: We propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence. Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network. The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities.
Score: 86.77318031029404
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: 3D semantic scene graphs are a powerful holistic representation as they describe the individual objects and depict the relation between them. They are compact high-level graphs that enable many tasks requiring scene reasoning. In real-world settings, existing 3D estimation methods produce robust predictions that mostly rely on dense inputs. In this work, we propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence. Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network. The proposed pipeline simultaneously reconstructs a sparse point map and fuses entity estimation from the input images. The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities. Extensive experiments on the 3RScan dataset show the effectiveness of the proposed method in this challenging task, outperforming state-of-the-art approaches.

Related papers

Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting [5.8452477457633485]
It is observed existing methods have various limitations, such as requiring precise camera poses for input and dense viewpoints for supervision. We propose a novel graph-guided 3D scene reconstruction framework, GraphGS. We demonstrate GraphGS achieves high-fidelity 3D reconstruction from images, which presents state-of-the-art performance through quantitative and qualitative evaluation across multiple datasets.
arXiv Detail & Related papers (2025-02-24T17:59:08Z)
Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding. An adaptive-octree structure is developed that stores semantics and depicts the occupancy of an object adjustably according to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z)
TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding [8.32401190051443]
We present the first implementation of an Equivariant Scene Graph Neural Network (ESGNN) to generate semantic scene graphs from 3D point clouds. Our combined architecture, termed the Temporal Equivariant Scene Graph Neural Network (TESGNN), not only surpasses existing state-of-the-art methods in scene estimation accuracy but also achieves faster convergence.
arXiv Detail & Related papers (2024-11-15T15:39:04Z)
ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding [2.5165775267615205]
This work is the first to implement an Equivariant Graph Neural Network in semantic scene graph generation from 3D point clouds for scene understanding. Our proposed method, ESGNN, outperforms existing state-of-the-art approaches, demonstrating a significant improvement in scene estimation with faster convergence.
arXiv Detail & Related papers (2024-06-30T06:58:04Z)
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning [125.90002884194838]
ConceptGraphs is an open-vocabulary graph-structured representation for 3D scenes. It is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association. We demonstrate the utility of this representation through a number of downstream planning tasks.
arXiv Detail & Related papers (2023-09-28T17:53:38Z)
SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences [76.28527350263012]
We propose a method to incrementally build up semantic scene graphs from a 3D environment given a sequence of RGB-D frames. We aggregate PointNet features from primitive scene components by means of a graph neural network. Our approach outperforms 3D scene graph prediction methods by a large margin and its accuracy is on par with other 3D semantic and panoptic segmentation methods while running at 35 Hz.
arXiv Detail & Related papers (2021-03-27T13:00:36Z)
SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner. Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z)
Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions [94.17683799712397]
We focus on scene graphs, a data structure that organizes the entities of a scene in a graph. We propose a learned method that regresses a scene graph from the point cloud of a scene. We show the application of our method in a domain-agnostic retrieval task, where graphs serve as an intermediate representation for 3D-3D and 2D-3D matching.
arXiv Detail & Related papers (2020-04-08T12:25:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.