Incremental 3D Semantic Scene Graph Prediction from RGB Sequences
- URL: http://arxiv.org/abs/2305.02743v2
- Date: Sat, 6 May 2023 20:15:08 GMT
- Title: Incremental 3D Semantic Scene Graph Prediction from RGB Sequences
- Authors: Shun-Cheng Wu, Keisuke Tateno, Nassir Navab, Federico Tombari
- Abstract summary: We propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence.
Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network.
The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities.
- Score: 86.77318031029404
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: 3D semantic scene graphs are a powerful holistic representation as they
describe the individual objects and depict the relation between them. They are
compact high-level graphs that enable many tasks requiring scene reasoning. In
real-world settings, existing 3D estimation methods produce robust predictions
that mostly rely on dense inputs. In this work, we propose a real-time
framework that incrementally builds a consistent 3D semantic scene graph of a
scene given an RGB image sequence. Our method consists of a novel incremental
entity estimation pipeline and a scene graph prediction network. The proposed
pipeline simultaneously reconstructs a sparse point map and fuses entity
estimation from the input images. The proposed network estimates 3D semantic
scene graphs with iterative message passing using multi-view and geometric
features extracted from the scene entities. Extensive experiments on the 3RScan
dataset show the effectiveness of the proposed method in this challenging task,
outperforming state-of-the-art approaches.
Related papers
- Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive-octree structure is developed that stores semantics and depicts the occupancy of an object adjustably according to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z) - TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding [8.32401190051443]
We present the first implementation of an Equivariant Scene Graph Neural Network (ESGNN) to generate semantic scene graphs from 3D point clouds.
Our combined architecture, termed the Temporal Equivariant Scene Graph Neural Network (TESGNN), not only surpasses existing state-of-the-art methods in scene estimation accuracy but also achieves faster convergence.
arXiv Detail & Related papers (2024-11-15T15:39:04Z) - ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding [2.5165775267615205]
This work is the first to implement an Equivariant Graph Neural Network in semantic scene graph generation from 3D point clouds for scene understanding.
Our proposed method, ESGNN, outperforms existing state-of-the-art approaches, demonstrating a significant improvement in scene estimation with faster convergence.
arXiv Detail & Related papers (2024-06-30T06:58:04Z) - ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and
Planning [125.90002884194838]
ConceptGraphs is an open-vocabulary graph-structured representation for 3D scenes.
It is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association.
We demonstrate the utility of this representation through a number of downstream planning tasks.
arXiv Detail & Related papers (2023-09-28T17:53:38Z) - SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D
Sequences [76.28527350263012]
We propose a method to incrementally build up semantic scene graphs from a 3D environment given a sequence of RGB-D frames.
We aggregate PointNet features from primitive scene components by means of a graph neural network.
Our approach outperforms 3D scene graph prediction methods by a large margin and its accuracy is on par with other 3D semantic and panoptic segmentation methods while running at 35 Hz.
arXiv Detail & Related papers (2021-03-27T13:00:36Z) - SCFusion: Real-time Incremental Scene Reconstruction with Semantic
Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z) - Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions [94.17683799712397]
We focus on scene graphs, a data structure that organizes the entities of a scene in a graph.
We propose a learned method that regresses a scene graph from the point cloud of a scene.
We show the application of our method in a domain-agnostic retrieval task, where graphs serve as an intermediate representation for 3D-3D and 2D-3D matching.
arXiv Detail & Related papers (2020-04-08T12:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.