SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene
Reconstruction
- URL: http://arxiv.org/abs/2309.15702v2
- Date: Mon, 6 Nov 2023 10:21:43 GMT
- Title: SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene
Reconstruction
- Authors: Sebastian Koch, Pedro Hermosilla, Narunas Vaskevicius, Mirco Colosi,
Timo Ropinski
- Abstract summary: We present SGRec3D, a novel self-supervised pre-training method for 3D scene graph prediction.
Pre-training SGRec3D does not require object relationship labels, making it possible to exploit large-scale 3D scene understanding datasets.
Our experiments demonstrate that in contrast to recent point cloud-based pre-training approaches, our proposed pre-training improves the 3D scene graph prediction considerably.
- Score: 16.643252717745348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of 3D scene understanding, 3D scene graphs have emerged as a new
scene representation that combines geometric and semantic information about
objects and their relationships. However, learning semantic 3D scene graphs in
a fully supervised manner is inherently difficult as it requires not only
object-level annotations but also relationship labels. While pre-training
approaches have helped to boost the performance of many methods in various
fields, pre-training for 3D scene graph prediction has received little
attention. Furthermore, we find in this paper that classical contrastive point
cloud-based pre-training approaches are ineffective for 3D scene graph
learning. To this end, we present SGRec3D, a novel self-supervised pre-training
method for 3D scene graph prediction. We propose to reconstruct the 3D input
scene from a graph bottleneck as a pretext task. Pre-training SGRec3D does not
require object relationship labels, making it possible to exploit large-scale
3D scene understanding datasets, which were off-limits for 3D scene graph
learning before. Our experiments demonstrate that in contrast to recent point
cloud-based pre-training approaches, our proposed pre-training improves the 3D
scene graph prediction considerably, which results in SOTA performance,
outperforming other 3D scene graph models by +10% on object prediction and +4%
on relationship prediction. Additionally, we show that only using a small
subset of 10% labeled data during fine-tuning is sufficient to outperform the
same model without pre-training.
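To make the pretext task more concrete, here is a minimal toy sketch. It is not the authors' implementation. It assumes each object is a small point set and replaces the learned graph encoder with a hypothetical per-object bottleneck (a box center and half-extent). It scores the reconstruction with a symmetric Chamfer distance, a common point cloud loss that the abstract does not name. Graph edges and message passing are omitted, and all names are illustrative. Note that no relationship labels enter the loss.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class ObjectNode:
    """Hypothetical bottleneck node: an axis-aligned box (center, half-extent)."""
    center: Point
    half_extent: Point


def encode(points: List[Point]) -> ObjectNode:
    """Compress one object's points into a node (stand-in for a learned encoder)."""
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    center = tuple((hi[i] + lo[i]) / 2 for i in range(3))
    half_extent = tuple((hi[i] - lo[i]) / 2 for i in range(3))
    return ObjectNode(center, half_extent)


def decode(node: ObjectNode) -> List[Point]:
    """Reconstruct a coarse object: the 8 corners of the node's box."""
    cx, cy, cz = node.center
    ex, ey, ez = node.half_extent
    return [
        (cx + sx * ex, cy + sy * ey, cz + sz * ez)
        for sx in (-1, 1)
        for sy in (-1, 1)
        for sz in (-1, 1)
    ]


def chamfer(a: List[Point], b: List[Point]) -> float:
    """Symmetric Chamfer distance using squared Euclidean distances."""
    def sq_dist(p: Point, q: Point) -> float:
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    a_to_b = sum(min(sq_dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(sq_dist(p, q) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


def reconstruction_loss(scene: List[List[Point]]) -> float:
    """Mean per-object Chamfer distance; uses object geometry only."""
    return sum(chamfer(obj, decode(encode(obj))) for obj in scene) / len(scene)


# A unit cube given by its 8 corners, and the same cube with one interior point.
cube = [(float(x), float(y), float(z)) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
cube_with_interior = cube + [(0.5, 0.5, 0.5)]
loss_exact = reconstruction_loss([cube])
loss_lossy = reconstruction_loss([cube_with_interior])
```

In this toy setting the cube is reconstructed exactly, while the interior point is lost in the bottleneck and yields a positive loss. A learned model would minimize such a loss over many scenes.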
Related papers
- ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding [2.5165775267615205]
This work is the first to implement an Equivariant Graph Neural Network in semantic scene graph generation from 3D point clouds for scene understanding.
Our proposed method, ESGNN, outperforms existing state-of-the-art approaches, demonstrating a significant improvement in scene estimation with faster convergence.
arXiv Detail & Related papers (2024-06-30T06:58:04Z)
- Lang3DSG: Language-based contrastive pre-training for 3D Scene Graph prediction [16.643252717745348]
We present the first language-based pre-training approach for 3D scene graphs.
We leverage the language encoder of CLIP, a popular vision-language model, to distill its knowledge into our graph-based network.
Our method achieves state-of-the-art results on the main semantic 3D scene graph benchmark.
arXiv Detail & Related papers (2023-10-25T09:26:16Z)
- Incremental 3D Semantic Scene Graph Prediction from RGB Sequences [86.77318031029404]
We propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence.
Our method consists of a novel incremental entity estimation pipeline and a scene graph prediction network.
The proposed network estimates 3D semantic scene graphs with iterative message passing using multi-view and geometric features extracted from the scene entities.
arXiv Detail & Related papers (2023-05-04T11:32:16Z)
- SGAligner : 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training fails when transferring features learned on synthetic objects to real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
- SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences [76.28527350263012]
We propose a method to incrementally build up semantic scene graphs from a 3D environment given a sequence of RGB-D frames.
We aggregate PointNet features from primitive scene components by means of a graph neural network.
Our approach outperforms 3D scene graph prediction methods by a large margin and its accuracy is on par with other 3D semantic and panoptic segmentation methods while running at 35 Hz.
arXiv Detail & Related papers (2021-03-27T13:00:36Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)
- Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions [94.17683799712397]
We focus on scene graphs, a data structure that organizes the entities of a scene in a graph.
We propose a learned method that regresses a scene graph from the point cloud of a scene.
We show the application of our method in a domain-agnostic retrieval task, where graphs serve as an intermediate representation for 3D-3D and 2D-3D matching.
arXiv Detail & Related papers (2020-04-08T12:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.