Object-Centric Representation Learning for Enhanced 3D Scene Graph Prediction
- URL: http://arxiv.org/abs/2510.04714v1
- Date: Mon, 06 Oct 2025 11:33:09 GMT
- Title: Object-Centric Representation Learning for Enhanced 3D Scene Graph Prediction
- Authors: KunHo Heo, GiHyun Kim, SuYeon Kim, MyeongAh Cho
- Abstract summary: 3D Semantic Scene Graph Prediction aims to detect objects and their semantic relationships in 3D scenes. Previous research has addressed dataset limitations and explored various approaches, including Open-Vocabulary settings. We demonstrate through extensive analysis that the quality of object features plays a critical role in determining overall scene graph accuracy.
- Score: 3.7471945679132594
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D Semantic Scene Graph Prediction aims to detect objects and their semantic relationships in 3D scenes, and has emerged as a crucial technology for robotics and AR/VR applications. While previous research has addressed dataset limitations and explored various approaches including Open-Vocabulary settings, they frequently fail to optimize the representational capacity of object and relationship features, showing excessive reliance on Graph Neural Networks despite insufficient discriminative capability. In this work, we demonstrate through extensive analysis that the quality of object features plays a critical role in determining overall scene graph accuracy. To address this challenge, we design a highly discriminative object feature encoder and employ a contrastive pretraining strategy that decouples object representation learning from the scene graph prediction. This design not only enhances object classification accuracy but also yields direct improvements in relationship prediction. Notably, when plugging in our pretrained encoder into existing frameworks, we observe substantial performance improvements across all evaluation metrics. Additionally, whereas existing approaches have not fully exploited the integration of relationship information, we effectively combine both geometric and semantic features to achieve superior relationship prediction. Comprehensive experiments on the 3DSSG dataset demonstrate that our approach significantly outperforms previous state-of-the-art methods. Our code is publicly available at https://github.com/VisualScienceLab-KHU/OCRL-3DSSG-Codes.
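The abstract describes a contrastive pretraining strategy that decouples object representation learning from scene graph prediction. The paper's own encoder is not reproduced here; the sketch below shows one common form such an objective can take, a SimCLR-style InfoNCE loss in NumPy, where two augmented views of the same object form the positive pair. The function name and shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE loss between two augmented views of the same objects.

    z1, z2: (N, D) arrays of object embeddings; row i of z1 and row i
    of z2 are two views of the same object (the positive pair).
    """
    # L2-normalise so the dot product is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    # Cross-entropy with the diagonal (matching pairs) as targets
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Minimising this loss pulls the two views of each object together and pushes different objects apart, which is what makes the resulting features discriminative before any relationship reasoning happens.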
Related papers
- ExPrIS: Knowledge-Level Expectations as Priors for Object Interpretation from Sensor Data [1.0801606421449652]
The ExPrIS project investigates how knowledge-level expectations can serve as priors to improve object interpretation from sensor data. We integrate expectations from two sources: contextual priors from past observations and semantic knowledge from external graphs like ConceptNet. This method moves beyond static, frame-by-frame analysis to enhance the robustness and consistency of scene understanding over time.
arXiv Detail & Related papers (2026-01-21T14:27:38Z) - Edge-Centric Relational Reasoning for 3D Scene Graph Prediction [74.19580969696898]
3D scene graph prediction aims to abstract complex 3D environments into structured graphs consisting of objects and their pairwise relationships. Existing approaches typically adopt object-centric graph neural networks, where relation edge features are iteratively updated by aggregating messages from connected object nodes. We propose a Link-guided Edge-centric relational reasoning framework with Object-aware fusion.
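The summary above describes edge features being updated by fusing information from the two object nodes they connect. As a minimal illustration (not the paper's architecture), one such edge-centric update step can be sketched as a single linear fusion layer over the concatenated subject, edge, and object features; all names and shapes here are assumptions for the sketch.

```python
import numpy as np

def update_edges(node_feats, edge_index, edge_feats, W, b):
    """One illustrative edge-centric message-passing step.

    node_feats: (N, D) object node features
    edge_index: (E, 2) array of (subject, object) node indices per edge
    edge_feats: (E, D) current relation edge features
    W, b:       weights of one linear fusion layer, shapes (3*D, D), (D,)
    """
    src = node_feats[edge_index[:, 0]]   # subject features, one per edge
    dst = node_feats[edge_index[:, 1]]   # object features, one per edge
    fused = np.concatenate([src, edge_feats, dst], axis=1)   # (E, 3*D)
    return np.maximum(fused @ W + b, 0.0)                    # linear + ReLU
```

In a real model this step would be stacked and interleaved with node updates; the point of the sketch is only that the edge, not the node, is the unit being refined.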
arXiv Detail & Related papers (2025-11-19T09:53:56Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - GraphRelate3D: Context-Dependent 3D Object Detection with Inter-Object Relationship Graphs [13.071451453118783]
We introduce an object relation module, consisting of a graph generator and a graph neural network (GNN) to learn the spatial information from certain patterns to improve 3D object detection.
Our approach improves upon the baseline PV-RCNN on the KITTI validation set for the car class across easy, moderate, and hard difficulty levels by 0.82%, 0.74%, and 0.58%, respectively.
arXiv Detail & Related papers (2024-05-10T19:18:02Z) - Explore Contextual Information for 3D Scene Graph Generation [43.66442227874461]
3D scene graph generation (SGG) has been of high interest in computer vision.
We propose a framework fully exploring contextual information for the 3D SGG task.
Our approach achieves superior or competitive performance over previous methods on the 3DSSG dataset.
arXiv Detail & Related papers (2022-10-12T14:26:17Z) - S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning [70.72037296392642]
We propose a novel semi-supervised framework that allows us to learn contact from monocular images.
Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels.
We show benefits from using a contact map that rules hand-object interactions to produce more accurate reconstructions.
arXiv Detail & Related papers (2022-08-01T14:05:23Z) - Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z) - Object-Based Augmentation Improves Quality of Remote Sensing Semantic Segmentation [0.0]
This study focuses on the development and testing of object-based augmentation.
We propose a novel pipeline for georeferenced image augmentation that enables a significant increase in the number of training samples.
The presented pipeline is called object-based augmentation (OBA) and exploits objects' segmentation masks to produce new realistic training scenes.
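The OBA summary describes exploiting objects' segmentation masks to compose new training scenes. The core paste operation can be sketched as below; this is a generic mask-guided copy-paste, with hypothetical names, not the authors' georeferenced pipeline.

```python
import numpy as np

def paste_object(scene, obj_patch, obj_mask, top, left):
    """Paste a masked object crop into a scene at (top, left).

    scene:     (H, W, C) target image
    obj_patch: (h, w, C) object crop
    obj_mask:  (h, w) binary segmentation mask of the object
    """
    out = scene.copy()
    h, w = obj_mask.shape
    region = out[top:top + h, left:left + w]
    m = obj_mask[..., None].astype(bool)            # broadcast over channels
    out[top:top + h, left:left + w] = np.where(m, obj_patch, region)
    return out
```

Because only the masked pixels are overwritten, the object lands on the new background without its original surroundings, which is what makes the synthesized scenes look plausible.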
arXiv Detail & Related papers (2021-05-12T08:54:55Z) - Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z) - PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z) - SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
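Self-ensembling frameworks like SESS typically pair a student network with a teacher whose weights are an exponential moving average (EMA) of the student's, and enforce consistency between their predictions under perturbation. The EMA step itself is small enough to sketch; the flat-dict weight representation here is a simplification, not the SESS codebase.

```python
def ema_update(teacher, student, momentum=0.99):
    """Exponential-moving-average update of teacher weights from the
    student, the core of mean-teacher-style self-ensembling.

    teacher, student: dicts mapping parameter name -> value (arrays or
    scalars); returns the updated teacher dict.
    """
    return {name: momentum * teacher[name] + (1 - momentum) * student[name]
            for name in teacher}
```

A high momentum (e.g. 0.99) means the teacher changes slowly, so it averages over many student states and gives more stable targets for the consistency loss on unlabeled data.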
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.