SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene Graph Generation
- URL: http://arxiv.org/abs/2303.11048v3
- Date: Wed, 20 Dec 2023 14:11:26 GMT
- Title: SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene Graph Generation
- Authors: Changsheng Lv, Mengshi Qi, Xia Li, Zhengyuan Yang, Huadong Ma
- Abstract summary: We propose a novel model called SGFormer, Semantic Graph TransFormer for point cloud-based 3D scene graph generation.
The task aims to parse a point cloud-based scene into a semantic structural graph, with the core challenge of modeling the complex global structure.
- Score: 46.14140601855313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel model called SGFormer, Semantic Graph
TransFormer for point cloud-based 3D scene graph generation. The task aims to
parse a point cloud-based scene into a semantic structural graph, with the core
challenge of modeling the complex global structure. Existing methods based on
graph convolutional networks (GCNs) suffer from the over-smoothing dilemma and
can only propagate information from limited neighboring nodes. In contrast,
SGFormer uses Transformer layers as the base building block to allow global
information passing, with two types of newly designed layers tailored for the
3D scene graph generation task. Specifically, we introduce the graph embedding
layer to best utilize the global information in graph edges while maintaining
comparable computation costs. Furthermore, we propose the semantic injection
layer to leverage linguistic knowledge from large-scale language models (e.g.,
ChatGPT) to enhance objects' visual features. We benchmark our SGFormer on the
established 3DSSG dataset and achieve a 40.94% absolute improvement in
relationship prediction's R@50 and an 88.36% boost on the subset with complex
scenes over the state-of-the-art. Our analyses further show SGFormer's
superiority in the long-tail and zero-shot scenarios. Our source code is
available at https://github.com/Andy20178/SGFormer.
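To make the architecture concrete, here is a minimal sketch of the semantic injection idea: per-object visual features attend over text embeddings derived from LLM class descriptions. This is a hypothetical illustration, not the authors' released code; the module name, dimensions, and the cross-attention fusion are all assumptions.

```python
import torch
import torch.nn as nn

class SemanticInjectionLayer(nn.Module):
    # Hypothetical sketch: injects LLM-derived linguistic knowledge into
    # visual object features via cross-attention (not the paper's exact design).
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, visual_feats, text_feats):
        # visual_feats: (B, N_obj, d) object features from the point cloud
        # text_feats:   (B, N_txt, d) projected embeddings of LLM class descriptions
        attended, _ = self.cross_attn(visual_feats, text_feats, text_feats)
        return self.norm(visual_feats + attended)  # residual fusion, shape preserved

layer = SemanticInjectionLayer()
vis = torch.randn(2, 8, 256)   # 2 scenes, 8 objects each
txt = torch.randn(2, 16, 256)  # 16 description embeddings per scene
out = layer(vis, txt)          # (2, 8, 256)
```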
Related papers
- Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive octree structure is developed that stores semantics and describes the occupancy of an object at a resolution that adapts to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z)
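As a hedged sketch of the adaptive-octree idea in the Octree-Graph entry above: the toy structure below subdivides a cell until its points share one semantic label (or a depth cap is hit), and each node records occupancy plus a majority label. The class, stopping rule, and depth cap are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class OctreeNode:
    # Toy adaptive octree: subdivision depth adapts to semantic homogeneity.
    def __init__(self, center, half, depth):
        self.center, self.half, self.depth = center, half, depth
        self.children, self.label, self.occupied = [], None, False

    def insert(self, points, labels, max_depth=6):
        if len(points) == 0:
            return
        self.occupied = True
        if len(set(labels.tolist())) == 1 or self.depth == max_depth:
            self.label = int(np.bincount(labels).argmax())  # majority semantic label
            return
        for dx in (-1, 1):
            for dy in (-1, 1):
                for dz in (-1, 1):
                    offset = np.array([dx, dy, dz]) * self.half / 2
                    child = OctreeNode(self.center + offset, self.half / 2, self.depth + 1)
                    mask = np.all((points >= child.center - child.half)
                                  & (points < child.center + child.half), axis=1)
                    child.insert(points[mask], labels[mask], max_depth)
                    self.children.append(child)

pts = np.random.rand(100, 3)                 # points in the unit cube
lbl = np.random.randint(0, 4, size=100)      # per-point semantic labels
root = OctreeNode(center=np.array([0.5, 0.5, 0.5]), half=0.5, depth=0)
root.insert(pts, lbl)
```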
- TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding [8.32401190051443]
We present the first implementation of an Equivariant Scene Graph Neural Network (ESGNN) to generate semantic scene graphs from 3D point clouds.
Our combined architecture, termed the Temporal Equivariant Scene Graph Neural Network (TESGNN), not only surpasses existing state-of-the-art methods in scene estimation accuracy but also achieves faster convergence.
arXiv Detail & Related papers (2024-11-15T15:39:04Z)
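For intuition about the equivariance property that ESGNN/TESGNN build on, here is a minimal message-passing layer whose output is unchanged under rigid motions of the input coordinates, because it only consumes pairwise distances. The layer name and sizes are assumptions; the actual architecture differs.

```python
import torch
import torch.nn as nn

class InvariantMessageLayer(nn.Module):
    # Node updates depend only on E(3)-invariant quantities (distances),
    # so outputs are identical under any rotation/translation of `pos`.
    def __init__(self, d_feat=64):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * d_feat + 1, d_feat), nn.ReLU())
        self.upd = nn.Linear(2 * d_feat, d_feat)

    def forward(self, h, pos, edges):
        # h: (N, d_feat) node features; pos: (N, 3); edges: (E, 2) (src, dst) pairs
        src, dst = edges[:, 0], edges[:, 1]
        dist = (pos[src] - pos[dst]).norm(dim=-1, keepdim=True)  # invariant edge input
        m = self.msg(torch.cat([h[src], h[dst], dist], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, m)          # sum messages per node
        return self.upd(torch.cat([h, agg], dim=-1))

h, pos = torch.randn(5, 64), torch.randn(5, 3)
edges = torch.tensor([[0, 1], [1, 2], [2, 0], [3, 4]])
out = InvariantMessageLayer()(h, pos, edges)
```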
- Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation [153.92387500677023]
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations.
The proposed graph Transformer encoder combines graph convolutions and self-attentions in a Transformer to model both local and global interactions.
We also propose a novel self-guided pre-training method for graph representation learning.
arXiv Detail & Related papers (2024-01-15T14:36:38Z)
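A minimal sketch of the local-plus-global pattern the GTGAN entry describes: a graph-convolution branch (local, adjacency-driven) combined with all-pairs self-attention (global) in one layer. This is not the GTGAN code; the additive fusion and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class LocalGlobalLayer(nn.Module):
    # Fuses a GCN branch (A @ X @ W) with multi-head self-attention.
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.gcn_proj = nn.Linear(d_model, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, adj):
        # x: (B, N, d_model) node features; adj: (B, N, N) normalized adjacency
        local_out = self.gcn_proj(torch.bmm(adj, x))  # local: neighbors only
        global_out, _ = self.attn(x, x, x)            # global: all node pairs
        return self.norm(x + local_out + global_out)  # residual fusion

x = torch.randn(2, 10, 128)
adj = torch.softmax(torch.randn(2, 10, 10), dim=-1)  # stand-in normalized adjacency
y = LocalGlobalLayer()(x, adj)
```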
- Instance-incremental Scene Graph Generation from Real-world Point Clouds via Normalizing Flows [9.4858987199432]
This work introduces a new task of instance-incremental scene graph generation: given a point cloud scene, represent it as a graph and automatically insert novel instances.
A graph denoting the object layout of the scene is finally generated.
It helps to guide the insertion of novel 3D objects into a real-world scene in vision-based applications like augmented reality.
arXiv Detail & Related papers (2023-02-21T03:34:15Z)
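To make the normalizing-flow machinery behind the entry above concrete, here is a generic affine coupling layer (RealNVP-style), not the paper's model; treating a 6-D vector as an object layout is an assumption for illustration.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    # One invertible coupling step: half the dims pass through, the other
    # half get an affine transform conditioned on the first half.
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * (dim - self.half)))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(s) + t
        log_det = s.sum(dim=-1)  # log |det Jacobian|, needed for exact likelihoods
        return torch.cat([x1, y2], dim=-1), log_det

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(y1).chunk(2, dim=-1)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=-1)

flow = AffineCoupling(dim=6)     # e.g., position + size of an object
x = torch.randn(4, 6)
z, ld = flow(x)
x_rec = flow.inverse(z)          # round-trips back to x
```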
- Hierarchical Graph Networks for 3D Human Pose Estimation [50.600944798627786]
Recent 2D-to-3D human pose estimation works tend to utilize the graph structure formed by the topology of the human skeleton.
We argue that this skeletal topology is too sparse to reflect the body structure and suffers from a serious 2D-to-3D ambiguity problem.
We propose a novel graph convolution network architecture, Hierarchical Graph Networks, to overcome these weaknesses.
arXiv Detail & Related papers (2021-11-23T15:09:03Z)
- Local Augmentation for Graph Neural Networks [78.48812244668017]
We introduce local augmentation, which enhances node features using their local subgraph structures.
Based on the local augmentation, we further design a novel framework, LA-GNN, which can be applied to any GNN model in a plug-and-play manner.
arXiv Detail & Related papers (2021-09-08T18:10:08Z)
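A hedged sketch of the plug-and-play pattern from the Local Augmentation entry above: before any GNN runs, each node's feature is enriched with information from its local neighborhood. The paper learns a generative model of neighborhood features; this illustration substitutes a simple neighbor mean.

```python
import torch

def locally_augment(x, adj):
    # x: (N, d) node features; adj: (N, N) binary adjacency
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    neighbor_mean = adj @ x / deg                 # summary of each local subgraph
    return torch.cat([x, neighbor_mean], dim=-1)  # (N, 2d), fed to any GNN

x = torch.randn(6, 16)
adj = (torch.rand(6, 6) > 0.5).float()
x_aug = locally_augment(x, adj)  # downstream GNN just sees wider input features
```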
- GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated and original data to reinforce its capability of integrating information on graphs.
arXiv Detail & Related papers (2021-05-06T12:20:41Z)
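The iterative text-plus-graph workflow the GraphFormers entry describes could look roughly like the following: each layer runs a transformer block over every node's tokens, then mixes the node-level summaries across graph neighbors. The [CLS]-based aggregation and all dimensions are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GraphFormerLayer(nn.Module):
    # One layer interleaving per-node text encoding with graph aggregation.
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.text_block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.graph_mix = nn.Linear(d_model, d_model)

    def forward(self, tokens, adj):
        # tokens: (N_nodes, L_tokens, d); adj: (N_nodes, N_nodes) normalized adjacency
        tokens = self.text_block(tokens)     # per-node text encoding
        cls = tokens[:, 0]                   # node-level summary vector
        mixed = self.graph_mix(adj @ cls)    # aggregate neighbors' summaries
        tokens = tokens.clone()
        tokens[:, 0] = cls + mixed           # write graph context back into [CLS]
        return tokens

tokens = torch.randn(5, 12, 128)             # 5 linked documents, 12 tokens each
adj = torch.softmax(torch.randn(5, 5), dim=-1)
out = GraphFormerLayer()(tokens, adj)
```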
- Exploiting Local Geometry for Feature and Graph Construction for Better 3D Point Cloud Processing with Graph Neural Networks [22.936590869919865]
We propose improvements in point representations and local neighborhood graph construction within the general framework of graph neural networks.
We show that the proposed network achieves faster training convergence, i.e., 40% fewer epochs for classification.
arXiv Detail & Related papers (2021-03-28T21:34:59Z)
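As a concrete example of local neighborhood graph construction for point clouds, the general setting of the entry above, a k-nearest-neighbor graph with relative-position edge features can be built as follows; the specific feature choices here are assumptions.

```python
import torch

def build_knn_graph(points, k=8):
    # points: (N, 3) point cloud coordinates
    dists = torch.cdist(points, points)                     # (N, N) pairwise distances
    knn = dists.topk(k + 1, largest=False).indices[:, 1:]   # drop self (distance 0)
    src = torch.arange(points.shape[0]).repeat_interleave(k)
    dst = knn.reshape(-1)
    rel = points[dst] - points[src]                         # local geometry per edge
    edge_feats = torch.cat([rel, rel.norm(dim=-1, keepdim=True)], dim=-1)
    return torch.stack([src, dst]), edge_feats              # (2, N*k), (N*k, 4)

pts = torch.randn(100, 3)
edges, feats = build_knn_graph(pts)  # ready for a downstream graph neural network
```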
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.