SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene Graph Generation
- URL: http://arxiv.org/abs/2303.11048v3
- Date: Wed, 20 Dec 2023 14:11:26 GMT
- Title: SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene Graph Generation
- Authors: Changsheng Lv, Mengshi Qi, Xia Li, Zhengyuan Yang, Huadong Ma
- Abstract summary: We propose a novel model called SGFormer, Semantic Graph TransFormer for point cloud-based 3D scene graph generation.
The task aims to parse a point cloud-based scene into a semantic structural graph, with the core challenge of modeling the complex global structure.
- Score: 46.14140601855313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel model called SGFormer, Semantic Graph
TransFormer for point cloud-based 3D scene graph generation. The task aims to
parse a point cloud-based scene into a semantic structural graph, with the core
challenge of modeling the complex global structure. Existing methods based on
graph convolutional networks (GCNs) suffer from the over-smoothing dilemma and
can only propagate information from limited neighboring nodes. In contrast,
SGFormer uses Transformer layers as the base building block to allow global
information passing, with two types of newly designed layers tailored for the
3D scene graph generation task. Specifically, we introduce the graph embedding
layer to best utilize the global information in graph edges while maintaining
comparable computation costs. Furthermore, we propose the semantic injection
layer to leverage linguistic knowledge from large-scale language models (e.g.,
ChatGPT) to enhance objects' visual features. We benchmark our SGFormer on the
established 3DSSG dataset and achieve a 40.94% absolute improvement in
relationship prediction's R@50 and an 88.36% boost on the subset with complex
scenes over the state-of-the-art. Our analyses further show SGFormer's
superiority in the long-tail and zero-shot scenarios. Our source code is
available at https://github.com/Andy20178/SGFormer.
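To make the architecture concrete, here is a minimal sketch of the semantic injection idea: per-object visual features attend over text embeddings derived from LLM class descriptions. This is a hypothetical illustration, not the authors' released code; the module name, dimensions, and the cross-attention fusion are all assumptions.

```python
import torch
import torch.nn as nn

class SemanticInjectionLayer(nn.Module):
    # Hypothetical sketch: injects LLM-derived linguistic knowledge into
    # visual object features via cross-attention (not the paper's exact design).
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, visual_feats, text_feats):
        # visual_feats: (B, N_obj, d) object features from the point cloud
        # text_feats:   (B, N_txt, d) projected embeddings of LLM class descriptions
        attended, _ = self.cross_attn(visual_feats, text_feats, text_feats)
        return self.norm(visual_feats + attended)  # residual fusion, shape preserved

layer = SemanticInjectionLayer()
vis = torch.randn(2, 8, 256)   # 2 scenes, 8 objects each
txt = torch.randn(2, 16, 256)  # 16 description embeddings per scene
out = layer(vis, txt)          # (2, 8, 256)
```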
Related papers
- Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive octree structure is developed that stores semantics and describes the occupancy of an object at a resolution that adapts to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z)
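As a hedged sketch of the adaptive-octree idea in the Octree-Graph entry above: the toy structure below subdivides a cell until its points share one semantic label (or a depth cap is hit), and each node records occupancy plus a majority label. The class, stopping rule, and depth cap are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class OctreeNode:
    # Toy adaptive octree: subdivision depth adapts to semantic homogeneity.
    def __init__(self, center, half, depth):
        self.center, self.half, self.depth = center, half, depth
        self.children, self.label, self.occupied = [], None, False

    def insert(self, points, labels, max_depth=6):
        if len(points) == 0:
            return
        self.occupied = True
        if len(set(labels.tolist())) == 1 or self.depth == max_depth:
            self.label = int(np.bincount(labels).argmax())  # majority semantic label
            return
        for dx in (-1, 1):
            for dy in (-1, 1):
                for dz in (-1, 1):
                    offset = np.array([dx, dy, dz]) * self.half / 2
                    child = OctreeNode(self.center + offset, self.half / 2, self.depth + 1)
                    mask = np.all((points >= child.center - child.half)
                                  & (points < child.center + child.half), axis=1)
                    child.insert(points[mask], labels[mask], max_depth)
                    self.children.append(child)

pts = np.random.rand(100, 3)                 # points in the unit cube
lbl = np.random.randint(0, 4, size=100)      # per-point semantic labels
root = OctreeNode(center=np.array([0.5, 0.5, 0.5]), half=0.5, depth=0)
root.insert(pts, lbl)
```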
- TESGNN: Temporal Equivariant Scene Graph Neural Networks for Efficient and Robust Multi-View 3D Scene Understanding [8.32401190051443]
We present the first implementation of an Equivariant Scene Graph Neural Network (ESGNN) to generate semantic scene graphs from 3D point clouds.
Our combined architecture, termed the Temporal Equivariant Scene Graph Neural Network (TESGNN), not only surpasses existing state-of-the-art methods in scene estimation accuracy but also achieves faster convergence.
arXiv Detail & Related papers (2024-11-15T15:39:04Z)
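For intuition about the equivariance property that ESGNN/TESGNN build on, here is a minimal message-passing layer whose output is unchanged under rigid motions of the input coordinates, because it only consumes pairwise distances. The layer name and sizes are assumptions; the actual architecture differs.

```python
import torch
import torch.nn as nn

class InvariantMessageLayer(nn.Module):
    # Node updates depend only on E(3)-invariant quantities (distances),
    # so outputs are identical under any rotation/translation of `pos`.
    def __init__(self, d_feat=64):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * d_feat + 1, d_feat), nn.ReLU())
        self.upd = nn.Linear(2 * d_feat, d_feat)

    def forward(self, h, pos, edges):
        # h: (N, d_feat) node features; pos: (N, 3); edges: (E, 2) (src, dst) pairs
        src, dst = edges[:, 0], edges[:, 1]
        dist = (pos[src] - pos[dst]).norm(dim=-1, keepdim=True)  # invariant edge input
        m = self.msg(torch.cat([h[src], h[dst], dist], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, m)          # sum messages per node
        return self.upd(torch.cat([h, agg], dim=-1))

h, pos = torch.randn(5, 64), torch.randn(5, 3)
edges = torch.tensor([[0, 1], [1, 2], [2, 0], [3, 4]])
out = InvariantMessageLayer()(h, pos, edges)
```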
- Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation [153.92387500677023]
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations.
The proposed graph Transformer encoder combines graph convolutions and self-attentions in a Transformer to model both local and global interactions.
We also propose a novel self-guided pre-training method for graph representation learning.
arXiv Detail & Related papers (2024-01-15T14:36:38Z)
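A minimal sketch of the local-plus-global pattern the GTGAN entry describes: a graph-convolution branch (local, adjacency-driven) combined with all-pairs self-attention (global) in one layer. This is not the GTGAN code; the additive fusion and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class LocalGlobalLayer(nn.Module):
    # Fuses a GCN branch (A @ X @ W) with multi-head self-attention.
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.gcn_proj = nn.Linear(d_model, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, adj):
        # x: (B, N, d_model) node features; adj: (B, N, N) normalized adjacency
        local_out = self.gcn_proj(torch.bmm(adj, x))  # local: neighbors only
        global_out, _ = self.attn(x, x, x)            # global: all node pairs
        return self.norm(x + local_out + global_out)  # residual fusion

x = torch.randn(2, 10, 128)
adj = torch.softmax(torch.randn(2, 10, 10), dim=-1)  # stand-in normalized adjacency
y = LocalGlobalLayer()(x, adj)
```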
- Instance-incremental Scene Graph Generation from Real-world Point Clouds via Normalizing Flows [9.4858987199432]
This work introduces a new task of instance-incremental scene graph generation: given a point cloud scene, represent it as a graph and automatically insert novel instances.
A graph denoting the object layout of the scene is finally generated.
It helps to guide the insertion of novel 3D objects into a real-world scene in vision-based applications like augmented reality.
arXiv Detail & Related papers (2023-02-21T03:34:15Z)
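To make the normalizing-flow machinery behind the entry above concrete, here is a generic affine coupling layer (RealNVP-style), not the paper's model; treating a 6-D vector as an object layout is an assumption for illustration.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    # One invertible coupling step: half the dims pass through, the other
    # half get an affine transform conditioned on the first half.
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * (dim - self.half)))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(s) + t
        log_det = s.sum(dim=-1)  # log |det Jacobian|, needed for exact likelihoods
        return torch.cat([x1, y2], dim=-1), log_det

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(y1).chunk(2, dim=-1)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=-1)

flow = AffineCoupling(dim=6)     # e.g., position + size of an object
x = torch.randn(4, 6)
z, ld = flow(x)
x_rec = flow.inverse(z)          # round-trips back to x
```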
- Hierarchical Graph Networks for 3D Human Pose Estimation [50.600944798627786]
Recent 2D-to-3D human pose estimation works tend to utilize the graph structure formed by the topology of the human skeleton.
We argue that this skeletal topology is too sparse to reflect the body structure and suffers from a serious 2D-to-3D ambiguity problem.
We propose a novel graph convolution network architecture, Hierarchical Graph Networks, to overcome these weaknesses.
arXiv Detail & Related papers (2021-11-23T15:09:03Z)
- Local Augmentation for Graph Neural Networks [78.48812244668017]
We introduce local augmentation, which enhances node features using their local subgraph structures.
Based on the local augmentation, we further design a novel framework, LA-GNN, which can be applied to any GNN model in a plug-and-play manner.
arXiv Detail & Related papers (2021-09-08T18:10:08Z)
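A hedged sketch of the plug-and-play pattern from the Local Augmentation entry above: before any GNN runs, each node's feature is enriched with information from its local neighborhood. The paper learns a generative model of neighborhood features; this illustration substitutes a simple neighbor mean.

```python
import torch

def locally_augment(x, adj):
    # x: (N, d) node features; adj: (N, N) binary adjacency
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    neighbor_mean = adj @ x / deg                 # summary of each local subgraph
    return torch.cat([x, neighbor_mean], dim=-1)  # (N, 2d), fed to any GNN

x = torch.randn(6, 16)
adj = (torch.rand(6, 6) > 0.5).float()
x_aug = locally_augment(x, adj)  # downstream GNN just sees wider input features
```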
- GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated and original data to reinforce its capability of integrating information on graphs.
arXiv Detail & Related papers (2021-05-06T12:20:41Z)
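The iterative text-plus-graph workflow the GraphFormers entry describes could look roughly like the following: each layer runs a transformer block over every node's tokens, then mixes the node-level summaries across graph neighbors. The [CLS]-based aggregation and all dimensions are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GraphFormerLayer(nn.Module):
    # One layer interleaving per-node text encoding with graph aggregation.
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.text_block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.graph_mix = nn.Linear(d_model, d_model)

    def forward(self, tokens, adj):
        # tokens: (N_nodes, L_tokens, d); adj: (N_nodes, N_nodes) normalized adjacency
        tokens = self.text_block(tokens)     # per-node text encoding
        cls = tokens[:, 0]                   # node-level summary vector
        mixed = self.graph_mix(adj @ cls)    # aggregate neighbors' summaries
        tokens = tokens.clone()
        tokens[:, 0] = cls + mixed           # write graph context back into [CLS]
        return tokens

tokens = torch.randn(5, 12, 128)             # 5 linked documents, 12 tokens each
adj = torch.softmax(torch.randn(5, 5), dim=-1)
out = GraphFormerLayer()(tokens, adj)
```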
- Exploiting Local Geometry for Feature and Graph Construction for Better 3D Point Cloud Processing with Graph Neural Networks [22.936590869919865]
We propose improvements in point representations and local neighborhood graph construction within the general framework of graph neural networks.
We show that the proposed network achieves faster training convergence, i.e., 40% fewer epochs for classification.
arXiv Detail & Related papers (2021-03-28T21:34:59Z)
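As a concrete example of local neighborhood graph construction for point clouds, the general setting of the entry above, a k-nearest-neighbor graph with relative-position edge features can be built as follows; the specific feature choices here are assumptions.

```python
import torch

def build_knn_graph(points, k=8):
    # points: (N, 3) point cloud coordinates
    dists = torch.cdist(points, points)                     # (N, N) pairwise distances
    knn = dists.topk(k + 1, largest=False).indices[:, 1:]   # drop self (distance 0)
    src = torch.arange(points.shape[0]).repeat_interleave(k)
    dst = knn.reshape(-1)
    rel = points[dst] - points[src]                         # local geometry per edge
    edge_feats = torch.cat([rel, rel.norm(dim=-1, keepdim=True)], dim=-1)
    return torch.stack([src, dst]), edge_feats              # (2, N*k), (N*k, 4)

pts = torch.randn(100, 3)
edges, feats = build_knn_graph(pts)  # ready for a downstream graph neural network
```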
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.