SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene
Graph Generation
- URL: http://arxiv.org/abs/2303.11048v3
- Date: Wed, 20 Dec 2023 14:11:26 GMT
- Title: SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene
Graph Generation
- Authors: Changsheng Lv, Mengshi Qi, Xia Li, Zhengyuan Yang, Huadong Ma
- Abstract summary: We propose a novel model called SGFormer, Semantic Graph TransFormer for point cloud-based 3D scene graph generation.
The task aims to parse a point cloud-based scene into a semantic structural graph, with the core challenge of modeling the complex global structure.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel model called SGFormer, Semantic Graph
TransFormer for point cloud-based 3D scene graph generation. The task aims to
parse a point cloud-based scene into a semantic structural graph, with the core
challenge of modeling the complex global structure. Existing methods based on
graph convolutional networks (GCNs) suffer from the over-smoothing dilemma and
can only propagate information from limited neighboring nodes. In contrast,
SGFormer uses Transformer layers as the base building block to allow global
information passing, with two types of newly-designed layers tailored for the
3D scene graph generation task. Specifically, we introduce the graph embedding
layer to best utilize the global information in graph edges while maintaining
comparable computation costs. Furthermore, we propose the semantic injection
layer to leverage linguistic knowledge from a large-scale language model (i.e.,
ChatGPT) to enhance objects' visual features. We benchmark our SGFormer on the
established 3DSSG dataset and achieve a 40.94% absolute improvement in
relationship prediction's R@50 and an 88.36% boost on the subset with complex
scenes over the state-of-the-art. Our analyses further show SGFormer's
superiority in the long-tail and zero-shot scenarios. Our source code is
available at https://github.com/Andy20178/SGFormer.
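The two newly-designed layers lend themselves to a compact illustration. The sketch below is not the authors' implementation: the scalar edge bias, the single attention head, and the additive fusion of label embeddings are all simplifying assumptions. It only shows the idea of edge-aware global attention over object nodes combined with injecting precomputed text embeddings into node features:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def edge_biased_attention(nodes, edge_bias, Wq, Wk, Wv):
    """Single-head self-attention over N object nodes.

    edge_bias[i, j] is a scalar derived from the edge feature between
    nodes i and j; adding it to the attention logits lets edge
    information steer the global message passing.
    """
    Q, K, V = nodes @ Wq, nodes @ Wk, nodes @ Wv
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d) + edge_bias
    return softmax(logits, axis=-1) @ V

def semantic_injection(nodes, label_ids, label_table):
    """Fuse a per-class text embedding (e.g. precomputed from an LLM)
    into each node's visual feature by simple addition."""
    return nodes + label_table[label_ids]

rng = np.random.default_rng(0)
N, d = 5, 8                                  # 5 objects, feature dim 8
nodes = rng.standard_normal((N, d))          # visual features per object
edge_bias = rng.standard_normal((N, N))      # one bias per node pair
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

label_table = rng.standard_normal((10, d))   # 10 hypothetical object classes
label_ids = np.array([0, 3, 3, 7, 1])        # predicted class per node

h = semantic_injection(nodes, label_ids, label_table)
out = edge_biased_attention(h, edge_bias, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

Because every node attends to every other node, information is exchanged globally in a single layer, in contrast to a GCN, which needs stacked layers (and risks over-smoothing) to reach distant nodes.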
Related papers
- Graph Transformer GANs with Graph Masked Modeling for Architectural
Layout Generation [153.92387500677023]
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations.
The proposed graph Transformer encoder combines graph convolutions and self-attentions in a Transformer to model both local and global interactions.
We also propose a novel self-guided pre-training method for graph representation learning.
arXiv Detail & Related papers (2024-01-15T14:36:38Z)
- Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z)
- GraNet: A Multi-Level Graph Network for 6-DoF Grasp Pose Generation in Cluttered Scenes [0.5755004576310334]
GraNet is a graph-based grasp pose generation framework that translates a point cloud scene into multi-level graphs.
Our pipeline can thus characterize the spatial distribution of grasps in cluttered scenes, leading to a higher rate of effective grasping.
Our method achieves state-of-the-art performance on the large-scale GraspNet-1Billion benchmark, especially in grasping unseen objects.
arXiv Detail & Related papers (2023-12-06T08:36:29Z)
- Transformer-based Image Generation from Scene Graphs [11.443097632746763]
Graph-structured scene descriptions can be efficiently used in generative models to control the composition of the generated image.
Previous approaches are based on the combination of graph convolutional networks and adversarial methods for layout prediction and image generation.
We show how employing multi-head attention to encode the graph information can improve the quality of the sampled data.
arXiv Detail & Related papers (2023-03-08T14:54:51Z)
- Instance-incremental Scene Graph Generation from Real-world Point Clouds via Normalizing Flows [9.4858987199432]
This work introduces a new task of instance-incremental scene graph generation: given a point cloud scene represented as a graph, automatically insert novel instances into it.
A graph denoting the object layout of the scene is finally generated.
It helps to guide the insertion of novel 3D objects into a real-world scene in vision-based applications like augmented reality.
arXiv Detail & Related papers (2023-02-21T03:34:15Z)
- Hierarchical Graph Networks for 3D Human Pose Estimation [50.600944798627786]
Recent 2D-to-3D human pose estimation works tend to utilize the graph structure formed by the topology of the human skeleton.
We argue that this skeletal topology is too sparse to reflect the body structure and suffers from a serious 2D-to-3D ambiguity problem.
We propose a novel graph convolution network architecture, Hierarchical Graph Networks, to overcome these weaknesses.
arXiv Detail & Related papers (2021-11-23T15:09:03Z)
- Local Augmentation for Graph Neural Networks [78.48812244668017]
We introduce local augmentation, which enhances node features using their local subgraph structures.
Based on the local augmentation, we further design a novel framework: LA-GNN, which can apply to any GNN models in a plug-and-play manner.
arXiv Detail & Related papers (2021-09-08T18:10:08Z)
- GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information on graphs.
arXiv Detail & Related papers (2021-05-06T12:20:41Z)
- Exploiting Local Geometry for Feature and Graph Construction for Better 3D Point Cloud Processing with Graph Neural Networks [22.936590869919865]
We propose improvements in point representations and local neighborhood graph construction within the general framework of graph neural networks.
We show that the proposed network achieves faster training convergence, i.e., 40% fewer epochs for classification.
arXiv Detail & Related papers (2021-03-28T21:34:59Z)
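Several entries above interleave graph aggregation with Transformer attention; GraphFormers, for instance, nests layerwise GNN components alongside language-model blocks. As a hedged toy sketch of that iterative workflow (the parameter-free attention, mean-pooling neighbor aggregation, and "graph token" prepending are illustrative assumptions, not the paper's exact design):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Toy (parameter-free) self-attention over one token sequence."""
    d = tokens.shape[-1]
    logits = tokens @ tokens.T / np.sqrt(d)
    return softmax(logits) @ tokens

def graphformer_layer(node_tokens, adj):
    """One nested layer: mean-aggregate neighbor [CLS] vectors over the
    graph, prepend the aggregate as an extra token, then run attention
    per node so text encoding and graph aggregation alternate."""
    cls = np.stack([t[0] for t in node_tokens])       # [CLS] of each node
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    neigh = (adj @ cls) / deg                         # mean over neighbors
    out = []
    for i, toks in enumerate(node_tokens):
        seq = np.vstack([neigh[i][None, :], toks])    # graph token + text
        out.append(self_attention(seq)[1:])           # drop graph token
    return out

rng = np.random.default_rng(1)
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], float)  # 3-node textual graph
node_tokens = [rng.standard_normal((4, 8)) for _ in range(3)]  # 4 tokens, dim 8
out = graphformer_layer(node_tokens, adj)
print(len(out), out[0].shape)  # 3 (4, 8)
```

Stacking such layers lets each node's text representation absorb neighborhood context a hop further per layer, which is the fusion the GraphFormers summary describes.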
This list is automatically generated from the titles and abstracts of the papers in this site.