SGTR+: End-to-end Scene Graph Generation with Transformer
- URL: http://arxiv.org/abs/2401.12835v1
- Date: Tue, 23 Jan 2024 15:18:20 GMT
- Title: SGTR+: End-to-end Scene Graph Generation with Transformer
- Authors: Rongjie Li, Songyang Zhang, Xuming He
- Abstract summary: Scene Graph Generation (SGG) remains a challenging visual understanding task due to its compositional property.
Most previous works adopt a bottom-up, two-stage or point-based, one-stage approach, which often suffers from high time complexity or suboptimal designs.
We propose a novel SGG method to address the aforementioned issues, formulating the task as a bipartite graph construction problem.
- Score: 42.396971149458324
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene Graph Generation (SGG) remains a challenging visual understanding task
due to its compositional property. Most previous works adopt a bottom-up,
two-stage or point-based, one-stage approach, which often suffers from high
time complexity or suboptimal designs. In this work, we propose a novel SGG
method to address the aforementioned issues, formulating the task as a
bipartite graph construction problem. Specifically, we create a
transformer-based end-to-end framework to generate the entity and entity-aware
predicate proposal set, and infer directed edges to form relation triplets.
Moreover, we design a graph assembling module to infer the connectivity of the
bipartite scene graph based on our entity-aware structure, enabling us to
generate the scene graph in an end-to-end manner. Building on the bipartite
graph assembling paradigm, we further propose a new technical design that
improves the efficacy of entity-aware modeling and the optimization stability
of graph assembling. Equipped with the enhanced entity-aware design, our method
achieves strong performance with low time complexity. Extensive experimental results show
that our design is able to achieve the state-of-the-art or comparable
performance on three challenging benchmarks, surpassing most of the existing
approaches and offering higher inference efficiency. Code is available at:
https://github.com/Scarecrow0/SGTR
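The bipartite-graph view of a scene graph can be illustrated with a minimal sketch: entity proposals and entity-aware predicate proposals are generated separately, and a graph assembling step links each predicate's subject/object representation to its closest entity to form (subject, predicate, object) triplets. The cosine-similarity nearest-neighbor matching below is an illustrative stand-in for the paper's learned graph assembling module, and all names are hypothetical.

```python
import numpy as np

def assemble_bipartite_scene_graph(entity_feats, pred_subj_feats, pred_obj_feats):
    """Toy bipartite graph assembling: link each predicate proposal's
    entity-aware subject/object representation to its nearest entity proposal
    (by cosine similarity), yielding (subject, predicate, object) index triplets.
    This matching rule is an illustrative assumption, not the paper's exact design.
    """
    def cosine(a, b):
        # Row-normalize both sets of features, then take pairwise dot products.
        a = a / np.linalg.norm(a, axis=-1, keepdims=True)
        b = b / np.linalg.norm(b, axis=-1, keepdims=True)
        return a @ b.T

    subj_idx = cosine(pred_subj_feats, entity_feats).argmax(axis=1)
    obj_idx = cosine(pred_obj_feats, entity_feats).argmax(axis=1)
    # One triplet per predicate proposal: (subject entity, predicate, object entity).
    return list(zip(subj_idx.tolist(),
                    range(len(pred_subj_feats)),
                    obj_idx.tolist()))
```

Because every predicate proposal carries its own subject and object representations, the assembling step is a simple per-predicate lookup over entities rather than a quadratic enumeration of all entity pairs, which is where the efficiency of the bipartite formulation comes from.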
Related papers
- VectorGraphNET: Graph Attention Networks for Accurate Segmentation of Complex Technical Drawings [0.40964539027092917]
This paper introduces a new approach to extract and analyze vector data from technical drawings in PDF format.
Our method involves converting PDF files into SVG format and creating a feature-rich graph representation.
We then apply a graph attention transformer with hierarchical label definition to achieve accurate line-level segmentation.
arXiv Detail & Related papers (2024-10-02T08:53:20Z)
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation [153.92387500677023]
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations.
The proposed graph Transformer encoder combines graph convolutions and self-attentions in a Transformer to model both local and global interactions.
We also propose a novel self-guided pre-training method for graph representation learning.
arXiv Detail & Related papers (2024-01-15T14:36:38Z)
- Explore Contextual Information for 3D Scene Graph Generation [43.66442227874461]
3D scene graph generation (SGG) has been of high interest in computer vision.
We propose a framework fully exploring contextual information for the 3D SGG task.
Our approach achieves superior or competitive performance over previous methods on the 3DSSG dataset.
arXiv Detail & Related papers (2022-10-12T14:26:17Z)
- Iterative Scene Graph Generation [55.893695946885174]
Scene graph generation involves identifying object entities and their corresponding interaction predicates in a given image (or video).
Existing approaches to scene graph generation assume a certain factorization of the joint distribution to make the estimation feasible.
We propose a novel framework that addresses this limitation, as well as introduces dynamic conditioning on the image.
arXiv Detail & Related papers (2022-07-27T10:37:29Z)
- SGTR: End-to-end Scene Graph Generation with Transformer [41.606381084893194]
Scene Graph Generation (SGG) remains a challenging visual understanding task due to its complex compositional property.
We propose a novel SGG method that formulates the task as a bipartite graph construction problem.
arXiv Detail & Related papers (2021-12-24T07:10:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.