SGTR: End-to-end Scene Graph Generation with Transformer
- URL: http://arxiv.org/abs/2112.12970v1
- Date: Fri, 24 Dec 2021 07:10:18 GMT
- Title: SGTR: End-to-end Scene Graph Generation with Transformer
- Authors: Rongjie Li, Songyang Zhang, Xuming He
- Abstract summary: Scene Graph Generation (SGG) remains a challenging visual understanding task due to its complex compositional property.
We propose a novel SGG method to address the aforementioned issues, which formulates the task as a bipartite graph construction problem.
- Score: 41.606381084893194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene Graph Generation (SGG) remains a challenging visual understanding task
due to its complex compositional property. Most previous works adopt a
bottom-up two-stage or a point-based one-stage approach, which often suffers
from high time complexity or suboptimal design assumptions. In this work,
we propose a novel SGG method to address the aforementioned issues, which
formulates the task as a bipartite graph construction problem. To solve the
problem, we develop a transformer-based end-to-end framework that first
generates the entity and predicate proposal set, followed by inferring directed
edges to form the relation triplets. In particular, we develop a new
entity-aware predicate representation based on a structural predicate generator
to leverage the compositional property of relationships. Moreover, we design a
graph assembling module to infer the connectivity of the bipartite scene graph
based on our entity-aware structure, enabling us to generate the scene graph in
an end-to-end manner. Extensive experimental results show that our design is
able to achieve state-of-the-art or comparable performance on two
challenging benchmarks, surpassing most existing approaches and enjoying
higher inference efficiency. We hope our model can serve as a strong
baseline for Transformer-based scene graph generation.
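To make the pipeline above concrete, here is a minimal sketch of the graph-assembling step: each predicate proposal carries entity-aware subject and object sub-representations, which are scored against the entity proposals to infer the directed edges of the bipartite graph. The tensor names, cosine-similarity scoring, and top-k linking below are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of bipartite graph assembling (PyTorch); all names/shapes are assumptions.
import torch
import torch.nn.functional as F

def assemble_bipartite_graph(entity_feats, pred_subj_feats, pred_obj_feats, top_k=1):
    """Link each predicate proposal to its best-matching subject/object entities.

    entity_feats:    (N_e, D) entity proposal features
    pred_subj_feats: (N_p, D) entity-aware "subject" part of each predicate proposal
    pred_obj_feats:  (N_p, D) entity-aware "object" part of each predicate proposal
    """
    ent = F.normalize(entity_feats, dim=-1)
    subj = F.normalize(pred_subj_feats, dim=-1)
    obj = F.normalize(pred_obj_feats, dim=-1)

    subj_scores = subj @ ent.t()  # (N_p, N_e) predicate-to-entity compatibility
    obj_scores = obj @ ent.t()

    # Directed edges of the bipartite scene graph: keep the top-k entities per predicate
    subj_idx = subj_scores.topk(top_k, dim=-1).indices
    obj_idx = obj_scores.topk(top_k, dim=-1).indices
    return subj_idx, obj_idx  # triplets = (subject entity, predicate, object entity)

# Toy usage: 100 entity proposals, 160 predicate proposals, 256-d features
e = torch.randn(100, 256)
ps, po = torch.randn(160, 256), torch.randn(160, 256)
s_idx, o_idx = assemble_bipartite_graph(e, ps, po)  # each of shape (160, 1)
```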
Related papers
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- SGTR+: End-to-end Scene Graph Generation with Transformer [42.396971149458324]
Scene Graph Generation (SGG) remains a challenging visual understanding task due to its compositional property.
Most previous works adopt a bottom-up two-stage or a point-based one-stage approach, which often suffers from high time complexity or suboptimal designs.
We propose a novel SGG method to address the aforementioned issues, formulating the task as a bipartite graph construction problem.
arXiv Detail & Related papers (2024-01-23T15:18:20Z)
- Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing incremental structure expanding (ISE).
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
arXiv Detail & Related papers (2022-09-15T16:26:14Z)
- Iterative Scene Graph Generation [55.893695946885174]
Scene graph generation involves identifying object entities and their corresponding interaction predicates in a given image (or video).
Existing approaches to scene graph generation assume a certain factorization of the joint distribution to make the estimation feasible.
We propose a novel framework that addresses this limitation, as well as introduces dynamic conditioning on the image.
arXiv Detail & Related papers (2022-07-27T10:37:29Z)
- Deformable Graph Transformer [31.254872949603982]
We propose Deformable Graph Transformer (DGT) that performs sparse attention with dynamically sampled key and value pairs.
Experiments demonstrate that our novel graph Transformer consistently outperforms existing Transformer-based models.
arXiv Detail & Related papers (2022-06-29T00:23:25Z)
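As a rough illustration of the sparse attention over dynamically sampled key/value pairs mentioned in the Deformable Graph Transformer entry above, the sketch below lets each query node attend to a small sampled subset of nodes; the uniform sampling here is only a stand-in for DGT's learned, data-dependent sampling, and all names are assumptions.

```python
# Hedged sketch of attention over sampled keys/values (PyTorch); not DGT's actual code.
import torch

def sampled_attention(q, kv, num_samples=8):
    """q: (N, D) query node features; kv: (N, D) node features used as keys/values."""
    n, d = kv.shape
    idx = torch.randint(0, n, (q.size(0), num_samples))            # (N, S) sampled node ids
    k = kv[idx]                                                     # (N, S, D) sampled keys
    v = kv[idx]                                                     # (N, S, D) sampled values
    attn = torch.softmax((q.unsqueeze(1) * k).sum(-1) / d ** 0.5, dim=-1)  # (N, S)
    return (attn.unsqueeze(-1) * v).sum(dim=1)                      # (N, D) updated features
```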
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.