BGT-Net: Bidirectional GRU Transformer Network for Scene Graph
Generation
- URL: http://arxiv.org/abs/2109.05346v1
- Date: Sat, 11 Sep 2021 19:14:40 GMT
- Title: BGT-Net: Bidirectional GRU Transformer Network for Scene Graph
Generation
- Authors: Naina Dhingra, Florian Ritter, Andreas Kunz
- Abstract summary: Scene graph generation (SGG) aims to identify the objects and their relationships.
We propose a bidirectional GRU (BiGRU) transformer network (BGT-Net) for scene graph generation for images.
This model implements novel object-object communication to enhance the object information using a BiGRU layer.
- Score: 0.15469452301122172
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Scene graphs consist of nodes and edges representing objects and
object-object relationships, respectively. Scene graph generation (SGG) aims
to identify the objects and their relationships. We propose a bidirectional
GRU (BiGRU) transformer network (BGT-Net) for scene graph generation for
images. This model implements novel object-object communication to enhance
the object information using a BiGRU layer. Thus, the information of all
objects in the image is available to the other objects, which can be
leveraged later in the object prediction step. This object information is
used in a transformer encoder to predict the object class as well as to
create object-specific edge information via another transformer encoder. To
handle the dataset bias induced by the long-tailed relationship distribution,
softening the predictions with a log-softmax function and adding a bias
adaptation term that regulates the bias of every relation prediction
individually proved to be an effective approach. We conducted extensive
experiments and ablation studies on open-source datasets, i.e., the Visual
Genome, Open-Images, and Visual Relationship Detection datasets,
demonstrating the effectiveness of the proposed model over the state of the
art.
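The abstract describes the pipeline only at a high level; the sketch below illustrates one plausible reading of it in PyTorch: a BiGRU layer lets every object exchange information with the others, one transformer encoder predicts object classes, a second produces object-specific edge features, and a learnable per-predicate bias is added before a log-softmax to counter the long-tailed predicate distribution. Module sizes, class counts (150 Visual Genome object classes plus background, 50 predicates plus background), and the exact form of the bias adaptation are assumptions for illustration, not the authors' released implementation.

```python
# Minimal, illustrative sketch of a BGT-Net-style pipeline (assumed shapes and
# hyperparameters, not the paper's official code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class BGTNetSketch(nn.Module):
    def __init__(self, feat_dim=512, num_obj_classes=151, num_predicates=51):
        super().__init__()
        # BiGRU for object-object communication: every object sees the others.
        self.bigru = nn.GRU(feat_dim, feat_dim // 2,
                            bidirectional=True, batch_first=True)
        # Transformer encoder for object class prediction.
        obj_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                               batch_first=True)
        self.obj_encoder = nn.TransformerEncoder(obj_layer, num_layers=2)
        self.obj_classifier = nn.Linear(feat_dim, num_obj_classes)
        # Second transformer encoder for object-specific edge information.
        edge_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                                batch_first=True)
        self.edge_encoder = nn.TransformerEncoder(edge_layer, num_layers=2)
        # Predicate classifier over concatenated subject/object edge features.
        self.rel_classifier = nn.Linear(2 * feat_dim, num_predicates)
        # Learnable per-predicate bias adaptation term (assumed formulation)
        # used to counteract the long-tailed predicate distribution.
        self.rel_bias = nn.Parameter(torch.zeros(num_predicates))

    def forward(self, obj_feats):
        # obj_feats: (batch, num_objects, feat_dim) region features from a detector.
        ctx, _ = self.bigru(obj_feats)              # object-object communication
        obj_ctx = self.obj_encoder(ctx)             # refined object representations
        obj_logits = self.obj_classifier(obj_ctx)   # object class prediction
        edge_ctx = self.edge_encoder(ctx)           # object-specific edge features

        # Pairwise subject-object features for every ordered object pair.
        n = edge_ctx.size(1)
        subj = edge_ctx.unsqueeze(2).expand(-1, n, n, -1)
        obj = edge_ctx.unsqueeze(1).expand(-1, n, n, -1)
        pair_feats = torch.cat([subj, obj], dim=-1)

        # Predicate scores softened by log-softmax, with an additive per-class bias.
        rel_logits = self.rel_classifier(pair_feats) + self.rel_bias
        rel_log_probs = F.log_softmax(rel_logits, dim=-1)
        return obj_logits, rel_log_probs


# Usage on dummy detector features: 2 images, 5 proposals each.
model = BGTNetSketch()
obj_logits, rel_log_probs = model(torch.randn(2, 5, 512))
print(obj_logits.shape, rel_log_probs.shape)  # (2, 5, 151) and (2, 5, 5, 51)
```

Treating the detected objects as a sequence for the BiGRU follows the abstract's description of object-object communication; the pairwise feature construction for predicates is a common SGG pattern here and only a stand-in for whatever fusion the paper actually uses.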
Related papers
- Local-Global Information Interaction Debiasing for Dynamic Scene Graph
Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which introduces the local interaction information and global human-action interaction information.
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to the global constraints and prevent it from failing to learn the tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z) - Open-Vocabulary Object Detection via Scene Graph Discovery [53.27673119360868]
Open-vocabulary (OV) object detection has attracted increasing research attention.
We propose a novel Scene-Graph-Based Discovery Network (SGDN) that exploits scene graph cues for OV detection.
arXiv Detail & Related papers (2023-07-07T00:46:19Z) - Graph Transformer GANs for Graph-Constrained House Generation [223.739067413952]
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations.
The GTGAN learns effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task.
arXiv Detail & Related papers (2023-03-14T20:35:45Z) - Detecting Objects with Context-Likelihood Graphs and Graph Refinement [45.70356990655389]
The goal of this paper is to detect objects by exploiting their interrelations. Contrary to existing methods, which learn objects and relations separately, our key idea is to learn the object-relation distribution jointly.
We propose a novel way of creating a graphical representation of an image from inter-object relations and initial class predictions, which we call a context-likelihood graph.
We then learn the joint distribution with an energy-based modeling technique, which allows us to sample and refine the context-likelihood graph iteratively for a given image.
arXiv Detail & Related papers (2022-12-23T15:27:21Z) - Iterative Scene Graph Generation with Generative Transformers [6.243995448840211]
Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format.
Current approaches take a generation-by-classification approach where the scene graph is generated through labeling of all possible edges between objects in a scene.
This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction.
arXiv Detail & Related papers (2022-11-30T00:05:44Z) - Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z) - RelTR: Relation Transformer for Scene Graph Generation [34.1193503312965]
We propose an end-to-end scene graph generation model RelTR with an encoder-decoder architecture.
The model infers a fixed-size set of subject-predicate-object triplets using different types of attention mechanisms.
Experiments on the Visual Genome and Open Images V6 datasets demonstrate the superior performance and fast inference of our model.
arXiv Detail & Related papers (2022-01-27T11:53:41Z) - TransMOT: Spatial-Temporal Graph Transformer for Multiple Object
Tracking [74.82415271960315]
We propose a solution named TransMOT to efficiently model the spatial and temporal interactions among objects in a video.
TransMOT is not only more computationally efficient than the traditional Transformer, but it also achieves better tracking accuracy.
The proposed method is evaluated on multiple benchmark datasets including MOT15, MOT16, MOT17, and MOT20.
arXiv Detail & Related papers (2021-04-01T01:49:05Z) - Relation Transformer Network [25.141472361426818]
We propose a novel transformer formulation for scene graph generation and relation prediction.
We leverage the encoder-decoder architecture of the transformer for rich feature embedding of nodes and edges.
Our relation prediction module classifies the directed relation from the learned node and edge embedding.
arXiv Detail & Related papers (2020-04-13T20:47:01Z) - Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks [150.5425122989146]
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS).
AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges.
Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
arXiv Detail & Related papers (2020-01-19T10:45:27Z)