Self-Supervised Relation Alignment for Scene Graph Generation
- URL: http://arxiv.org/abs/2302.01403v2
- Date: Tue, 12 Dec 2023 10:57:26 GMT
- Title: Self-Supervised Relation Alignment for Scene Graph Generation
- Authors: Bicheng Xu, Renjie Liao, Leonid Sigal
- Abstract summary: We introduce a self-supervised relational alignment regularization to improve scene graph generation performance.
The proposed alignment is general and can be combined with any existing scene graph generation framework.
We illustrate the effectiveness of this self-supervised relational alignment in conjunction with two scene graph generation architectures.
- Score: 44.3983804479146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of scene graph generation is to predict a graph from an input image,
where nodes correspond to identified and localized objects and edges to their
corresponding interaction predicates. Existing methods are trained in a fully
supervised manner and focus on message passing mechanisms, loss functions,
and/or bias mitigation. In this work we introduce a simple-yet-effective
self-supervised relational alignment regularization designed to improve the
scene graph generation performance. The proposed alignment is general and can
be combined with any existing scene graph generation framework, where it is
trained alongside the original model's objective. The alignment is achieved
through distillation, using an auxiliary relation prediction branch that
mirrors and shares parameters with its supervised counterpart. In
the auxiliary branch, relational input features are partially masked prior to
message passing and predicate prediction. The predictions for masked relations
are then aligned with the supervised counterparts after the message passing. We
illustrate the effectiveness of this self-supervised relational alignment in
conjunction with two scene graph generation architectures, SGTR and Neural
Motifs, and show that in both cases we achieve significantly improved
performance.
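The masked-branch distillation described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the callables `message_passing` and `predicate_head`, the masking scheme, and the KL-based alignment loss are all assumptions based only on the abstract's description.

```python
import torch
import torch.nn.functional as F

def relation_alignment_loss(relation_feats, message_passing, predicate_head,
                            mask_ratio=0.5):
    """Sketch of self-supervised relation alignment via distillation.

    `message_passing` and `predicate_head` are hypothetical callables
    shared (same parameters) between the supervised branch and the
    masked auxiliary branch, as the abstract describes.
    """
    # Supervised branch: predicate predictions from full relational features.
    full_logits = predicate_head(message_passing(relation_feats))

    # Auxiliary branch: partially mask relational input features
    # before message passing and predicate prediction.
    mask = (torch.rand(relation_feats.shape[0], 1) > mask_ratio).float()
    masked_logits = predicate_head(message_passing(relation_feats * mask))

    # Align the masked predictions with the supervised ones after message
    # passing; the supervised side is detached so it acts as the teacher.
    teacher = F.softmax(full_logits.detach(), dim=-1)
    student = F.log_softmax(masked_logits, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")
```

In training, this loss would be added to the framework's original supervised objective; since the branches share parameters, the auxiliary branch adds no extra model capacity.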
Related papers
- Graph Transformer GANs with Graph Masked Modeling for Architectural
Layout Generation [153.92387500677023]
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations.
The proposed graph Transformer encoder combines graph convolutions and self-attentions in a Transformer to model both local and global interactions.
We also propose a novel self-guided pre-training method for graph representation learning.
arXiv Detail & Related papers (2024-01-15T14:36:38Z)
- Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and
Message Passing Neural Network [3.9280441311534653]
Scene graph generation (SGG) captures the relationships between objects in an image and creates a structured graph-based representation.
Existing SGG methods have a limited ability to accurately predict detailed relationships.
A new approach to modeling multi-object relationships, called edge dual scene graph generation (EdgeSGG), is proposed herein.
arXiv Detail & Related papers (2023-11-02T12:36:52Z)
- Line Graph Contrastive Learning for Link Prediction [4.876567687745239]
We propose a Line Graph Contrastive Learning (LGCL) method to obtain multiview information.
In experiments on six public datasets, LGCL outperforms current baselines on link prediction tasks.
arXiv Detail & Related papers (2022-10-25T06:57:00Z)
- Iterative Scene Graph Generation [55.893695946885174]
Scene graph generation involves identifying object entities and their corresponding interaction predicates in a given image (or video).
Existing approaches to scene graph generation assume a certain factorization of the joint distribution to make the estimation feasible.
We propose a novel framework that addresses this limitation, as well as introduces dynamic conditioning on the image.
arXiv Detail & Related papers (2022-07-27T10:37:29Z)
- Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z)
- RelTR: Relation Transformer for Scene Graph Generation [34.1193503312965]
We propose an end-to-end scene graph generation model RelTR with an encoder-decoder architecture.
The model infers a fixed-size set of subject-predicate-object triplets using different types of attention mechanisms.
Experiments on the Visual Genome and Open Images V6 datasets demonstrate the superior performance and fast inference of our model.
arXiv Detail & Related papers (2022-01-27T11:53:41Z)
- Segmentation-grounded Scene Graph Generation [47.34166260639392]
We propose a framework for pixel-level segmentation-grounded scene graph generation.
Our framework is agnostic to the underlying scene graph generation method.
It is learned in a multi-task manner with both target and auxiliary datasets.
arXiv Detail & Related papers (2021-04-29T08:54:08Z)
- Jointly Cross- and Self-Modal Graph Attention Network for Query-Based
Moment Localization [77.21951145754065]
We propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.
Our CSMGAN is able to effectively capture high-order interactions between the two modalities, thus enabling more precise localization.
arXiv Detail & Related papers (2020-08-04T08:25:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.