Semantic Compositional Learning for Low-shot Scene Graph Generation
- URL: http://arxiv.org/abs/2108.08600v1
- Date: Thu, 19 Aug 2021 10:13:55 GMT
- Title: Semantic Compositional Learning for Low-shot Scene Graph Generation
- Authors: Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li
- Abstract summary: Many scene graph generation (SGG) models solely use the limited annotated relation triples for training.
We propose a novel semantic compositional learning strategy that makes it possible to construct additional, realistic relation triples.
For three recent SGG models, adding our strategy improves their performance by close to 50%, and all of them substantially exceed the current state-of-the-art.
- Score: 122.51930904132685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene graphs provide valuable information to many downstream tasks. Many scene graph generation (SGG) models use only the limited annotated relation triples for training, leading to underperformance in low-shot (few- and zero-shot) scenarios, especially on rare predicates. To address this problem, we propose a novel semantic compositional learning strategy that makes it possible to construct additional, realistic relation triples with objects from different images. Specifically, our strategy decomposes a relation triple by identifying and removing the unessential component, and composes a new relation triple by fusing in a semantically or visually similar object from a visual components dictionary, whilst ensuring the realism of the newly composed triple. Notably, our strategy is generic and can be combined with existing SGG models to significantly improve their performance. We performed a comprehensive evaluation on the benchmark dataset Visual Genome. For three recent SGG models, adding our strategy improves their performance by close to 50%, and all of them substantially exceed the current state-of-the-art.
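To make the decompose-and-compose idea concrete, below is a minimal, hypothetical sketch of how such an augmentation step could look. It is not the authors' implementation: the names (ComponentDictionary, compose_triple), the plain NumPy feature vectors, and the co-occurrence-count realism check are illustrative assumptions standing in for the learned components described in the paper.

```python
import numpy as np

# Hypothetical sketch (not the authors' code) of the compositional idea in the
# abstract: decompose a relation triple, swap in a visually similar object from
# a component dictionary, and keep the new triple only if it looks realistic.

class ComponentDictionary:
    """Stores object crop features grouped by object class label."""

    def __init__(self):
        self.entries = {}  # class label -> list of feature vectors

    def add(self, label, feature):
        self.entries.setdefault(label, []).append(np.asarray(feature, dtype=float))

    def most_similar(self, label, query_feature):
        """Return the stored feature of `label` with highest cosine similarity."""
        candidates = self.entries.get(label, [])
        if not candidates:
            return None
        q = np.asarray(query_feature, dtype=float)
        sims = [c @ q / (np.linalg.norm(c) * np.linalg.norm(q) + 1e-8)
                for c in candidates]
        return candidates[int(np.argmax(sims))]


def compose_triple(triple, dictionary, cooccurrence, min_count=5):
    """Replace the (assumed unessential) object of a triple with a similar
    instance of another object class, keeping only plausible compositions.

    triple:       dict with keys 'subj', 'pred', 'obj', 'obj_feat'
    cooccurrence: dict mapping (subj, pred, obj) class tuples to training
                  counts; used here as a crude stand-in for a realism check
    """
    for new_obj in dictionary.entries:
        key = (triple['subj'], triple['pred'], new_obj)
        if new_obj == triple['obj'] or cooccurrence.get(key, 0) < min_count:
            continue  # skip identical or implausible compositions
        feat = dictionary.most_similar(new_obj, triple['obj_feat'])
        if feat is not None:
            # Return the first realistic composition found.
            return {'subj': triple['subj'], 'pred': triple['pred'],
                    'obj': new_obj, 'obj_feat': feat}
    return None  # no realistic composition available
```

In a real SGG pipeline the dictionary would hold detector ROI features and the realism check would be stronger than raw co-occurrence counts; the sketch only illustrates the overall decompose-then-compose flow used to mint extra training triples for rare predicates.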
Related papers
- Leveraging Predicate and Triplet Learning for Scene Graph Generation [31.09787444957997]
Scene Graph Generation (SGG) aims to identify entities and predict the relationship triplets.
We propose a Dual-granularity Relation Modeling (DRM) network to leverage fine-grained triplet cues besides the coarse-grained predicate ones.
Our method establishes new state-of-the-art performance on Visual Genome, Open Image, and GQA datasets.
arXiv Detail & Related papers (2024-06-04T07:23:41Z)
- DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation [13.058196732927135]
Scene graph generation aims to capture detailed spatial and semantic relationships between objects in an image.
Existing Transformer-based methods either employ distinct queries for objects and predicates or utilize holistic queries for relation triplets.
We present a new Transformer-based method, called DSGG, that views scene graph detection as a direct graph prediction problem.
arXiv Detail & Related papers (2024-03-21T23:43:30Z)
- Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and Message Passing Neural Network [3.9280441311534653]
Scene graph generation (SGG) captures the relationships between objects in an image and creates a structured graph-based representation.
Existing SGG methods have a limited ability to accurately predict detailed relationships.
A new approach to modeling multi-object relationships, called edge dual scene graph generation (EdgeSGG), is proposed.
arXiv Detail & Related papers (2023-11-02T12:36:52Z)
- Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation [30.79358827005448]
Scene Graph Generation (SGG) aims to structurally and comprehensively represent objects and their connections in images.
Existing SGG models often struggle to solve the long-tailed problem caused by biased datasets.
We propose a Text-Image-joint Scene Graph Generation (TISGG) model to handle unseen triples and improve the generalisation capability of SGG models.
arXiv Detail & Related papers (2023-06-23T10:17:56Z)
- Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z)
- Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning [84.39787427288525]
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z)
- MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes [89.75025195440287]
Existing methods treat object relations only as by-products of object feature learning in graphs, without specifically encoding them.
We propose MORE, a Multi-Order RElation mining model, to support generating more descriptive and comprehensive captions.
Our MORE encodes object relations in a progressive manner since complex relations can be deduced from a limited number of basic ones.
arXiv Detail & Related papers (2022-03-10T07:26:15Z)
- Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z)
- Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation [48.21846438269506]
Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects.
Existing SGG methods fail to acquire complex reasoning about visual and textual correlations due to various biases in training data.
We propose a novel framework for SGG training that exploits relation labels based on their informativeness.
arXiv Detail & Related papers (2021-11-26T14:34:12Z)