Fine-Grained Predicates Learning for Scene Graph Generation
- URL: http://arxiv.org/abs/2204.02597v2
- Date: Fri, 8 Apr 2022 00:43:13 GMT
- Title: Fine-Grained Predicates Learning for Scene Graph Generation
- Authors: Xinyu Lyu and Lianli Gao and Yuyu Guo and Zhou Zhao and Hao Huang and
Heng Tao Shen and Jingkuan Song
- Abstract summary: Fine-Grained Predicates Learning aims at differentiating among hard-to-distinguish predicates for the Scene Graph Generation task.
We introduce a Predicate Lattice that helps SGG models identify fine-grained predicate pairs.
We then propose a Category Discriminating Loss and an Entity Discriminating Loss, which both contribute to distinguishing fine-grained predicates.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of current Scene Graph Generation models is severely hampered
by some hard-to-distinguish predicates, e.g., "woman-on/standing on/walking
on-beach" or "woman-near/looking at/in front of-child". While general SGG
models are prone to predict head predicates and existing re-balancing
strategies prefer tail categories, none of them can appropriately handle these
hard-to-distinguish predicates. To tackle this issue, inspired by fine-grained
image classification, which focuses on differentiating among
hard-to-distinguish object classes, we propose a method named Fine-Grained
Predicates Learning (FGPL), which aims at differentiating among
hard-to-distinguish predicates for the Scene Graph Generation task. Specifically,
we first introduce a Predicate Lattice that helps SGG models identify
fine-grained predicate pairs. Then, utilizing the Predicate Lattice, we propose
a Category Discriminating Loss and an Entity Discriminating Loss, which both
contribute to distinguishing fine-grained predicates while maintaining learned
discriminatory power over recognizable ones. The proposed model-agnostic
strategy significantly boosts the performance of three benchmark models
(Transformer, VCTree, and Motif) by 22.8%, 24.1%, and 21.7% in Mean Recall
(mR@100) on the Predicate Classification sub-task, respectively. Our model also
outperforms state-of-the-art methods by a large margin (i.e., by 6.1%, 4.6%, and
3.2% in Mean Recall (mR@100)) on the Visual Genome dataset.
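
The abstract names the two losses without giving formulas. Below is a minimal, hypothetical PyTorch sketch of what a lattice-guided Category Discriminating Loss could look like; the `correlation` matrix, the `hard_thresh` cutoff, and the weighting rule are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def category_discriminating_loss(logits, target, correlation, hard_thresh=0.5):
    """Hypothetical lattice-weighted cross-entropy (illustration only).

    logits:      (B, C) predicate scores from any SGG model
    target:      (B,)   ground-truth predicate indices
    correlation: (C, C) pairwise predicate confusability, assumed to be
                 precomputed from a Predicate Lattice
    """
    logits = logits - logits.max(dim=1, keepdim=True).values  # numerical stability
    corr = correlation[target]                                # (B, C) row per sample
    # Keep full gradient weight on hard-to-distinguish negatives; down-weight
    # negatives the lattice marks as easily separable.
    weights = torch.where(corr > hard_thresh, torch.ones_like(corr), corr.clamp(min=0.1))
    weights = weights.scatter(1, target.unsqueeze(1), 1.0)    # ground truth keeps weight 1
    weighted_exp = weights * logits.exp()
    log_prob = (weighted_exp / weighted_exp.sum(dim=1, keepdim=True)).clamp_min(1e-12).log()
    return F.nll_loss(log_prob, target)

# Usage with random stand-ins for a 51-way predicate classifier:
logits = torch.randn(4, 51)
target = torch.randint(0, 51, (4,))
loss = category_discriminating_loss(logits, target, torch.rand(51, 51))
```

The intent mirrors the abstract: negatives that the lattice flags as hard-to-distinguish keep full gradient weight, while easily separable negatives are suppressed so previously learned discriminatory power is preserved.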
Related papers
- Ensemble Predicate Decoding for Unbiased Scene Graph Generation
Scene Graph Generation (SGG) aims to generate a comprehensive graphical representation that captures semantic information of a given scenario.
The model's performance in predicting more fine-grained predicates is hindered by a significant predicate bias.
This paper proposes Ensemble Predicate Decoding (EPD), which employs multiple decoders to attain unbiased scene graph generation; a toy sketch of ensemble decoding follows this entry.
arXiv Detail & Related papers (2024-08-26T11:24:13Z)
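
As a toy illustration of ensemble decoding (the decoder count, architecture, and fusion rule below are assumptions, not EPD's actual design), predicate logits from several decoder heads can be fused by averaging:

```python
import torch
import torch.nn as nn

class EnsemblePredicateHead(nn.Module):
    """Toy ensemble of predicate decoders; a hypothetical stand-in, not EPD."""
    def __init__(self, feat_dim=512, num_predicates=51, num_decoders=3):
        super().__init__()
        self.decoders = nn.ModuleList(
            nn.Linear(feat_dim, num_predicates) for _ in range(num_decoders)
        )

    def forward(self, pair_features):
        # Each decoder scores the same subject-object pair features;
        # averaging the logits blends their (ideally complementary) biases.
        return torch.stack([d(pair_features) for d in self.decoders]).mean(dim=0)

head = EnsemblePredicateHead()
logits = head(torch.randn(4, 512))  # (4, 51) averaged predicate scores
```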
- Informative Scene Graph Generation via Debiasing
Scene graph generation aims to detect visual relationship triplets (subject, predicate, object).
Due to biases in data, current models tend to predict common predicates.
We propose DB-SGG, an effective framework based on debiasing rather than conventional distribution fitting.
arXiv Detail & Related papers (2023-08-10T02:04:01Z)
- Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships in images for visual understanding.
Existing re-balancing strategies try to handle it via prior rules but are still confined to pre-defined conditions.
We propose a Cross-modal prediCate boosting (CaCao) framework, where a visually-prompted language model is learned to generate diverse fine-grained predicates; a bare-bones LM-prompting sketch follows this entry.
arXiv Detail & Related papers (2023-03-23T13:06:38Z)
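
CaCao's visual prompting is more involved than this summary conveys; as a bare-bones illustration of using a language model to propose candidate predicates for a subject-object pair, one could query an off-the-shelf masked LM (the model choice and prompt template are assumptions, and the visual conditioning is omitted):

```python
from transformers import pipeline

# Hypothetical predicate proposal via fill-mask; CaCao additionally conditions
# the language model on visual features, which this sketch omits.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for candidate in unmasker("A woman is [MASK] a beach."):
    print(candidate["token_str"], round(candidate["score"], 3))
```

Single-token predicates such as "on" or "near" dominate such outputs; multi-word predicates would require a generative model or multiple mask slots.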
- Decomposed Prototype Learning for Few-Shot Scene Graph Generation
We focus on a new and promising task in scene graph generation (SGG): few-shot SGG (FSSGG).
FSSGG encourages models to quickly transfer previous knowledge and recognize novel predicates with only a few examples.
We propose a novel Decomposed Prototype Learning (DPL) method; a generic prototype-classification sketch follows this entry.
arXiv Detail & Related papers (2023-03-20T04:54:26Z)
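
DPL's decomposed prototypes are not detailed in this summary; the snippet below is a generic prototypical-network-style classifier in the same few-shot spirit, not DPL's actual method. It assumes embedded subject-object pair features and at least one support example per predicate class.

```python
import torch
import torch.nn.functional as F

def prototype_classify(query, support, support_labels, num_classes):
    """Label queries by their nearest class prototype (mean support embedding)."""
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])                                                      # (num_classes, D)
    sims = F.cosine_similarity(query.unsqueeze(1), prototypes.unsqueeze(0), dim=-1)
    return sims.argmax(dim=1)                               # predicted class per query

support = torch.randn(10, 64)                    # 5 predicate classes x 2 shots
support_labels = torch.arange(5).repeat_interleave(2)
preds = prototype_classify(torch.randn(3, 64), support, support_labels, num_classes=5)
```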
- LANDMARK: Language-guided Representation Enhancement Framework for Scene Graph Generation
Scene graph generation (SGG) is a sophisticated task that suffers from both complex visual features and the dataset long-tail problem.
We propose LANDMARK (LANguage-guiDed representation enhanceMent frAmewoRK), which learns predicate-relevant representations from language-vision interactive patterns.
This framework is model-agnostic and consistently improves performance on existing SGG models.
arXiv Detail & Related papers (2023-03-02T09:03:11Z)
- Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z)
- Adaptive Fine-Grained Predicates Learning for Scene Graph Generation
General Scene Graph Generation (SGG) models tend to predict head predicates and re-balancing strategies prefer tail categories.
We propose Adaptive Fine-Grained Predicates Learning (FGPL-A), which aims at differentiating hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts the performance of benchmark models on the VG-SGG and GQA-SGG datasets by up to 175% and 76% in Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z)
- PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights; a minimal correlation-based weighting sketch follows this entry.
Our PCPL framework is further equipped with a graph encoder module to better extract context features.
arXiv Detail & Related papers (2020-09-02T08:30:09Z)
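
The summary does not give PCPL's weighting rule; the following generic sketch derives per-class loss weights from an assumed predicate-correlation matrix, with the specific formula chosen purely for exposition.

```python
import torch
import torch.nn.functional as F

def correlation_based_weights(correlation, eps=1e-6):
    """Illustrative only: weight each predicate class by how entangled it is
    with the others, so strongly correlated (hard) classes get larger weights.
    `correlation` is an assumed (C, C) predicate-correlation matrix."""
    off_diag = correlation - torch.diag(torch.diag(correlation))
    entanglement = off_diag.sum(dim=1)                    # (C,) total correlation mass
    weights = entanglement / (entanglement.sum() + eps)   # normalize
    return weights * weights.numel()                      # rescale so mean weight ~= 1

weights = correlation_based_weights(torch.rand(51, 51))
loss = F.cross_entropy(torch.randn(4, 51), torch.randint(0, 51, (4,)), weight=weights)
```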