Fine-Grained Predicates Learning for Scene Graph Generation
- URL: http://arxiv.org/abs/2204.02597v2
- Date: Fri, 8 Apr 2022 00:43:13 GMT
- Title: Fine-Grained Predicates Learning for Scene Graph Generation
- Authors: Xinyu Lyu and Lianli Gao and Yuyu Guo and Zhou Zhao and Hao Huang and
Heng Tao Shen and Jingkuan Song
- Abstract summary: Fine-Grained Predicates Learning aims at differentiating among hard-to-distinguish predicates for the Scene Graph Generation task.
We introduce a Predicate Lattice that helps SGG models identify fine-grained predicate pairs.
We then propose a Category Discriminating Loss and an Entity Discriminating Loss, which both contribute to distinguishing fine-grained predicates.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of current Scene Graph Generation models is severely hampered
by some hard-to-distinguish predicates, e.g., "woman-on/standing on/walking
on-beach" or "woman-near/looking at/in front of-child". While general SGG
models are prone to predict head predicates and existing re-balancing
strategies prefer tail categories, none of them can appropriately handle these
hard-to-distinguish predicates. To tackle this issue, inspired by fine-grained
image classification, which focuses on differentiating among
hard-to-distinguish object classes, we propose a method named Fine-Grained
Predicates Learning (FGPL), which aims at differentiating among
hard-to-distinguish predicates for the Scene Graph Generation task. Specifically,
we first introduce a Predicate Lattice that helps SGG models identify
fine-grained predicate pairs. Then, utilizing the Predicate Lattice, we propose
a Category Discriminating Loss and an Entity Discriminating Loss, which both
contribute to distinguishing fine-grained predicates while maintaining learned
discriminatory power over recognizable ones. The proposed model-agnostic
strategy significantly boosts the performance of three benchmark models
(Transformer, VCTree, and Motif) by 22.8%, 24.1%, and 21.7% in Mean Recall
(mR@100) on the Predicate Classification sub-task, respectively. Our model also
outperforms state-of-the-art methods by a large margin (i.e., by 6.1%, 4.6%, and
3.2% in Mean Recall (mR@100)) on the Visual Genome dataset.
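
The abstract names the two losses without giving formulas. Below is a minimal, hypothetical PyTorch sketch of what a lattice-guided Category Discriminating Loss could look like; the `correlation` matrix, the `hard_thresh` cutoff, and the weighting rule are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def category_discriminating_loss(logits, target, correlation, hard_thresh=0.5):
    """Hypothetical lattice-weighted cross-entropy (illustration only).

    logits:      (B, C) predicate scores from any SGG model
    target:      (B,)   ground-truth predicate indices
    correlation: (C, C) pairwise predicate confusability, assumed to be
                 precomputed from a Predicate Lattice
    """
    logits = logits - logits.max(dim=1, keepdim=True).values  # numerical stability
    corr = correlation[target]                                # (B, C) row per sample
    # Keep full gradient weight on hard-to-distinguish negatives; down-weight
    # negatives the lattice marks as easily separable.
    weights = torch.where(corr > hard_thresh, torch.ones_like(corr), corr.clamp(min=0.1))
    weights = weights.scatter(1, target.unsqueeze(1), 1.0)    # ground truth keeps weight 1
    weighted_exp = weights * logits.exp()
    log_prob = (weighted_exp / weighted_exp.sum(dim=1, keepdim=True)).clamp_min(1e-12).log()
    return F.nll_loss(log_prob, target)

# Usage with random stand-ins for a 51-way predicate classifier:
logits = torch.randn(4, 51)
target = torch.randint(0, 51, (4,))
loss = category_discriminating_loss(logits, target, torch.rand(51, 51))
```

The intent mirrors the abstract: negatives that the lattice flags as hard-to-distinguish keep full gradient weight, while easily separable negatives are suppressed so previously learned discriminatory power is preserved.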
Related papers
- Ensemble Predicate Decoding for Unbiased Scene Graph Generation
Scene Graph Generation (SGG) aims to generate a comprehensive graphical representation that captures semantic information of a given scenario.
The model's performance in predicting more fine-grained predicates is hindered by a significant predicate bias.
This paper proposes Ensemble Predicate Decoding (EPD), which employs multiple decoders to attain unbiased scene graph generation; a toy sketch of ensemble decoding follows this entry.
arXiv Detail & Related papers (2024-08-26T11:24:13Z)
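
As a toy illustration of ensemble decoding (the decoder count, architecture, and fusion rule below are assumptions, not EPD's actual design), predicate logits from several decoder heads can be fused by averaging:

```python
import torch
import torch.nn as nn

class EnsemblePredicateHead(nn.Module):
    """Toy ensemble of predicate decoders; a hypothetical stand-in, not EPD."""
    def __init__(self, feat_dim=512, num_predicates=51, num_decoders=3):
        super().__init__()
        self.decoders = nn.ModuleList(
            nn.Linear(feat_dim, num_predicates) for _ in range(num_decoders)
        )

    def forward(self, pair_features):
        # Each decoder scores the same subject-object pair features;
        # averaging the logits blends their (ideally complementary) biases.
        return torch.stack([d(pair_features) for d in self.decoders]).mean(dim=0)

head = EnsemblePredicateHead()
logits = head(torch.randn(4, 512))  # (4, 51) averaged predicate scores
```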
- Informative Scene Graph Generation via Debiasing
Scene graph generation aims to detect visual relationship triplets (subject, predicate, object).
Due to biases in data, current models tend to predict common predicates.
We propose DB-SGG, an effective framework based on debiasing rather than conventional distribution fitting.
arXiv Detail & Related papers (2023-08-10T02:04:01Z)
- Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships in images for visual understanding.
Existing re-balancing strategies try to handle it via prior rules but are still confined to pre-defined conditions.
We propose a Cross-modal prediCate boosting (CaCao) framework, where a visually-prompted language model is learned to generate diverse fine-grained predicates; a bare-bones LM-prompting sketch follows this entry.
arXiv Detail & Related papers (2023-03-23T13:06:38Z)
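
CaCao's visual prompting is more involved than this summary conveys; as a bare-bones illustration of using a language model to propose candidate predicates for a subject-object pair, one could query an off-the-shelf masked LM (the model choice and prompt template are assumptions, and the visual conditioning is omitted):

```python
from transformers import pipeline

# Hypothetical predicate proposal via fill-mask; CaCao additionally conditions
# the language model on visual features, which this sketch omits.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for candidate in unmasker("A woman is [MASK] a beach."):
    print(candidate["token_str"], round(candidate["score"], 3))
```

Single-token predicates such as "on" or "near" dominate such outputs; multi-word predicates would require a generative model or multiple mask slots.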
- Decomposed Prototype Learning for Few-Shot Scene Graph Generation
We focus on a new and promising task in scene graph generation (SGG): few-shot SGG (FSSGG).
FSSGG encourages models to quickly transfer previous knowledge and recognize novel predicates with only a few examples.
We propose a novel Decomposed Prototype Learning (DPL) method; a generic prototype-classification sketch follows this entry.
arXiv Detail & Related papers (2023-03-20T04:54:26Z)
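
DPL's decomposed prototypes are not detailed in this summary; the snippet below is a generic prototypical-network-style classifier in the same few-shot spirit, not DPL's actual method. It assumes embedded subject-object pair features and at least one support example per predicate class.

```python
import torch
import torch.nn.functional as F

def prototype_classify(query, support, support_labels, num_classes):
    """Label queries by their nearest class prototype (mean support embedding)."""
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])                                                      # (num_classes, D)
    sims = F.cosine_similarity(query.unsqueeze(1), prototypes.unsqueeze(0), dim=-1)
    return sims.argmax(dim=1)                               # predicted class per query

support = torch.randn(10, 64)                    # 5 predicate classes x 2 shots
support_labels = torch.arange(5).repeat_interleave(2)
preds = prototype_classify(torch.randn(3, 64), support, support_labels, num_classes=5)
```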
- LANDMARK: Language-guided Representation Enhancement Framework for Scene Graph Generation
Scene graph generation (SGG) is a sophisticated task that suffers from both complex visual features and the dataset long-tail problem.
We propose LANDMARK (LANguage-guiDed representation enhanceMent frAmewoRK), which learns predicate-relevant representations from language-vision interactive patterns.
This framework is model-agnostic and consistently improves performance on existing SGG models.
arXiv Detail & Related papers (2023-03-02T09:03:11Z)
- Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z)
- Adaptive Fine-Grained Predicates Learning for Scene Graph Generation
General Scene Graph Generation (SGG) models tend to predict head predicates and re-balancing strategies prefer tail categories.
We propose Adaptive Fine-Grained Predicates Learning (FGPL-A), which aims at differentiating hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts the performance of benchmark models on the VG-SGG and GQA-SGG datasets by up to 175% and 76% in Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z)
- PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights; a minimal correlation-based weighting sketch follows this entry.
Our PCPL framework is further equipped with a graph encoder module to better extract context features.
arXiv Detail & Related papers (2020-09-02T08:30:09Z)
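
The summary does not give PCPL's weighting rule; the following generic sketch derives per-class loss weights from an assumed predicate-correlation matrix, with the specific formula chosen purely for exposition.

```python
import torch
import torch.nn.functional as F

def correlation_based_weights(correlation, eps=1e-6):
    """Illustrative only: weight each predicate class by how entangled it is
    with the others, so strongly correlated (hard) classes get larger weights.
    `correlation` is an assumed (C, C) predicate-correlation matrix."""
    off_diag = correlation - torch.diag(torch.diag(correlation))
    entanglement = off_diag.sum(dim=1)                    # (C,) total correlation mass
    weights = entanglement / (entanglement.sum() + eps)   # normalize
    return weights * weights.numel()                      # rescale so mean weight ~= 1

weights = correlation_based_weights(torch.rand(51, 51))
loss = F.cross_entropy(torch.randn(4, 51), torch.randint(0, 51, (4,)), weight=weights)
```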