Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
- URL: http://arxiv.org/abs/2407.15396v2
- Date: Thu, 25 Jul 2024 12:54:52 GMT
- Title: Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
- Authors: Jaehyeong Jeon, Kibum Kim, Kanghoon Yoon, Chanyoung Park
- Abstract summary: In scene graph generation (SGG) datasets, each subject-object pair is annotated with a single predicate.
Existing SGG models are trained to predict the one and only predicate for each pair.
This in turn causes the SGG models to overlook the semantic diversity that may exist in a predicate.
- Score: 21.772806350802203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The scene graph generation (SGG) task involves detecting objects within an image and predicting predicates that represent the relationships between the objects. However, in SGG benchmark datasets each subject-object pair is annotated with a single predicate even though a single predicate may exhibit diverse semantics (i.e., semantic diversity), so existing SGG models are trained to predict the one and only predicate for each pair. This in turn causes the SGG models to overlook the semantic diversity that may exist in a predicate, leading to biased predictions. In this paper, we propose a novel model-agnostic Semantic Diversity-aware Prototype-based Learning (DPL) framework that enables unbiased predictions based on an understanding of the semantic diversity of predicates. Specifically, DPL learns the regions in the semantic space covered by each predicate in order to distinguish among the different semantics that a single predicate can represent. Extensive experiments demonstrate that our proposed model-agnostic DPL framework brings significant performance improvements to existing SGG models and also effectively captures the semantic diversity of predicates.
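The abstract describes DPL only at a high level. As a rough illustration of the general idea (each predicate owns a learnable prototype plus a learnable region size in the semantic space, and scores are normalized by that region so semantically broad predicates are handled fairly), a minimal PyTorch-style sketch might look as follows; the class name, the Gaussian-style scoring, and all sizes are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiversityAwarePrototypeHead(nn.Module):
    """Illustrative sketch only: each predicate class owns a prototype (a center
    in the semantic space) and a learnable log-scale modeling how wide a region
    that predicate covers (its semantic diversity). Not the authors' exact DPL."""

    def __init__(self, num_predicates: int, dim: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_predicates, dim))
        # One log-scale per predicate: a larger scale means a semantically broader predicate.
        self.log_scale = nn.Parameter(torch.zeros(num_predicates))

    def forward(self, pair_features: torch.Tensor) -> torch.Tensor:
        # pair_features: (B, dim) joint subject-object relation features.
        diff = pair_features.unsqueeze(1) - self.prototypes.unsqueeze(0)  # (B, P, dim)
        sq_dist = diff.pow(2).sum(-1)                                     # (B, P)
        # Normalize distances by each predicate's learned region size so that
        # broad ("diverse") predicates are neither unfairly favored nor penalized.
        scale = self.log_scale.exp().clamp(min=1e-3)
        logits = -sq_dist / (2.0 * scale ** 2) - self.log_scale
        return logits  # train with cross-entropy against the annotated predicate


# Usage sketch with hypothetical sizes: 50 predicate classes, 256-d relation features.
head = DiversityAwarePrototypeHead(num_predicates=50, dim=256)
feats = torch.randn(8, 256)
loss = F.cross_entropy(head(feats), torch.randint(0, 50, (8,)))
```

In an actual SGG pipeline, a head like this would sit on top of whatever relation features the base model produces, which is what makes a prototype-based classifier straightforward to apply in a model-agnostic way.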
Related papers
- Ensemble Predicate Decoding for Unbiased Scene Graph Generation [40.01591739856469]
Scene Graph Generation (SGG) aims to generate a comprehensive graphical representation that captures semantic information of a given scenario.
The model's performance in predicting more fine-grained predicates is hindered by a significant predicate bias.
This paper proposes Ensemble Predicate Decoding (EPD), which employs multiple decoders to attain unbiased scene graph generation.
arXiv Detail & Related papers (2024-08-26T11:24:13Z)
- Improving Scene Graph Generation with Relation Words' Debiasing in Vision-Language Models [6.8754535229258975]
Scene Graph Generation (SGG) provides basic language representation of visual scenes.
Some test triplets are rare or even unseen during training, resulting in biased predictions.
We propose enhancing SGG models with pretrained vision-language models (VLMs) to strengthen their relation representations.
arXiv Detail & Related papers (2024-03-24T15:02:24Z)
- Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation [66.62453697902947]
The scene graph generation (SGG) task is designed to identify predicates between given subject-object pairs.
We propose a novel Environment Invariant Curriculum Relation learning (EICR) method, which can be applied in a plug-and-play fashion to existing SGG methods.
arXiv Detail & Related papers (2023-08-07T03:56:15Z)
- Panoptic Scene Graph Generation with Semantics-Prototype Learning [23.759498629378772]
Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships (predicate) to connect human language and visual scenes.
Different language preferences of annotators and semantic overlaps between predicates lead to biased predicate annotations.
We propose a novel framework named ADTrans to adaptively transfer biased predicate annotations to informative and unified ones.
arXiv Detail & Related papers (2023-07-28T14:04:06Z)
- Decomposed Prototype Learning for Few-Shot Scene Graph Generation [28.796734816086065]
We focus on a new and promising scene graph generation (SGG) task: few-shot SGG (FSSGG).
FSSGG encourages models to be able to quickly transfer previous knowledge and recognize novel predicates with only a few examples.
We propose a novel Decomposed Prototype Learning (DPL) method.
arXiv Detail & Related papers (2023-03-20T04:54:26Z)
- Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs.
Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category.
The Prototype-based Embedding Network (PE-Net) models entities and predicates with prototype-aligned, compact, and distinctive representations.
Prototype-guided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve ambiguous entity-predicate matching; a conceptual sketch of this style of prototype matching is given after this list.
arXiv Detail & Related papers (2023-03-13T13:30:59Z)
- Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates, while re-balancing strategies prefer tail categories.
We propose Adaptive Fine-Grained Predicates Learning (FGPL-A), which aims at differentiating hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts performance of benchmark models on VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z)
- Graph Adaptive Semantic Transfer for Cross-domain Sentiment Classification [68.06496970320595]
Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain.
We present the Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that learns domain-invariant semantics from both word sequences and syntactic graphs.
arXiv Detail & Related papers (2022-05-18T07:47:01Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intent and slot representations separately and cast both prediction tasks as sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z)
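As a side note on the Prototype-based Embedding Network (PE-Net) entry above (see the forward reference there), the general pattern of prototype matching plus a regularizer that keeps prototypes distinctive can be sketched as follows; the cosine-similarity matching, the pairwise-similarity penalty, and the loss weight are illustrative assumptions, not PE-Net's exact PL/PR losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeMatcher(nn.Module):
    """Illustrative sketch: relation features are matched to predicate prototypes
    via cosine similarity, while a regularizer keeps the prototypes mutually
    distinctive. Approximates the spirit of PE-Net's PL/PR, not its exact losses."""

    def __init__(self, num_predicates: int, dim: int, temperature: float = 0.1):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_predicates, dim))
        self.temperature = temperature

    def matching_logits(self, rel_feats: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between relation features (B, dim) and prototypes (P, dim).
        f = F.normalize(rel_feats, dim=-1)
        p = F.normalize(self.prototypes, dim=-1)
        return f @ p.t() / self.temperature  # (B, P)

    def prototype_regularizer(self) -> torch.Tensor:
        # Penalize pairwise similarity between different prototypes so they stay
        # compact for their own class yet distinctive across classes.
        p = F.normalize(self.prototypes, dim=-1)
        sim = p @ p.t()
        off_diag = sim - torch.eye(sim.size(0), device=sim.device)
        return off_diag.clamp(min=0).pow(2).mean()


# Usage sketch with hypothetical sizes and an arbitrary regularization weight.
matcher = PrototypeMatcher(num_predicates=50, dim=256)
rel_feats, labels = torch.randn(8, 256), torch.randint(0, 50, (8,))
loss = F.cross_entropy(matcher.matching_logits(rel_feats), labels) \
       + 0.1 * matcher.prototype_regularizer()
```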
This list is automatically generated from the titles and abstracts of the papers in this site.