Panoptic Scene Graph Generation with Semantics-Prototype Learning
- URL: http://arxiv.org/abs/2307.15567v3
- Date: Mon, 22 Jan 2024 13:17:21 GMT
- Title: Panoptic Scene Graph Generation with Semantics-Prototype Learning
- Authors: Li Li, Wei Ji, Yiming Wu, Mengze Li, You Qin, Lina Wei, Roger
Zimmermann
- Abstract summary: Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships (predicate) to connect human language and visual scenes.
Different language preferences of annotators and semantic overlaps between predicates lead to biased predicate annotations.
We propose a novel framework named ADTrans to adaptively transfer biased predicate annotations to informative and unified ones.
- Score: 23.759498629378772
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Panoptic Scene Graph Generation (PSG) parses objects and predicts their
relationships (predicate) to connect human language and visual scenes. However,
different language preferences of annotators and semantic overlaps between
predicates lead to biased predicate annotations in the dataset, i.e. different
predicates for same object pairs. Biased predicate annotations make PSG models
struggle in constructing a clear decision plane among predicates, which greatly
hinders the real application of PSG models. To address the intrinsic bias
above, we propose a novel framework named ADTrans to adaptively transfer biased
predicate annotations to informative and unified ones. To promise consistency
and accuracy during the transfer process, we propose to measure the invariance
of representations in each predicate class, and learn unbiased prototypes of
predicates with different intensities. Meanwhile, we continuously measure the
distribution changes between each presentation and its prototype, and
constantly screen potential biased data. Finally, with the unbiased
predicate-prototype representation embedding space, biased annotations are
easily identified. Experiments show that ADTrans significantly improves the
performance of benchmark models, achieving a new state-of-the-art performance,
and shows great generalization and effectiveness on multiple datasets.
Related papers
- Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation [21.772806350802203]
In scene graph generation (SGG) datasets, each subject-object pair is annotated with a single predicate.
Existing SGG models are trained to predict the one and only predicate for each pair.
This in turn results in the SGG models to overlook the semantic diversity that may exist in a predicate.
arXiv Detail & Related papers (2024-07-22T05:53:46Z) - Domain-wise Invariant Learning for Panoptic Scene Graph Generation [26.159312466958]
Panoptic Scene Graph Generation (PSG) involves the detection of objects and the prediction of their corresponding relationships (predicates)
The presence of biased predicate annotations poses a significant challenge for PSG models, as it hinders their ability to establish a clear decision boundary among different predicates.
We propose a novel framework to infer potentially biased annotations by measuring the predicate prediction risks within each subject-object pair.
arXiv Detail & Related papers (2023-10-09T17:03:39Z) - Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph
Generation [55.429541407920304]
Recognizing the predicate between subject and object pairs is imbalanced and multi-label in nature.
Recent state-of-the-art methods predominantly focus on the most frequently occurring predicate classes.
We introduce a multi-label meta-learning framework to deal with the biased predicate distribution.
arXiv Detail & Related papers (2023-06-16T18:14:23Z) - Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs.
Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category.
Prototype-based Embedding Network (PE-Net) models entities/predicates with prototype-aligned compact and distinctive representations.
PL is introduced to help PE-Net efficiently learn such entitypredicate matching, and Prototype Regularization (PR) is devised to relieve the ambiguous entity-predicate matching.
arXiv Detail & Related papers (2023-03-13T13:30:59Z) - LANDMARK: Language-guided Representation Enhancement Framework for Scene
Graph Generation [34.40862385518366]
Scene graph generation (SGG) is a sophisticated task that suffers from both complex visual features and dataset longtail problem.
We propose LANDMARK (LANguage-guiDed representationenhanceMent frAmewoRK) that learns predicate-relevant representations from language-vision interactive patterns.
This framework is model-agnostic and consistently improves performance on existing SGG models.
arXiv Detail & Related papers (2023-03-02T09:03:11Z) - Unbiased Scene Graph Generation using Predicate Similarities [7.9112365100345965]
Scene Graphs are widely applied in computer vision as a graphical representation of relationships between objects shown in images.
These applications have not yet reached a practical stage of development owing to biased training caused by long-tailed predicate distributions.
We propose a new classification scheme that branches the process to several fine-grained classifiers for similar predicate groups.
The results of extensive experiments on the Visual Genome dataset show that the combination of our method and an existing debiasing approach greatly improves performance on tail predicates in challenging SGCls/SGDet tasks.
arXiv Detail & Related papers (2022-10-03T13:28:01Z) - Visual Comparison of Language Model Adaptation [55.92129223662381]
adapters are lightweight alternatives for model adaptation.
In this paper, we discuss several design and alternatives for interactive, comparative visual explanation methods.
We show that, for instance, an adapter trained on the language debiasing task according to context-0 embeddings introduces a new type of bias.
arXiv Detail & Related papers (2022-08-17T09:25:28Z) - Graph Adaptive Semantic Transfer for Cross-domain Sentiment
Classification [68.06496970320595]
Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain.
We present Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that is able to learn domain-invariant semantics from both word sequences and syntactic graphs.
arXiv Detail & Related papers (2022-05-18T07:47:01Z) - Did the Cat Drink the Coffee? Challenging Transformers with Generalized
Event Knowledge [59.22170796793179]
Transformers Language Models (TLMs) were tested on a benchmark for the textitdynamic estimation of thematic fit
Our results show that TLMs can reach performances that are comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.