Related papers: Panoptic Scene Graph Generation with Semantics-Prototype Learning

URL: http://arxiv.org/abs/2307.15567v3
Date: Mon, 22 Jan 2024 13:17:21 GMT
Title: Panoptic Scene Graph Generation with Semantics-Prototype Learning
Authors: Li Li, Wei Ji, Yiming Wu, Mengze Li, You Qin, Lina Wei, Roger Zimmermann
Abstract summary: Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships (predicates) to connect human language and visual scenes. Different language preferences of annotators and semantic overlaps between predicates lead to biased predicate annotations. We propose a novel framework named ADTrans to adaptively transfer biased predicate annotations to informative and unified ones.
Score: 23.759498629378772
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships (predicates) to connect human language and visual scenes. However, different language preferences of annotators and semantic overlaps between predicates lead to biased predicate annotations in the dataset, i.e., different predicates for the same object pairs. Biased predicate annotations make it hard for PSG models to construct a clear decision plane among predicates, which greatly hinders the real-world application of PSG models. To address this intrinsic bias, we propose a novel framework named ADTrans to adaptively transfer biased predicate annotations to informative and unified ones. To ensure consistency and accuracy during the transfer process, we propose to measure the invariance of representations in each predicate class, and learn unbiased prototypes of predicates with different intensities. Meanwhile, we continuously measure the distribution changes between each representation and its prototype, and constantly screen potentially biased data. Finally, with the unbiased predicate-prototype representation embedding space, biased annotations are easily identified. Experiments show that ADTrans significantly improves the performance of benchmark models, achieving new state-of-the-art performance, and shows strong generalization and effectiveness on multiple datasets.
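The abstract describes identifying biased annotations by comparing each representation with a predicate prototype. The sketch below shows only the simplest form of that idea and is not ADTrans itself: prototypes are plain per-class means of hypothetical feature vectors, and a sample is flagged when another predicate's prototype is more similar than its annotated one by a hypothetical `margin`. ADTrans additionally measures representation invariance, learns prototypes with different intensities and tracks distribution changes over training, none of which is modeled here.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def class_prototypes(features, labels):
    """Unweighted mean feature vector per annotated predicate (an assumption)."""
    sums = {}
    counts = defaultdict(int)
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
        sums[y] = [s + x for s, x in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def screen_annotations(features, labels, margin=0.1):
    """Flag samples that sit clearly closer to another predicate's prototype.

    Returns (index, annotated_label, suggested_label) tuples. The `margin`
    threshold is a hypothetical knob, not a parameter from the paper.
    """
    protos = class_prototypes(features, labels)
    flagged = []
    for i, (f, y) in enumerate(zip(features, labels)):
        sims = {c: cosine(f, p) for c, p in protos.items()}
        best = max(sims, key=sims.get)
        # Only flag when the best prototype beats the annotated one by `margin`.
        if best != y and sims[best] - sims[y] > margin:
            flagged.append((i, y, best))
    return flagged


# Toy data: the last sample is annotated with the generic "on" but lies
# next to the "sitting on" samples in this made-up 2-D feature space.
features = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]
labels = ["on", "on", "on", "sitting on", "sitting on", "on"]
print(screen_annotations(features, labels))  # prints [(5, 'on', 'sitting on')]
```

Suggesting the more specific "sitting on" in place of the generic "on" mirrors the abstract's goal of transferring biased annotations to informative ones, but real features would come from a trained PSG model rather than hand-picked vectors.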

Related papers

PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks [51.31903029903904]
In Scene Graph Generation (SGG), one extracts a structured representation from visual inputs in the form of object nodes and predicates connecting them. PRISM-0 is a framework for zero-shot open-vocabulary SGG that bootstraps foundation models in a bottom-up approach. PRISM-0 generates semantically meaningful graphs that improve downstream tasks such as Image Captioning and Sentence-to-Graph Retrieval.
arXiv Detail & Related papers (2025-04-01T14:29:51Z)
Explaining Datasets in Words: Statistical Models with Natural Language Parameters [66.69456696878842]
We introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates. We apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, and finding categories where one language model is better than the other.
arXiv Detail & Related papers (2024-09-13T01:40:20Z)
Ensemble Predicate Decoding for Unbiased Scene Graph Generation [40.01591739856469]
Scene Graph Generation (SGG) aims to generate a comprehensive graphical representation that captures semantic information of a given scenario. The model's performance in predicting more fine-grained predicates is hindered by a significant predicate bias. This paper proposes Ensemble Predicate Decoding (EPD), which employs multiple decoders to attain unbiased scene graph generation.
arXiv Detail & Related papers (2024-08-26T11:24:13Z)
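The Ensemble Predicate Decoding entry above combines several decoders to counter predicate bias. As a rough illustration only, the sketch below averages the softmax outputs of two hypothetical decoders, one favoring a frequent predicate and one favoring a finer-grained one; the plain averaging rule, the predicate list and the logits are assumptions, not EPD's actual decoder design or combination rule.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(decoder_logits):
    """Average the predicate distributions produced by several decoders.

    decoder_logits: one list of predicate logits per decoder (same length each).
    Returns (index of the top predicate, averaged probabilities).
    Plain averaging is an illustrative assumption.
    """
    probs = [softmax(logits) for logits in decoder_logits]
    num_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    best = max(range(num_classes), key=lambda c: avg[c])
    return best, avg


predicates = ["on", "standing on", "parked on"]
head_decoder = [2.0, 1.0, 0.0]  # hypothetical decoder favoring the frequent "on"
tail_decoder = [0.0, 3.0, 1.0]  # hypothetical decoder favoring a finer predicate
best, avg = ensemble_predict([head_decoder, tail_decoder])
print(predicates[best])  # prints: standing on
```

On its own the head-biased decoder would pick "on"; averaging lets a confident decoder specialized on finer predicates overturn that choice, which is one intuition for using several decoders against predicate bias.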
Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation [21.772806350802203]
In scene graph generation (SGG) datasets, each subject-object pair is annotated with a single predicate. Existing SGG models are trained to predict the one and only predicate for each pair. This in turn causes SGG models to overlook the semantic diversity that may exist in a predicate.
arXiv Detail & Related papers (2024-07-22T05:53:46Z)
Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation Enhancement [6.8754535229258975]
Scene Graph Generation (SGG) provides a basic language representation of visual scenes. Some triplet labels are rare or even unseen during training, resulting in imprecise predictions. We propose integrating pretrained Vision-Language Models to enhance representation.
arXiv Detail & Related papers (2024-03-24T15:02:24Z)
Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models [53.337728969143086]
Recommendation systems harness user-item interactions like clicks and reviews to learn their representations. Previous studies improve recommendation accuracy and interpretability by modeling user preferences across various aspects and intents. We introduce a chain-based prompting approach to uncover semantic aspect-aware interactions.
arXiv Detail & Related papers (2023-12-26T15:44:09Z)
Domain-wise Invariant Learning for Panoptic Scene Graph Generation [26.159312466958]
Panoptic Scene Graph Generation (PSG) involves the detection of objects and the prediction of their corresponding relationships (predicates). The presence of biased predicate annotations poses a significant challenge for PSG models, as it hinders their ability to establish a clear decision boundary among different predicates. We propose a novel framework to infer potentially biased annotations by measuring the predicate prediction risks within each subject-object pair.
arXiv Detail & Related papers (2023-10-09T17:03:39Z)
Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation [55.429541407920304]
Predicate recognition between subject and object pairs is imbalanced and multi-label in nature. Recent state-of-the-art methods predominantly focus on the most frequently occurring predicate classes. We introduce a multi-label meta-learning framework to deal with the biased predicate distribution.
arXiv Detail & Related papers (2023-06-16T18:14:23Z)
Decomposed Prototype Learning for Few-Shot Scene Graph Generation [42.65759272241633]
We propose a novel Decomposed Prototype Learning (DPL) model for scene graph generation (SGG). We first construct a decomposable prototype space to capture diverse semantics and visual patterns of subjects and objects for predicates.
arXiv Detail & Related papers (2023-03-20T04:54:26Z)
Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs. Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category. Prototype-based Embedding Network (PE-Net) models entities/predicates with prototype-aligned compact and distinctive representations. Prototype-guided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve the ambiguous entity-predicate matching.
arXiv Detail & Related papers (2023-03-13T13:30:59Z)
Unbiased Scene Graph Generation using Predicate Similarities [7.9112365100345965]
Scene Graphs are widely applied in computer vision as a graphical representation of relationships between objects shown in images. These applications have not yet reached a practical stage of development owing to biased training caused by long-tailed predicate distributions. We propose a new classification scheme that branches the process to several fine-grained classifiers for similar predicate groups. The results of extensive experiments on the Visual Genome dataset show that the combination of our method and an existing debiasing approach greatly improves performance on tail predicates in challenging SGCls/SGDet tasks.
arXiv Detail & Related papers (2022-10-03T13:28:01Z)
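The predicate-similarity entry above branches classification into several fine-grained classifiers, one per group of similar predicates. A minimal two-stage sketch follows, assuming a coarse classifier over groups plus one classifier per group, with scores combined as P(group) * P(predicate | group). The group names, member predicates and logits are hypothetical, and the multiplication rule is an assumption rather than the paper's exact scheme.

```python
import math

# Hypothetical grouping of similar predicates; the paper derives its groups
# from predicate similarities, which this sketch does not reproduce.
PREDICATE_GROUPS = {
    "support": ["on", "standing on", "sitting on"],
    "possession": ["has", "holding", "carrying"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_predict(group_logits, fine_logits):
    """Score each predicate as P(group) * P(predicate | group); return the best.

    group_logits: dict mapping group name -> coarse-classifier logit.
    fine_logits: dict mapping group name -> logits of that group's own
        fine-grained classifier, ordered as in PREDICATE_GROUPS.
    """
    groups = list(group_logits)
    group_probs = softmax([group_logits[g] for g in groups])
    best_pred, best_score = None, -1.0
    for group, p_group in zip(groups, group_probs):
        fine_probs = softmax(fine_logits[group])
        for pred, p_fine in zip(PREDICATE_GROUPS[group], fine_probs):
            score = p_group * p_fine
            if score > best_score:
                best_pred, best_score = pred, score
    return best_pred, best_score


pred, score = hierarchical_predict(
    {"support": 2.0, "possession": 0.0},
    {"support": [0.5, 2.0, 0.0], "possession": [1.0, 0.0, 0.0]},
)
print(pred, round(score, 3))  # prints: standing on 0.648
```

Because each fine-grained classifier only has to separate a few similar predicates, it can focus on their subtle differences instead of competing against the whole long-tailed label set.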
Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit. Our results show that TLMs can reach performance comparable to that achieved by SDM. However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)
Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models. We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.