Decomposed Prototype Learning for Few-Shot Scene Graph Generation
- URL: http://arxiv.org/abs/2303.10863v1
- Date: Mon, 20 Mar 2023 04:54:26 GMT
- Title: Decomposed Prototype Learning for Few-Shot Scene Graph Generation
- Authors: Xingchen Li, Long Chen, Guikun Chen, Yinfu Feng, Yi Yang, and Jun Xiao
- Abstract summary: We focus on a promising new task of scene graph generation (SGG): few-shot SGG (FSSGG).
FSSGG encourages models to quickly transfer previous knowledge and recognize novel predicates with only a few examples.
We propose a novel Decomposed Prototype Learning (DPL) method.
- Score: 28.796734816086065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today's scene graph generation (SGG) models typically require abundant manual
annotations to learn new predicate types. Thus, it is difficult to apply them
to real-world applications with a long-tailed distribution of predicates. In
this paper, we focus on a promising new SGG task: few-shot SGG (FSSGG).
FSSGG encourages models to quickly transfer previous knowledge and recognize
novel predicates with only a few examples. Although many advanced approaches
have achieved great success on few-shot learning (FSL) tasks, directly
extending them to FSSGG is infeasible due to two
intrinsic characteristics of predicate concepts: 1) Each predicate category
commonly has multiple semantic meanings under different contexts. 2) The visual
appearance of relation triplets with the same predicate differs greatly under
different subject-object pairs. Both issues make it hard to model predicate
categories with the conventional latent representations used by
state-of-the-art FSL methods. To this end, we propose a novel Decomposed
Prototype Learning (DPL) method.
Specifically, we first construct a decomposable prototype space to capture
intrinsic visual patterns of subjects and objects for predicates, and enhance
their feature representations with these decomposed prototypes. Then, we devise
an intelligent metric learner to assign adaptive weights to each support sample
by considering the relevance of their subject-object pairs. We further
re-split the Visual Genome (VG) dataset and compare DPL with various FSL
methods to benchmark this task.
Extensive results show that DPL achieves excellent performance in both base and
novel categories.
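As a rough illustration of the two components described in the abstract, the following PyTorch sketch decomposes each predicate's prototype into subject-side and object-side banks used to enhance entity features, and weights support samples by the relevance of their subject-object pairs. All module names, dimensions, the attention-style enhancement, and the cosine-based relevance rule are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: the prototype banks, residual enhancement, and
# cosine relevance weighting are assumptions for the example, not DPL itself.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecomposedPrototypes(nn.Module):
    """Separate subject-side and object-side prototype banks per predicate."""

    def __init__(self, num_predicates: int, num_protos: int, dim: int):
        super().__init__()
        # Learnable prototypes, shape (P, K, D).
        self.sub_protos = nn.Parameter(torch.randn(num_predicates, num_protos, dim))
        self.obj_protos = nn.Parameter(torch.randn(num_predicates, num_protos, dim))

    @staticmethod
    def _attend(feat: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
        # feat: (B, D); protos: (P, K, D) flattened to (P*K, D).
        flat = protos.reshape(-1, protos.size(-1))
        attn = F.softmax(feat @ flat.t() / feat.size(-1) ** 0.5, dim=-1)
        # Residual update: add the attended prototype back onto the feature.
        return feat + attn @ flat

    def enhance(self, sub_feat: torch.Tensor, obj_feat: torch.Tensor):
        return self._attend(sub_feat, self.sub_protos), self._attend(obj_feat, self.obj_protos)


def adaptive_class_score(query_rel, query_pair, support_rel, support_pair):
    """Score a query triplet against the support set of one predicate class.

    Support samples whose subject-object pair is more relevant to the query's
    pair receive larger weights (softmax over cosine relevance).
    """
    relevance = F.cosine_similarity(query_pair.unsqueeze(0), support_pair, dim=-1)  # (N,)
    weights = F.softmax(relevance, dim=0)
    sims = F.cosine_similarity(query_rel.unsqueeze(0), support_rel, dim=-1)  # (N,)
    return (weights * sims).sum()


if __name__ == "__main__":
    torch.manual_seed(0)
    D = 64
    bank = DecomposedPrototypes(num_predicates=10, num_protos=5, dim=D)

    # Enhance the subject/object features of one query triplet.
    q_sub, q_obj = bank.enhance(torch.randn(1, D), torch.randn(1, D))
    q_pair = torch.cat([q_sub, q_obj], dim=-1).squeeze(0)  # (2D,)
    q_rel = torch.randn(D)  # placeholder relation feature of the query

    # A 3-shot support set for one predicate class.
    s_pair = torch.randn(3, 2 * D)
    s_rel = torch.randn(3, D)
    print(adaptive_class_score(q_rel, q_pair, s_rel, s_pair))
```

In a full few-shot episode, this class score would be computed against the support set of every candidate predicate and the query assigned to the highest-scoring class.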
Related papers
- Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation [21.772806350802203]
In scene graph generation (SGG) datasets, each subject-object pair is annotated with a single predicate.
Existing SGG models are trained to predict the one and only predicate for each pair.
This, in turn, causes SGG models to overlook the semantic diversity that may exist within a predicate.
arXiv Detail & Related papers (2024-07-22T05:53:46Z)
- Towards Lifelong Scene Graph Generation with Knowledge-ware In-context Prompt Learning [24.98058940030532]
Scene graph generation (SGG) endeavors to predict visual relationships between pairs of objects within an image.
This work seeks to address the pitfall inherent in a suite of prior relationship predictions.
Motivated by the achievements of in-context learning in pretrained language models, our approach imbues the model with the capability to predict relationships.
arXiv Detail & Related papers (2024-01-26T03:43:22Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the combinatorial entity pair distribution.
We employ a DETR-based encoder-decoder design and leverage conditional queries to significantly reduce the entity label space.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World [67.03968403301143]
Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships in images for vision understanding.
Existing re-balancing strategies try to handle it via prior rules but are still confined to pre-defined conditions.
We propose a Cross-modal prediCate boosting (CaCao) framework, where a visually-prompted language model is learned to generate diverse fine-grained predicates.
arXiv Detail & Related papers (2023-03-23T13:06:38Z)
- Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs.
Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category.
The Prototype-based Embedding Network (PE-Net) models entities and predicates with prototype-aligned, compact, and distinctive representations.
Prototype-guided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve ambiguous entity-predicate matching.
arXiv Detail & Related papers (2023-03-13T13:30:59Z)
- LANDMARK: Language-guided Representation Enhancement Framework for Scene Graph Generation [34.40862385518366]
Scene graph generation (SGG) is a sophisticated task that suffers from both complex visual features and the dataset long-tail problem.
We propose LANDMARK (LANguage-guiDed representation enhanceMent frAmewoRK), which learns predicate-relevant representations from language-vision interactive patterns.
This framework is model-agnostic and consistently improves performance on existing SGG models.
arXiv Detail & Related papers (2023-03-02T09:03:11Z)
- Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning [84.39787427288525]
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic, and challenging setting in which a model is trained on a set of base object classes but must generalize beyond them.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z)
- Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates, while re-balancing strategies prefer tail categories.
We propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) which aims at differentiating hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts performance of benchmark models on VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z)
- Hierarchical Memory Learning for Fine-Grained Scene Graph Generation [49.39355372599507]
This paper proposes a novel Hierarchical Memory Learning (HML) framework to learn the model from simple to complex.
After the autonomous partition of coarse and fine predicates, the model is first trained on the coarse predicates and then learns the fine predicates.
arXiv Detail & Related papers (2022-03-14T08:01:14Z)