Decomposed Prototype Learning for Few-Shot Scene Graph Generation
- URL: http://arxiv.org/abs/2303.10863v1
- Date: Mon, 20 Mar 2023 04:54:26 GMT
- Title: Decomposed Prototype Learning for Few-Shot Scene Graph Generation
- Authors: Xingchen Li, Long Chen, Guikun Chen, Yinfu Feng, Yi Yang, and Jun Xiao
- Abstract summary: We focus on a promising new scene graph generation (SGG) task: few-shot SGG (FSSGG).
FSSGG encourages models to quickly transfer previous knowledge and recognize novel predicates from only a few examples.
We propose a novel Decomposed Prototype Learning (DPL) framework.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today's scene graph generation (SGG) models typically require abundant manual
annotations to learn new predicate types. Thus, it is difficult to apply them
to real-world applications with a long-tailed distribution of predicates. In
this paper, we focus on a promising new SGG task: few-shot SGG (FSSGG).
FSSGG encourages models to quickly transfer previous knowledge and recognize
novel predicates well from only a few examples. Although many advanced
approaches have achieved great success on few-shot learning (FSL) tasks,
directly extending them to FSSGG is infeasible due to two
intrinsic characteristics of predicate concepts: 1) Each predicate category
commonly has multiple semantic meanings under different contexts. 2) The visual
appearance of relation triplets with the same predicate differs greatly under
different subject-object pairs. Both issues make it hard to model conventional
latent representations for predicate categories with state-of-the-art FSL
methods. To this end, we propose a novel Decomposed Prototype Learning (DPL).
Specifically, we first construct a decomposable prototype space to capture
intrinsic visual patterns of subjects and objects for predicates, and enhance
their feature representations with these decomposed prototypes. Then, we devise
an intelligent metric learner to assign adaptive weights to each support sample
by considering the relevance of their subject-object pairs. We further re-split
the VG dataset and compare DPL with various FSL methods to benchmark this task.
Extensive results show that DPL achieves excellent performance in both base and
novel categories.
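The abstract describes two ingredients: a decomposable prototype space that captures subject/object patterns for each predicate, and a metric learner that weights support samples by the relevance of their subject-object pairs to the query. The following minimal NumPy sketch illustrates that general idea; it is an assumption-laden illustration, not the paper's actual DPL implementation. The function names (`adaptive_prototype`, `classify`), the cosine-similarity + softmax weighting, and the Euclidean nearest-prototype rule are all choices made for this example.

```python
import numpy as np

def adaptive_prototype(support_feats, support_pair_feats, query_pair_feat):
    """Weight each support sample by the cosine similarity of its
    subject-object pair feature to the query's pair feature, then
    average the support features with softmax weights."""
    sims = support_pair_feats @ query_pair_feat
    sims /= (np.linalg.norm(support_pair_feats, axis=1)
             * np.linalg.norm(query_pair_feat) + 1e-8)
    weights = np.exp(sims) / np.exp(sims).sum()   # softmax over support set
    return weights @ support_feats                # relevance-weighted prototype

def classify(query_feat, query_pair_feat, support_by_class):
    """Assign the query to the predicate whose adaptive prototype
    is closest in Euclidean distance."""
    best, best_dist = None, np.inf
    for cls, (feats, pair_feats) in support_by_class.items():
        proto = adaptive_prototype(feats, pair_feats, query_pair_feat)
        dist = np.linalg.norm(query_feat - proto)
        if dist < best_dist:
            best, best_dist = cls, dist
    return best
```

Compared with a plain prototypical-network average, the pair-relevance weighting lets support triplets whose subject-object context resembles the query contribute more to the prototype, which is the intuition behind the "intelligent metric learner" described above.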
Related papers
- Scene Graph Generation with Role-Playing Large Language Models [50.252588437973245]
Current approaches for open-vocabulary scene graph generation (OVSGG) use vision-language models such as CLIP.
We propose SDSGG, a scene-specific description based OVSGG framework.
To capture the complicated interplay between subjects and objects, we propose a new lightweight module called mutual visual adapter.
arXiv Detail & Related papers (2024-10-20T11:40:31Z)
- Explaining Datasets in Words: Statistical Models with Natural Language Parameters [66.69456696878842]
We introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates.
We apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, finding categories where one language model is better than the other.
arXiv Detail & Related papers (2024-09-13T01:40:20Z)
- Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation [21.772806350802203]
In scene graph generation (SGG) datasets, each subject-object pair is annotated with a single predicate.
Existing SGG models are trained to predict the one and only predicate for each pair.
This in turn causes SGG models to overlook the semantic diversity that may exist within a predicate.
arXiv Detail & Related papers (2024-07-22T05:53:46Z)
- Panoptic Scene Graph Generation with Semantics-Prototype Learning [23.759498629378772]
Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships (predicate) to connect human language and visual scenes.
Different language preferences of annotators and semantic overlaps between predicates lead to biased predicate annotations.
We propose a novel framework named ADTrans to adaptively transfer biased predicate annotations to informative and unified ones.
arXiv Detail & Related papers (2023-07-28T14:04:06Z)
- Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World [67.03968403301143]
Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships in images for vision understanding.
Existing re-balancing strategies try to handle it via prior rules but are still confined to pre-defined conditions.
We propose a Cross-modal prediCate boosting (CaCao) framework, where a visually-prompted language model is learned to generate diverse fine-grained predicates.
arXiv Detail & Related papers (2023-03-23T13:06:38Z)
- Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs.
Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category.
Prototype-based Embedding Network (PE-Net) models entities/predicates with prototype-aligned compact and distinctive representations.
Prototype-guided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve ambiguous entity-predicate matching.
arXiv Detail & Related papers (2023-03-13T13:30:59Z)
- LANDMARK: Language-guided Representation Enhancement Framework for Scene Graph Generation [34.40862385518366]
Scene graph generation (SGG) is a sophisticated task that suffers from both complex visual features and the dataset long-tail problem.
We propose LANDMARK (LANguage-guiDed representation enhanceMent frAmewoRK), which learns predicate-relevant representations from language-vision interactive patterns.
This framework is model-agnostic and consistently improves performance on existing SGG models.
arXiv Detail & Related papers (2023-03-02T09:03:11Z)
- Hierarchical Memory Learning for Fine-Grained Scene Graph Generation [49.39355372599507]
This paper proposes a novel Hierarchical Memory Learning (HML) framework to learn the model from simple to complex.
After the autonomous partition of coarse and fine predicates, the model is first trained on the coarse predicates and then learns the fine predicates.
arXiv Detail & Related papers (2022-03-14T08:01:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.