Prototype-based Embedding Network for Scene Graph Generation
- URL: http://arxiv.org/abs/2303.07096v1
- Date: Mon, 13 Mar 2023 13:30:59 GMT
- Title: Prototype-based Embedding Network for Scene Graph Generation
- Authors: Chaofan Zheng, Xinyu Lyu, Lianli Gao, Bo Dai, and Jingkuan Song
- Abstract summary: Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs.
Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category.
Prototype-based Embedding Network (PE-Net) models entities/predicates with prototype-aligned compact and distinctive representations.
Prototype-guided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve the ambiguous entity-predicate matching.
- Score: 105.97836135784794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current Scene Graph Generation (SGG) methods explore contextual information
to predict relationships among entity pairs. However, due to the diverse visual
appearance of numerous possible subject-object combinations, there is a large
intra-class variation within each predicate category, e.g., "man-eating-pizza,
giraffe-eating-leaf", and severe inter-class similarity between different
classes, e.g., "man-holding-plate, man-eating-pizza", in the model's latent space.
The above challenges prevent current SGG methods from acquiring robust features
for reliable relation prediction. In this paper, we claim that the predicate's
category-inherent semantics can serve as class-wise prototypes in the semantic
space for relieving the challenges. To this end, we propose the Prototype-based
Embedding Network (PE-Net), which models entities/predicates with
prototype-aligned compact and distinctive representations and thereby
establishes matching between entity pairs and predicates in a common embedding
space for relation recognition. Moreover, Prototype-guided Learning (PL) is
introduced to help PE-Net efficiently learn such entity-predicate matching, and
Prototype Regularization (PR) is devised to relieve the ambiguous
entity-predicate matching caused by the predicate's semantic overlap. Extensive
experiments demonstrate that our method gains superior relation recognition
capability on SGG, achieving new state-of-the-art performance on both Visual
Genome and Open Images datasets.
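The matching idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes cosine similarity as the matching score, uses hand-picked 3-dimensional toy vectors in place of learned embeddings, and the names cosine, match_predicate, and prototype_overlap are hypothetical. The last function only illustrates the kind of quantity a regularizer in the spirit of PR might penalize; the paper's actual losses may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_predicate(relation_vec, prototypes):
    """Return the predicate whose prototype is most similar to relation_vec,
    together with the similarity score of every predicate."""
    scores = {name: cosine(relation_vec, proto)
              for name, proto in prototypes.items()}
    best = max(scores, key=scores.get)
    return best, scores


def prototype_overlap(prototypes):
    """Mean pairwise cosine similarity between distinct prototypes.
    Assumption: a lower value means the prototypes are more distinctive."""
    names = list(prototypes)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    total = sum(cosine(prototypes[a], prototypes[b]) for a, b in pairs)
    return total / len(pairs)


# Toy class-wise prototypes (hypothetical values, one vector per predicate).
prototypes = {
    "eating": [1.0, 0.0, 0.2],
    "holding": [0.0, 1.0, 0.2],
    "riding": [0.0, 0.0, 1.0],
}

# Hypothetical embedding of an entity pair such as (man, pizza).
relation_vec = [0.9, 0.1, 0.1]

best, scores = match_predicate(relation_vec, prototypes)
print(best)  # eating
print(round(prototype_overlap(prototypes), 3))
```

In this toy setup the pair embedding lies closest to the "eating" prototype, so that predicate is selected. Pushing the prototypes further apart would lower the overlap value and make "eating" and "holding" harder to confuse.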
Related papers
- Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation [21.772806350802203]
In scene graph generation (SGG) datasets, each subject-object pair is annotated with a single predicate.
Existing SGG models are trained to predict the one and only predicate for each pair.
This in turn causes SGG models to overlook the semantic diversity that may exist in a predicate.
arXiv Detail & Related papers (2024-07-22T05:53:46Z)
- RAPL: A Relation-Aware Prototype Learning Approach for Few-Shot Document-Level Relation Extraction [35.246592734300414]
We propose a relation-aware prototype learning method for FSDLRE.
Our method effectively refines the relation prototypes and generates task-specific NOTA prototypes.
arXiv Detail & Related papers (2023-10-24T11:35:23Z)
- Decomposed Prototype Learning for Few-Shot Scene Graph Generation [28.796734816086065]
We focus on a promising new task in scene graph generation (SGG): few-shot SGG (FSSGG).
FSSGG encourages models to quickly transfer previous knowledge and recognize novel predicates from only a few examples.
We propose a novel Decomposed Prototype Learning (DPL).
arXiv Detail & Related papers (2023-03-20T04:54:26Z)
- A Prototypical Semantic Decoupling Method via Joint Contrastive Learning for Few-Shot Name Entity Recognition [24.916377682689955]
Few-shot named entity recognition (NER) aims at identifying named entities based on only a few labeled instances.
We propose a Prototypical Semantic Decoupling method via joint Contrastive learning (PSDC) for few-shot NER.
Experimental results on two few-shot NER benchmarks demonstrate that PSDC consistently outperforms the previous SOTA methods in terms of overall performance.
arXiv Detail & Related papers (2023-02-27T09:20:00Z)
- Graph Adaptive Semantic Transfer for Cross-domain Sentiment Classification [68.06496970320595]
Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain.
We present the Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that learns domain-invariant semantics from both word sequences and syntactic graphs.
arXiv Detail & Related papers (2022-05-18T07:47:01Z)
- Query Adaptive Few-Shot Object Detection with Heterogeneous Graph Convolutional Networks [33.446875089255876]
Few-shot object detection (FSOD) aims to detect never-seen objects using few examples.
We propose a novel FSOD model using heterogeneous graph convolutional networks.
arXiv Detail & Related papers (2021-12-17T22:08:15Z)
- Dual Prototypical Contrastive Learning for Few-shot Semantic Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task.
The main idea is to make the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in the prototype feature space.
We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z)
- Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z)
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.