Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning
- URL: http://arxiv.org/abs/2208.08165v1
- Date: Wed, 17 Aug 2022 09:05:38 GMT
- Title: Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning
- Authors: Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li
- Abstract summary: Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes but is required to infer relations for unseen target object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
- Score: 84.39787427288525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene graph generation (SGG) is a fundamental task aimed at detecting visual
relations between objects in an image. The prevailing SGG methods require all
object classes to be given in the training set. Such a closed setting limits
the practical application of SGG. In this paper, we introduce open-vocabulary
scene graph generation, a novel, realistic and challenging setting in which a
model is trained on a set of base object classes but is required to infer
relations for unseen target object classes. To this end, we propose a two-step
method that firstly pre-trains on large amounts of coarse-grained
region-caption data and then leverages two prompt-based techniques to finetune
the pre-trained model without updating its parameters. Moreover, our method can
support inference over completely unseen object classes, which existing methods
are incapable of handling. In extensive experiments on three benchmark
datasets, Visual Genome, GQA, and Open-Image, our method significantly
outperforms recent, strong SGG methods in the Ov-SGG setting, as well as in
the conventional closed SGG setting.
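To make the two-step recipe concrete, the sketch below shows the general pattern of prompt-based finetuning with a frozen backbone: only a small set of soft prompt vectors (plus a light classifier head) receive gradients, while the pre-trained parameters stay fixed. This is a minimal illustration of the technique, not the authors' code; the backbone interface, module names, and shapes are assumptions.

```python
# Minimal sketch of prompt-based finetuning with a frozen pre-trained
# backbone. Module names, shapes, and the backbone interface are
# illustrative assumptions, not the authors' released implementation.
import torch
import torch.nn as nn

class PromptTunedPredicateHead(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim: int,
                 num_prompts: int, num_predicates: int):
        super().__init__()
        self.backbone = backbone
        # Freeze the pre-trained model: gradients flow only into the
        # prompts and the small classifier head, mirroring "finetune
        # the pre-trained model without updating its parameters".
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Learnable soft prompts prepended to the region tokens.
        self.prompts = nn.Parameter(0.02 * torch.randn(num_prompts, embed_dim))
        self.classifier = nn.Linear(embed_dim, num_predicates)

    def forward(self, region_tokens: torch.Tensor) -> torch.Tensor:
        # region_tokens: (batch, num_regions, embed_dim) features for a
        # subject-object pair plus its context regions.
        batch = region_tokens.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        tokens = torch.cat([prompts, region_tokens], dim=1)
        feats = self.backbone(tokens)         # frozen forward pass
        return self.classifier(feats[:, 0])   # predicate logits
```

Under this setup, only `self.prompts` and the classifier would be handed to the optimizer, so whatever the backbone learned from region-caption pre-training is left intact and remains available when inferring relations over unseen object classes.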
Related papers
- Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency [3.351553095054309]
Scene graph generation (SGG) represents the relationships between objects in an image as a graph structure.
Previous studies have failed to reflect the co-occurrence of objects during scene graph generation.
We propose CooK, which reflects the Co-occurrence Knowledge between objects, together with a learnable term frequency-inverse document frequency (TF-IDF) weighting; a sketch of this idea appears after this list.
arXiv Detail & Related papers (2024-05-21T09:56:48Z)
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial to enhancing holistic cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z)
- Towards Lifelong Scene Graph Generation with Knowledge-ware In-context Prompt Learning [24.98058940030532]
Scene graph generation (SGG) endeavors to predict visual relationships between pairs of objects within an image.
This work seeks to address the pitfalls inherent in prior approaches to relationship prediction.
Motivated by the achievements of in-context learning in pretrained language models, our approach imbues the model with the capability to predict relationships.
arXiv Detail & Related papers (2024-01-26T03:43:22Z)
- Adaptive Self-training Framework for Fine-grained Scene Graph Generation [29.37568710952893]
Scene graph generation (SGG) models suffer from inherent problems with their benchmark datasets.
We introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels to unannotated triplets; a sketch of this idea also appears after this list.
Our experiments verify the effectiveness of ST-SGG on various SGG models.
arXiv Detail & Related papers (2024-01-18T08:10:34Z)
- Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention [69.36723767339001]
Scene Graph Generation (SGG) offers a structured representation critical in many computer vision applications.
We propose a unified framework named OvSGTR towards fully open vocabulary SGG from a holistic view.
For the more challenging settings of relation-involved open vocabulary SGG, the proposed approach integrates relation-aware pretraining.
arXiv Detail & Related papers (2023-11-18T06:49:17Z)
- Semantic Scene Graph Generation Based on an Edge Dual Scene Graph and Message Passing Neural Network [3.9280441311534653]
Scene graph generation (SGG) captures the relationships between objects in an image and creates a structured graph-based representation.
Existing SGG methods have a limited ability to accurately predict detailed relationships.
A new approach to modeling multi-object relationships, called edge dual scene graph generation (EdgeSGG), is proposed herein.
arXiv Detail & Related papers (2023-11-02T12:36:52Z)
- Fine-Grained Scene Graph Generation with Data Transfer [127.17675443137064]
Scene graph generation (SGG) aims to extract (subject, predicate, object) triplets in images.
Recent works have made steady progress on SGG and provide useful tools for high-level vision and language understanding.
We propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a plug-and-play fashion and expanded to large-scale SGG with 1,807 predicate classes.
arXiv Detail & Related papers (2022-03-22T12:26:56Z)
- Semantic Compositional Learning for Low-shot Scene Graph Generation [122.51930904132685]
Many scene graph generation (SGG) models solely use the limited annotated relation triples for training.
We propose a novel semantic compositional learning strategy that makes it possible to construct additional, realistic relation triples.
For three recent SGG models, adding our strategy improves their performance by close to 50%, and all of them substantially exceed the current state-of-the-art.
arXiv Detail & Related papers (2021-08-19T10:13:55Z)
- Weakly Supervised Visual Semantic Parsing [49.69377653925448]
Scene Graph Generation (SGG) aims to extract entities, predicates and their semantic structure from images.
Existing SGG methods require millions of manually annotated bounding boxes for training.
We propose Visual Semantic Parsing, VSPNet, and a graph-based weakly supervised learning framework.
arXiv Detail & Related papers (2020-01-08T03:46:13Z)
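As flagged in the CooK entry above, object co-occurrence statistics can be turned into pair-level weights with a TF-IDF-style formula: how often a pair co-occurs (term frequency), discounted by how common each class is across images (inverse document frequency). The sketch below is a plain-statistics illustration of that general idea; the function name and exact formula are assumptions, not the paper's learnable formulation.

```python
# Sketch of TF-IDF-style co-occurrence weighting for object pairs,
# illustrating the general idea behind CooK; not the paper's exact method.
import math
from collections import Counter
from itertools import combinations

def cooccurrence_tfidf(image_object_lists: list[list[str]]) -> dict[tuple[str, str], float]:
    """Return a weight for each unordered object-class pair.

    image_object_lists: per-image lists of object class names.
    """
    num_images = len(image_object_lists)
    pair_counts: Counter = Counter()   # how often each pair co-occurs
    doc_freq: Counter = Counter()      # images containing each class
    for objs in image_object_lists:
        unique = set(objs)
        for obj in unique:
            doc_freq[obj] += 1
        for a, b in combinations(sorted(unique), 2):
            pair_counts[(a, b)] += 1

    weights = {}
    for (a, b), tf in pair_counts.items():
        # Discount pairs built from ubiquitous classes (low IDF).
        idf = math.log(num_images / doc_freq[a]) + math.log(num_images / doc_freq[b])
        weights[(a, b)] = tf * idf
    return weights
```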
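Similarly, the ST-SGG entry describes assigning pseudo-labels to unannotated triplets. A generic confidence-thresholded self-training step could look like the sketch below; the model interface and the fixed threshold are assumptions, whereas ST-SGG itself uses a class-adaptive thresholding mechanism.

```python
# Generic confidence-thresholded pseudo-labelling for unannotated
# subject-object pairs. The model interface and fixed threshold are
# illustrative assumptions, not ST-SGG's adaptive scheme.
import torch

@torch.no_grad()
def assign_pseudo_labels(model: torch.nn.Module,
                         pair_features: torch.Tensor,
                         threshold: float = 0.9):
    """Return (indices, labels) for pairs whose top predicate
    confidence clears the threshold.

    pair_features: (num_pairs, feature_dim) features of unannotated pairs.
    """
    logits = model(pair_features)       # (num_pairs, num_predicates)
    probs = logits.softmax(dim=-1)
    conf, labels = probs.max(dim=-1)
    keep = conf >= threshold            # keep only confident predictions
    return keep.nonzero(as_tuple=True)[0], labels[keep]
```

The selected pairs would then be fed back into the next training round as if they were annotated, which is the core of any self-training loop.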
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.