Towards Lifelong Scene Graph Generation with Knowledge-ware In-context
Prompt Learning
- URL: http://arxiv.org/abs/2401.14626v1
- Date: Fri, 26 Jan 2024 03:43:22 GMT
- Title: Towards Lifelong Scene Graph Generation with Knowledge-ware In-context
Prompt Learning
- Authors: Tao He, Tongtong Wu, Dongyang Zhang, Guiduo Duan, Ke Qin, Yuan-Fang Li
- Abstract summary: Scene graph generation (SGG) endeavors to predict visual relationships between pairs of objects within an image.
This work addresses a pitfall of prior SGG methods: whenever new relationships emerge, they must retrain on all previously observed samples to avoid forgetting earlier predictions.
Motivated by the achievements of in-context learning in pretrained language models, our approach imbues the model with the capability to predict relationships and continuously acquire novel knowledge without catastrophic forgetting.
- Score: 24.98058940030532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene graph generation (SGG) endeavors to predict visual relationships
between pairs of objects within an image. Prevailing SGG methods traditionally
assume a one-off learning process for SGG. This conventional paradigm may
necessitate repetitive training on all previously observed samples whenever new
relationships emerge, in order to mitigate the risk of forgetting previously
acquired knowledge. This work seeks to address this pitfall, which is inherent
in learning a suite of relationship predictions sequentially. Motivated by the
achievements of in-context learning
in pretrained language models, our approach imbues the model with the
capability to predict relationships and continuously acquire novel knowledge
without succumbing to catastrophic forgetting. To achieve this goal, we
introduce a novel and pragmatic framework for scene graph generation, namely
Lifelong Scene Graph Generation (LSGG), where tasks, each covering a set of
predicates, unfold in a streaming fashion. In this framework, the model is
trained exclusively on the current task, without access to previously
encountered training data beyond a limited number of exemplars, yet it must
infer all predicates it has encountered thus far.
Rigorous experiments demonstrate the superiority of our proposed method over
state-of-the-art SGG models in the context of LSGG across a diverse array of
metrics. In addition, extensive experiments on the two mainstream benchmark
datasets, VG and Open Images (V6), show the superiority of our proposed model
over a number of competitive SGG models in both the continuous learning and
conventional settings. Moreover, comprehensive ablation experiments demonstrate
the effectiveness of each component in our model.
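As a rough illustration of the LSGG protocol described in the abstract (predicate tasks arriving as a stream, training restricted to the current task plus a small exemplar buffer, evaluation over every predicate seen so far), the sketch below uses hypothetical `task_stream`, `model.fit`, and `model.evaluate` interfaces; it is an assumption-based outline, not the authors' implementation.

```python
# Minimal sketch of a lifelong SGG loop under the LSGG constraints described
# above. All interfaces (task_stream, model.fit, model.evaluate) are
# hypothetical placeholders, not the paper's actual code.
import random

def lifelong_sgg(model, task_stream, exemplars_per_task=10):
    exemplar_buffer = []      # the only past data the model may revisit
    seen_predicates = set()   # label space grows as tasks arrive

    for task_id, task in enumerate(task_stream):
        seen_predicates |= set(task.predicate_classes)

        # Train only on the current task plus the limited exemplar buffer.
        train_samples = list(task.train_samples) + exemplar_buffer
        model.fit(train_samples, label_space=sorted(seen_predicates))

        # Retain a handful of exemplars from this task for later replay.
        exemplar_buffer.extend(random.sample(
            list(task.train_samples),
            k=min(exemplars_per_task, len(task.train_samples)),
        ))

        # Evaluation covers every predicate encountered so far.
        score = model.evaluate(task.eval_samples_seen_so_far,
                               label_space=sorted(seen_predicates))
        print(f"after task {task_id}: mean recall over "
              f"{len(seen_predicates)} predicates = {score:.3f}")
```

The defining constraint is that past-task data never re-enters training except through the small exemplar buffer, which is exactly what makes catastrophic forgetting a risk in this setting.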
Related papers
- Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency [3.351553095054309]
Scene graph generation (SGG) represents the relationships between objects in an image as a graph structure.
Previous studies have failed to reflect the co-occurrence of objects during scene graph generation.
We propose CooK, which incorporates Co-occurrence Knowledge between objects together with a learnable term frequency-inverse document frequency (TF-IDF) weighting.
arXiv Detail & Related papers (2024-05-21T09:56:48Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which incorporates local interaction information and global human-action interaction information.
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to global constraints, preventing the model from failing to learn the tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z)
- Unbiased Scene Graph Generation in Videos [36.889659781604564]
We introduce TEMPURA: TEmporal consistency and Memory-guided UnceRtainty Attenuation for unbiased dynamic SGG.
TEMPURA employs object-level temporal consistencies via transformer sequence modeling and learns to synthesize unbiased relationship representations.
Our method achieves significant (up to 10% in some cases) performance gain over existing methods.
arXiv Detail & Related papers (2023-04-03T06:10:06Z)
- Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World [67.03968403301143]
Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships in images for vision understanding.
Existing re-balancing strategies try to handle it via prior rules but are still confined to pre-defined conditions.
We propose a Cross-modal prediCate boosting (CaCao) framework, where a visually-prompted language model is learned to generate diverse fine-grained predicates.
arXiv Detail & Related papers (2023-03-23T13:06:38Z)
- Decomposed Prototype Learning for Few-Shot Scene Graph Generation [28.796734816086065]
We focus on a promising new task of scene graph generation (SGG): few-shot SGG (FSSGG).
FSSGG encourages models to be able to quickly transfer previous knowledge and recognize novel predicates with only a few examples.
We propose a novel Decomposed Prototype Learning (DPL) method.
arXiv Detail & Related papers (2023-03-20T04:54:26Z)
- Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning [84.39787427288525]
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z)
- Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates, while re-balancing strategies prefer tail categories.
We propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) which aims at differentiating hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts performance of benchmark models on VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z)
- Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration [9.203403318435486]
We propose CommOnsense-integrAted sCenegrapH rElation pRediction (COACHER), a framework to integrate commonsense knowledge for scene graph generation (SGG).
Specifically, we develop novel graph mining pipelines to model the neighborhoods and paths around entities in an external commonsense knowledge graph.
arXiv Detail & Related papers (2021-07-11T16:22:45Z)
- Exploring the Limits of Few-Shot Link Prediction in Knowledge Graphs [49.6661602019124]
We study a spectrum of models derived by generalizing the current state of the art for few-shot link prediction.
We find that a simple zero-shot baseline - which ignores any relation-specific information - achieves surprisingly strong performance.
Experiments on carefully crafted synthetic datasets show that having only a few examples of a relation fundamentally limits models from using fine-grained structural information.
arXiv Detail & Related papers (2021-02-05T21:04:31Z)
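For the zero-shot baseline mentioned in the last entry (a predictor that ignores any relation-specific information), one relation-agnostic strategy is to rank candidate tails purely by how often they co-occur with the query head in the training graph. The sketch below illustrates that idea with toy data; it is an assumption-based illustration, not the paper's actual baseline.

```python
# Hedged illustration of a relation-agnostic ("zero-shot") link prediction
# baseline: rank candidate tails for a query (head, relation, ?) using only
# head-tail co-occurrence counts from training triples, ignoring the relation.
from collections import Counter, defaultdict

def build_cooccurrence(train_triples):
    cooc = defaultdict(Counter)
    for head, _relation, tail in train_triples:  # relation label is discarded
        cooc[head][tail] += 1
        cooc[tail][head] += 1
    return cooc

def rank_tails(cooc, head, candidate_tails):
    # Candidates that co-occur more often with the head are ranked higher.
    return sorted(candidate_tails, key=lambda t: cooc[head][t], reverse=True)

# Toy knowledge graph; entities and relations are made up for illustration.
train = [("cat", "on", "mat"), ("cat", "near", "dog"), ("dog", "on", "grass")]
cooc = build_cooccurrence(train)
# "mat" and "dog" (each co-occur once with "cat") rank above "grass".
print(rank_tails(cooc, "cat", ["grass", "mat", "dog"]))
```

The exact baseline used in the paper may differ; the point is only that a ranking signal survives even when the relation label is ignored entirely.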