Towards Lifelong Scene Graph Generation with Knowledge-ware In-context
Prompt Learning
- URL: http://arxiv.org/abs/2401.14626v1
- Date: Fri, 26 Jan 2024 03:43:22 GMT
- Title: Towards Lifelong Scene Graph Generation with Knowledge-ware In-context
Prompt Learning
- Authors: Tao He, Tongtong Wu, Dongyang Zhang, Guiduo Duan, Ke Qin, Yuan-Fang Li
- Abstract summary: Scene graph generation (SGG) endeavors to predict visual relationships between pairs of objects within an image.
This work addresses a pitfall of prior SGG methods: whenever new relationships emerge, they must retrain on all previously observed samples to avoid forgetting earlier predictions.
Motivated by the achievements of in-context learning in pretrained language models, our approach imbues the model with the capability to predict relationships and continuously acquire novel knowledge without catastrophic forgetting.
- Score: 24.98058940030532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene graph generation (SGG) endeavors to predict visual relationships
between pairs of objects within an image. Prevailing SGG methods traditionally
assume a one-off learning process for SGG. This conventional paradigm may
necessitate repetitive training on all previously observed samples whenever new
relationships emerge, in order to mitigate the risk of forgetting previously
acquired knowledge. This work seeks to address this pitfall, which is inherent
in learning a suite of relationship predictions sequentially. Motivated by the
achievements of in-context learning
in pretrained language models, our approach imbues the model with the
capability to predict relationships and continuously acquire novel knowledge
without succumbing to catastrophic forgetting. To achieve this goal, we
introduce a novel and pragmatic framework for scene graph generation, namely
Lifelong Scene Graph Generation (LSGG), where tasks, each covering a set of
predicates, unfold in a streaming fashion. In this framework, the model is
trained exclusively on the current task, without access to previously
encountered training data beyond a limited number of exemplars, yet it must
infer all predicates it has encountered thus far.
Rigorous experiments demonstrate the superiority of our proposed method over
state-of-the-art SGG models in the context of LSGG across a diverse array of
metrics. In addition, extensive experiments on the two mainstream benchmark
datasets, VG and Open Images (V6), show the superiority of our proposed model
over a number of competitive SGG models in both the continuous learning and
conventional settings. Moreover, comprehensive ablation experiments demonstrate
the effectiveness of each component in our model.
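As a rough illustration of the LSGG protocol described in the abstract (predicate tasks arriving as a stream, training restricted to the current task plus a small exemplar buffer, evaluation over every predicate seen so far), the sketch below uses hypothetical `task_stream`, `model.fit`, and `model.evaluate` interfaces; it is an assumption-based outline, not the authors' implementation.

```python
# Minimal sketch of a lifelong SGG loop under the LSGG constraints described
# above. All interfaces (task_stream, model.fit, model.evaluate) are
# hypothetical placeholders, not the paper's actual code.
import random

def lifelong_sgg(model, task_stream, exemplars_per_task=10):
    exemplar_buffer = []      # the only past data the model may revisit
    seen_predicates = set()   # label space grows as tasks arrive

    for task_id, task in enumerate(task_stream):
        seen_predicates |= set(task.predicate_classes)

        # Train only on the current task plus the limited exemplar buffer.
        train_samples = list(task.train_samples) + exemplar_buffer
        model.fit(train_samples, label_space=sorted(seen_predicates))

        # Retain a handful of exemplars from this task for later replay.
        exemplar_buffer.extend(random.sample(
            list(task.train_samples),
            k=min(exemplars_per_task, len(task.train_samples)),
        ))

        # Evaluation covers every predicate encountered so far.
        score = model.evaluate(task.eval_samples_seen_so_far,
                               label_space=sorted(seen_predicates))
        print(f"after task {task_id}: mean recall over "
              f"{len(seen_predicates)} predicates = {score:.3f}")
```

The defining constraint is that past-task data never re-enters training except through the small exemplar buffer, which is exactly what makes catastrophic forgetting a risk in this setting.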
Related papers
- Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency [3.351553095054309]
Scene graph generation (SGG) represents the relationships between objects in an image as a graph structure.
Previous studies have failed to reflect the co-occurrence of objects during scene graph generation.
We propose CooK, which incorporates Co-occurrence Knowledge between objects together with a learnable term frequency-inverse document frequency (TF-IDF) weighting.
arXiv Detail & Related papers (2024-05-21T09:56:48Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which incorporates local interaction information and global human-action interaction information.
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to global constraints, preventing the model from failing to learn the tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z)
- Unbiased Scene Graph Generation in Videos [36.889659781604564]
We introduce TEMPURA: TEmporal consistency and Memory-guided UnceRtainty Attenuation for unbiased dynamic SGG.
TEMPURA employs object-level temporal consistencies via transformer sequence modeling and learns to synthesize unbiased relationship representations.
Our method achieves significant (up to 10% in some cases) performance gain over existing methods.
arXiv Detail & Related papers (2023-04-03T06:10:06Z)
- Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World [67.03968403301143]
Scene Graph Generation (SGG) aims to extract <subject, predicate, object> relationships in images for vision understanding.
Existing re-balancing strategies try to handle it via prior rules but are still confined to pre-defined conditions.
We propose a Cross-modal prediCate boosting (CaCao) framework, where a visually-prompted language model is learned to generate diverse fine-grained predicates.
arXiv Detail & Related papers (2023-03-23T13:06:38Z)
- Decomposed Prototype Learning for Few-Shot Scene Graph Generation [28.796734816086065]
We focus on a promising new task of scene graph generation (SGG): few-shot SGG (FSSGG).
FSSGG encourages models to be able to quickly transfer previous knowledge and recognize novel predicates with only a few examples.
We propose a novel Decomposed Prototype Learning (DPL) method.
arXiv Detail & Related papers (2023-03-20T04:54:26Z)
- Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning [84.39787427288525]
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z)
- Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates, while re-balancing strategies prefer tail categories.
We propose an Adaptive Fine-Grained Predicates Learning (FGPL-A) which aims at differentiating hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts performance of benchmark models on VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z)
- Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration [9.203403318435486]
We propose CommOnsense-integrAted sCenegrapH rElation pRediction (COACHER), a framework to integrate commonsense knowledge for scene graph generation (SGG).
Specifically, we develop novel graph mining pipelines to model the neighborhoods and paths around entities in an external commonsense knowledge graph.
arXiv Detail & Related papers (2021-07-11T16:22:45Z)
- Exploring the Limits of Few-Shot Link Prediction in Knowledge Graphs [49.6661602019124]
We study a spectrum of models derived by generalizing the current state of the art for few-shot link prediction.
We find that a simple zero-shot baseline - which ignores any relation-specific information - achieves surprisingly strong performance.
Experiments on carefully crafted synthetic datasets show that having only a few examples of a relation fundamentally limits models from using fine-grained structural information.
arXiv Detail & Related papers (2021-02-05T21:04:31Z)
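For the zero-shot baseline mentioned in the last entry (a predictor that ignores any relation-specific information), one relation-agnostic strategy is to rank candidate tails purely by how often they co-occur with the query head in the training graph. The sketch below illustrates that idea with toy data; it is an assumption-based illustration, not the paper's actual baseline.

```python
# Hedged illustration of a relation-agnostic ("zero-shot") link prediction
# baseline: rank candidate tails for a query (head, relation, ?) using only
# head-tail co-occurrence counts from training triples, ignoring the relation.
from collections import Counter, defaultdict

def build_cooccurrence(train_triples):
    cooc = defaultdict(Counter)
    for head, _relation, tail in train_triples:  # relation label is discarded
        cooc[head][tail] += 1
        cooc[tail][head] += 1
    return cooc

def rank_tails(cooc, head, candidate_tails):
    # Candidates that co-occur more often with the head are ranked higher.
    return sorted(candidate_tails, key=lambda t: cooc[head][t], reverse=True)

# Toy knowledge graph; entities and relations are made up for illustration.
train = [("cat", "on", "mat"), ("cat", "near", "dog"), ("dog", "on", "grass")]
cooc = build_cooccurrence(train)
# "mat" and "dog" (each co-occur once with "cat") rank above "grass".
print(rank_tails(cooc, "cat", ["grass", "mat", "dog"]))
```

The exact baseline used in the paper may differ; the point is only that a ranking signal survives even when the relation label is ignored entirely.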