Adaptive Visual Scene Understanding: Incremental Scene Graph Generation
- URL: http://arxiv.org/abs/2310.01636v4
- Date: Fri, 01 Nov 2024 05:29:34 GMT
- Title: Adaptive Visual Scene Understanding: Incremental Scene Graph Generation
- Authors: Naitik Khandelwal, Xiao Liu, Mengmi Zhang
- Abstract summary: Scene graph generation (SGG) analyzes images to extract meaningful information about objects and their relationships.
We present a benchmark comprising three learning regimes: relationship incremental, scene incremental, and relationship generalization.
We also introduce a "Replays via Analysis by Synthesis" method named RAS.
- Score: 18.541428517746034
- Abstract: Scene graph generation (SGG) analyzes images to extract meaningful information about objects and their relationships. In the dynamic visual world, it is crucial for AI systems to continuously detect new objects and establish their relationships with existing ones. Recently, numerous studies have focused on continual learning within the domains of object detection and image recognition. However, little research has addressed the more challenging continual learning problem in SGG. This increased difficulty arises from the intricate interactions and dynamic relationships among objects and their associated contexts. Thus, in continual learning, SGG models are often required to expand, modify, retain, and reason over scene graphs within the process of adaptive visual scene understanding. To systematically explore Continual Scene Graph Generation (CSEGG), we present a comprehensive benchmark comprising three learning regimes: relationship incremental, scene incremental, and relationship generalization. Moreover, we introduce a "Replays via Analysis by Synthesis" method named RAS. This approach leverages scene graphs, decomposing and re-composing them to represent different scenes, and replays the synthesized scenes based on these compositional scene graphs. The replayed synthesized scenes serve as practice for SGG in both known and unknown environments. Our experimental results not only highlight the challenges of directly combining existing continual learning methods with SGG backbones but also demonstrate the effectiveness of our proposed approach, enhancing CSEGG efficiency while simultaneously preserving privacy and reducing memory usage. All data and source code are publicly available online.
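The abstract describes RAS only at a high level, so the following is a minimal, hypothetical Python sketch of the compositional-replay idea it outlines: stored scene graphs are decomposed into relationship triples, re-composed into new graphs, and the scenes synthesized from those graphs would be replayed during training. All names here (`Triple`, `CompositionalReplayBuffer`, `synthesize_scene`) are illustrative assumptions, not the paper's API.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One relationship in a scene graph: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


class CompositionalReplayBuffer:
    """Stores decomposed relationship triples rather than raw images,
    which is how a replay-by-synthesis scheme can avoid keeping
    privacy-sensitive photos in memory."""

    def __init__(self, seed: int = 0) -> None:
        self._triples: set[Triple] = set()
        self._rng = random.Random(seed)

    def decompose(self, scene_graph: list[Triple]) -> None:
        """Decomposition: break a scene graph into reusable triples."""
        self._triples.update(scene_graph)

    def recompose(self, n_triples: int) -> list[Triple]:
        """Re-composition: sample stored triples into a new graph that
        may describe a scene never observed as a whole."""
        pool = list(self._triples)
        return self._rng.sample(pool, min(n_triples, len(pool)))


def synthesize_scene(scene_graph: list[Triple]):
    """Placeholder for a graph-conditioned image generator; the actual
    synthesis model used by RAS is not specified in this summary."""
    raise NotImplementedError


if __name__ == "__main__":
    buffer = CompositionalReplayBuffer()
    buffer.decompose([Triple("person", "riding", "horse"),
                      Triple("horse", "on", "grass")])
    buffer.decompose([Triple("dog", "chasing", "ball")])
    replayed = buffer.recompose(n_triples=2)
    print(replayed)  # a possibly novel combination of known relationships
    # image = synthesize_scene(replayed)  # replayed image would join training
```

Replaying from triples rather than stored images is what the abstract credits for preserving privacy and keeping memory usage low.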
Related papers
- Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency [3.351553095054309]
Scene graph generation (SGG) represents the relationships between objects in an image as a graph structure.
Previous studies have failed to reflect the co-occurrence of objects during scene graph generation.
We propose CooK, which incorporates Co-occurrence Knowledge between objects together with a learnable term frequency-inverse document frequency (TF-IDF) weighting.
arXiv Detail & Related papers (2024-05-21T09:56:48Z)
- Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention [69.36723767339001]
Scene Graph Generation (SGG) offers a structured representation critical in many computer vision applications.
We propose a unified framework named OvSGTR towards fully open vocabulary SGG from a holistic view.
For the more challenging settings of relation-involved open-vocabulary SGG, the proposed approach integrates relation-aware pretraining.
arXiv Detail & Related papers (2023-11-18T06:49:17Z)
- Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation [51.92419880088668]
We propose DynSGG-MTL, a novel dynamic SGG model based on multi-task learning that introduces local interaction information and global human-action interaction information.
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to global constraints, preventing it from failing to learn tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z)
- Unbiased Heterogeneous Scene Graph Generation with Relation-aware Message Passing Neural Network [9.779600950401315]
We propose an unbiased heterogeneous scene graph generation (HetSGG) framework that captures relation-aware context.
We devise a novel message passing layer, called relation-aware message passing neural network (RMP), that aggregates the contextual information of an image.
arXiv Detail & Related papers (2022-12-01T11:25:36Z)
- Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning [84.39787427288525]
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z)
- Scene Graph Generation: A Comprehensive Survey [35.80909746226258]
Scene graphs have been a focus of research because of their powerful semantic representation and applications to scene understanding.
Scene Graph Generation (SGG) refers to the task of automatically mapping an image into a semantic structural scene graph.
We review 138 representative works that cover different input modalities, and systematically summarize existing methods of image-based SGG.
arXiv Detail & Related papers (2022-01-03T00:55:33Z)
- Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation [48.21846438269506]
Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects.
Existing SGG methods fail to acquire complex reasoning about visual and textual correlations due to various biases in training data.
We propose a novel framework for SGG training that exploits relation labels based on their informativeness.
arXiv Detail & Related papers (2021-11-26T14:34:12Z)
- Semantic Compositional Learning for Low-shot Scene Graph Generation [122.51930904132685]
Many scene graph generation (SGG) models solely use the limited annotated relation triples for training.
We propose a novel semantic compositional learning strategy that makes it possible to construct additional, realistic relation triples.
For three recent SGG models, adding our strategy improves their performance by close to 50%, and all of them substantially exceed the current state-of-the-art.
arXiv Detail & Related papers (2021-08-19T10:13:55Z)
- Unconditional Scene Graph Generation [72.53624470737712]
We develop a deep auto-regressive model called SceneGraphGen which can learn the probability distribution over labelled and directed graphs.
We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes.
arXiv Detail & Related papers (2021-08-12T17:57:16Z)
- Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration [9.203403318435486]
We propose CommOnsense-integrAted sCenegrapH rElation pRediction (COACHER), a framework that integrates commonsense knowledge into scene graph generation (SGG).
Specifically, we develop novel graph mining pipelines to model the neighborhoods and paths around entities in an external commonsense knowledge graph (a toy illustration of such mining follows this list).
arXiv Detail & Related papers (2021-07-11T16:22:45Z)
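The COACHER entry above mentions mining neighborhoods and paths around entities in an external commonsense knowledge graph. The sketch below is a toy illustration of what such mining can look like; the function names are hypothetical, and a small networkx graph stands in for a real resource such as ConceptNet.

```python
import networkx as nx

# Toy stand-in for an external commonsense knowledge graph.
kg = nx.Graph()
kg.add_edges_from([
    ("person", "hand"), ("hand", "holding"),
    ("holding", "cup"), ("person", "drinking"),
    ("drinking", "cup"), ("cup", "table"),
])


def mine_neighborhood(graph: nx.Graph, entity: str) -> set[str]:
    """1-hop neighborhood: concepts directly related to an entity."""
    return set(graph.neighbors(entity))


def mine_paths(graph: nx.Graph, subj: str, obj: str,
               cutoff: int = 3) -> list[list[str]]:
    """Short paths between two entities; intermediate concepts can hint
    at plausible predicates for relation classes unseen in training."""
    return list(nx.all_simple_paths(graph, source=subj, target=obj,
                                    cutoff=cutoff))


if __name__ == "__main__":
    print(mine_neighborhood(kg, "cup"))
    for path in mine_paths(kg, "person", "cup"):
        print(" -> ".join(path))  # e.g. person -> drinking -> cup
```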