Scene Graph Modification as Incremental Structure Expanding
- URL: http://arxiv.org/abs/2209.09093v1
- Date: Thu, 15 Sep 2022 16:26:14 GMT
- Title: Scene Graph Modification as Incremental Structure Expanding
- Authors: Xuming Hu, Zhijiang Guo, Yu Fu, Lijie Wen, Philip S. Yu
- Abstract summary: We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing incremental structure expanding (ISE).
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
- Score: 61.84291817776118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A scene graph is a semantic representation that expresses the objects,
attributes, and relationships between objects in a scene. Scene graphs play an
important role in many cross-modality tasks, as they are able to capture the
interactions between images and texts. In this paper, we focus on scene graph
modification (SGM), where the system is required to learn how to update an
existing scene graph based on a natural language query. Unlike previous
approaches that rebuild the entire scene graph, we frame SGM as a graph
expansion task by introducing incremental structure expanding (ISE). ISE
constructs the target graph by incrementally expanding the source graph without
changing the unmodified structure. Based on ISE, we further propose a model
that iterates between node prediction and edge prediction, inferring more
accurate and harmonious expansion decisions progressively. In addition, we
construct a challenging dataset that contains more complicated queries and
larger scene graphs than existing datasets. Experiments on four benchmarks
demonstrate the effectiveness of our approach, which surpasses the previous
state-of-the-art model by large margins.
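The expansion procedure described in the abstract alternates between predicting a new node and predicting how that node attaches to the graph built so far, leaving the source structure untouched. A minimal sketch of that loop is given below; the component names (`SceneGraph`, `node_predictor`, `edge_predictor`) are illustrative placeholders, not the paper's actual modules.

```python
# Illustrative sketch of incremental structure expanding (ISE): the source
# graph is grown step by step, alternating node prediction and edge
# prediction, while the unmodified structure is left untouched.
# node_predictor and edge_predictor are hypothetical callables, not the
# authors' actual model components.
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    nodes: list[str] = field(default_factory=list)                    # object / attribute labels
    edges: dict[tuple[int, int], str] = field(default_factory=dict)   # (src, dst) -> relation label

def expand_scene_graph(source: SceneGraph, query: str,
                       node_predictor, edge_predictor,
                       max_steps: int = 10) -> SceneGraph:
    """Expand `source` according to the natural language `query`."""
    graph = SceneGraph(list(source.nodes), dict(source.edges))
    for _ in range(max_steps):
        # 1) Node prediction: propose the next node (or None to stop),
        #    conditioned on the query and the graph built so far.
        new_node = node_predictor(query, graph)
        if new_node is None:
            break
        graph.nodes.append(new_node)
        new_idx = len(graph.nodes) - 1
        # 2) Edge prediction: decide how the new node attaches to each
        #    existing node; None means "no relation".
        for idx in range(new_idx):
            relation = edge_predictor(query, graph, idx, new_idx)
            if relation is not None:
                graph.edges[(idx, new_idx)] = relation
    return graph
```

Because only new nodes and their incident edges are decided, the unmodified part of the source graph is preserved by construction, which is the key difference from approaches that regenerate the whole target graph.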
Related papers
- Joint Generative Modeling of Scene Graphs and Images via Diffusion Models [37.788957749123725]
We present a novel generative task: joint scene graph - image generation.
We introduce a novel diffusion model, DiffuseSG, that jointly models the adjacency matrix along with heterogeneous node and edge attributes.
With a graph transformer being the denoiser, DiffuseSG successively denoises the scene graph representation in a continuous space and discretizes the final representation to generate the clean scene graph.
arXiv Detail & Related papers (2024-01-02T10:10:29Z)
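DiffuseSG, as summarized above, denoises a continuous relaxation of the scene graph (adjacency plus node and edge attributes) and discretizes the final representation. A rough sketch of such a sampling loop, assuming a generic `denoiser` callable in place of the paper's graph transformer and omitting the actual noise schedule:

```python
# Illustrative reverse-diffusion sampling over a continuous scene graph
# representation, followed by discretization. `denoiser` is a placeholder
# for the paper's graph transformer; the schedule and shapes are invented.
import numpy as np

def sample_scene_graph(denoiser, num_nodes: int, node_dim: int, edge_dim: int,
                       num_steps: int = 100, rng=None):
    rng = rng or np.random.default_rng()
    # Start from Gaussian noise in the continuous graph space.
    node_feats = rng.standard_normal((num_nodes, node_dim))
    edge_feats = rng.standard_normal((num_nodes, num_nodes, edge_dim))
    for t in reversed(range(num_steps)):
        # The denoiser jointly refines node and edge representations at step t.
        node_feats, edge_feats = denoiser(node_feats, edge_feats, t)
    # Discretize: take the highest-scoring label for every node and edge slot.
    node_labels = node_feats.argmax(axis=-1)
    edge_labels = edge_feats.argmax(axis=-1)
    return node_labels, edge_labels
```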
- Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge [7.28830964611216]
This work introduces an enhanced approach to scene graph generation that incorporates both a relationship hierarchy and commonsense knowledge.
We implement a robust commonsense validation pipeline that harnesses foundation models to critique the results from the scene graph prediction system.
Experiments on Visual Genome and OpenImage V6 datasets demonstrate that the proposed modules can be seamlessly integrated as plug-and-play enhancements to existing scene graph generation algorithms.
arXiv Detail & Related papers (2023-11-21T06:03:20Z)
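The commonsense validation pipeline summarized above uses a foundation model to critique predicted relationships. A hypothetical sketch of such a filter is shown below; `ask_foundation_model` stands in for whatever LLM or VLM interface is actually used, and the prompt wording is invented for illustration.

```python
# Illustrative commonsense-validation filter: every predicted triple is
# checked by a foundation model. `ask_foundation_model` is a placeholder
# for the actual LLM/VLM interface, and the prompt text is invented.
def validate_triples(triples, ask_foundation_model):
    kept, rejected = [], []
    for subj, rel, obj in triples:
        prompt = (f"Is '{subj} {rel} {obj}' a plausible relationship "
                  f"in an everyday scene? Answer yes or no.")
        verdict = ask_foundation_model(prompt).strip().lower()
        (kept if verdict.startswith("yes") else rejected).append((subj, rel, obj))
    return kept, rejected

# Example: validate_triples([("cup", "on", "table"), ("table", "eating", "cup")], llm)
# would be expected to keep the first triple and reject the second.
```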
- Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which incorporates both local interaction information and global human-action interaction information.
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to the global constraints, so that the model does not fail to learn the tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z)
- Iterative Scene Graph Generation [55.893695946885174]
Scene graph generation involves identifying object entities and their corresponding interaction predicates in a given image (or video).
Existing approaches to scene graph generation assume a certain factorization of the joint distribution to make the estimation feasible.
We propose a novel framework that addresses this limitation, as well as introduces dynamic conditioning on the image.
arXiv Detail & Related papers (2022-07-27T10:37:29Z)
- GEMS: Scene Expansion using Generative Models of Graphs [3.5998698847215165]
We focus on one such representation, scene graphs, and propose a novel scene expansion task.
We first predict a new node and then predict the set of relationships between the newly predicted node and previous nodes in the graph.
We conduct extensive experiments on Visual Genome and VRD datasets to evaluate the expanded scene graphs.
arXiv Detail & Related papers (2022-07-08T07:41:28Z)
- Unconditional Scene Graph Generation [72.53624470737712]
We develop a deep auto-regressive model called SceneGraphGen which can learn the probability distribution over labelled and directed graphs.
We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes.
arXiv Detail & Related papers (2021-08-12T17:57:16Z)
- Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization [77.21951145754065]
We propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.
Our CSMGAN is able to effectively capture high-order interactions between the two modalities, thus enabling more precise localization.
arXiv Detail & Related papers (2020-08-04T08:25:24Z)
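CSMGAN, summarized above, treats localization as iterative message passing over a joint graph whose nodes come from both modalities. The sketch below shows a generic attention-weighted message-passing iteration over such a joint graph; it simplifies away the paper's separate cross-modal and self-modal relations.

```python
# Illustrative attention-weighted message passing over a joint graph whose
# nodes mix both modalities (e.g., video-clip and query-word features).
# The adjacency matrix is assumed to include self-loops so every node has
# at least one neighbour; the real CSMGAN distinguishes cross-modal and
# self-modal relations, which this sketch collapses into one graph.
import numpy as np

def message_passing(node_feats: np.ndarray, adjacency: np.ndarray,
                    num_iters: int = 3) -> np.ndarray:
    """node_feats: (N, d) features; adjacency: (N, N) 0/1 joint-graph edges."""
    h = node_feats.astype(float)
    for _ in range(num_iters):
        scores = h @ h.T                                   # pairwise affinities
        scores = np.where(adjacency > 0, scores, -np.inf)  # restrict to edges
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)
        h = h + weights @ h                                # aggregate + residual
    return h
```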
- Iterative Context-Aware Graph Inference for Visual Dialog [126.016187323249]
We propose a novel Context-Aware Graph (CAG) neural network.
Each node in the graph corresponds to a joint semantic feature, including both object-based (visual) and history-related (textual) context representations.
arXiv Detail & Related papers (2020-04-05T13:09:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.