Scene Graph Modification as Incremental Structure Expanding
- URL: http://arxiv.org/abs/2209.09093v1
- Date: Thu, 15 Sep 2022 16:26:14 GMT
- Title: Scene Graph Modification as Incremental Structure Expanding
- Authors: Xuming Hu, Zhijiang Guo, Yu Fu, Lijie Wen, Philip S. Yu
- Abstract summary: We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing incremental structure expanding (ISE).
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
- Score: 61.84291817776118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A scene graph is a semantic representation that expresses the objects,
attributes, and relationships between objects in a scene. Scene graphs play an
important role in many cross-modality tasks, as they are able to capture the
interactions between images and texts. In this paper, we focus on scene graph
modification (SGM), where the system is required to learn how to update an
existing scene graph based on a natural language query. Unlike previous
approaches that rebuild the entire scene graph, we frame SGM as a graph
expansion task by introducing incremental structure expanding (ISE). ISE
constructs the target graph by incrementally expanding the source graph without
changing the unmodified structure. Based on ISE, we further propose a model
that iterates between node prediction and edge prediction, inferring more
accurate and harmonious expansion decisions progressively. In addition, we
construct a challenging dataset that contains more complicated queries and
larger scene graphs than existing datasets. Experiments on four benchmarks
demonstrate the effectiveness of our approach, which surpasses the previous
state-of-the-art model by large margins.
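The expansion procedure described in the abstract alternates between predicting a new node and predicting how that node attaches to the graph built so far, leaving the source structure untouched. A minimal sketch of that loop is given below; the component names (`SceneGraph`, `node_predictor`, `edge_predictor`) are illustrative placeholders, not the paper's actual modules.

```python
# Illustrative sketch of incremental structure expanding (ISE): the source
# graph is grown step by step, alternating node prediction and edge
# prediction, while the unmodified structure is left untouched.
# node_predictor and edge_predictor are hypothetical callables, not the
# authors' actual model components.
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    nodes: list[str] = field(default_factory=list)                    # object / attribute labels
    edges: dict[tuple[int, int], str] = field(default_factory=dict)   # (src, dst) -> relation label

def expand_scene_graph(source: SceneGraph, query: str,
                       node_predictor, edge_predictor,
                       max_steps: int = 10) -> SceneGraph:
    """Expand `source` according to the natural language `query`."""
    graph = SceneGraph(list(source.nodes), dict(source.edges))
    for _ in range(max_steps):
        # 1) Node prediction: propose the next node (or None to stop),
        #    conditioned on the query and the graph built so far.
        new_node = node_predictor(query, graph)
        if new_node is None:
            break
        graph.nodes.append(new_node)
        new_idx = len(graph.nodes) - 1
        # 2) Edge prediction: decide how the new node attaches to each
        #    existing node; None means "no relation".
        for idx in range(new_idx):
            relation = edge_predictor(query, graph, idx, new_idx)
            if relation is not None:
                graph.edges[(idx, new_idx)] = relation
    return graph
```

Because only new nodes and their incident edges are decided, the unmodified part of the source graph is preserved by construction, which is the key difference from approaches that regenerate the whole target graph.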
Related papers
- Joint Generative Modeling of Scene Graphs and Images via Diffusion Models [37.788957749123725]
We present a novel generative task: joint scene graph - image generation.
We introduce a novel diffusion model, DiffuseSG, that jointly models the adjacency matrix along with heterogeneous node and edge attributes.
With a graph transformer being the denoiser, DiffuseSG successively denoises the scene graph representation in a continuous space and discretizes the final representation to generate the clean scene graph.
arXiv Detail & Related papers (2024-01-02T10:10:29Z)
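DiffuseSG, as summarized above, denoises a continuous relaxation of the scene graph (adjacency plus node and edge attributes) and discretizes the final representation. A rough sketch of such a sampling loop, assuming a generic `denoiser` callable in place of the paper's graph transformer and omitting the actual noise schedule:

```python
# Illustrative reverse-diffusion sampling over a continuous scene graph
# representation, followed by discretization. `denoiser` is a placeholder
# for the paper's graph transformer; the schedule and shapes are invented.
import numpy as np

def sample_scene_graph(denoiser, num_nodes: int, node_dim: int, edge_dim: int,
                       num_steps: int = 100, rng=None):
    rng = rng or np.random.default_rng()
    # Start from Gaussian noise in the continuous graph space.
    node_feats = rng.standard_normal((num_nodes, node_dim))
    edge_feats = rng.standard_normal((num_nodes, num_nodes, edge_dim))
    for t in reversed(range(num_steps)):
        # The denoiser jointly refines node and edge representations at step t.
        node_feats, edge_feats = denoiser(node_feats, edge_feats, t)
    # Discretize: take the highest-scoring label for every node and edge slot.
    node_labels = node_feats.argmax(axis=-1)
    edge_labels = edge_feats.argmax(axis=-1)
    return node_labels, edge_labels
```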
- Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge [7.28830964611216]
This work introduces an enhanced approach to scene graph generation that incorporates both a relationship hierarchy and commonsense knowledge.
We implement a robust commonsense validation pipeline that harnesses foundation models to critique the results from the scene graph prediction system.
Experiments on Visual Genome and OpenImage V6 datasets demonstrate that the proposed modules can be seamlessly integrated as plug-and-play enhancements to existing scene graph generation algorithms.
arXiv Detail & Related papers (2023-11-21T06:03:20Z)
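The commonsense validation pipeline summarized above uses a foundation model to critique predicted relationships. A hypothetical sketch of such a filter is shown below; `ask_foundation_model` stands in for whatever LLM or VLM interface is actually used, and the prompt wording is invented for illustration.

```python
# Illustrative commonsense-validation filter: every predicted triple is
# checked by a foundation model. `ask_foundation_model` is a placeholder
# for the actual LLM/VLM interface, and the prompt text is invented.
def validate_triples(triples, ask_foundation_model):
    kept, rejected = [], []
    for subj, rel, obj in triples:
        prompt = (f"Is '{subj} {rel} {obj}' a plausible relationship "
                  f"in an everyday scene? Answer yes or no.")
        verdict = ask_foundation_model(prompt).strip().lower()
        (kept if verdict.startswith("yes") else rejected).append((subj, rel, obj))
    return kept, rejected

# Example: validate_triples([("cup", "on", "table"), ("table", "eating", "cup")], llm)
# would be expected to keep the first triple and reject the second.
```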
- Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation [51.92419880088668]
We propose a novel DynSGG model based on multi-task learning, DynSGG-MTL, which incorporates both local interaction information and global human-action interaction information.
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to the global constraints, so that the model does not fail to learn the tail predicates.
arXiv Detail & Related papers (2023-08-10T01:24:25Z)
- Iterative Scene Graph Generation [55.893695946885174]
Scene graph generation involves identifying object entities and their corresponding interaction predicates in a given image (or video).
Existing approaches to scene graph generation assume a certain factorization of the joint distribution to make the estimation feasible.
We propose a novel framework that addresses this limitation, as well as introduces dynamic conditioning on the image.
arXiv Detail & Related papers (2022-07-27T10:37:29Z)
- GEMS: Scene Expansion using Generative Models of Graphs [3.5998698847215165]
We focus on one such representation, scene graphs, and propose a novel scene expansion task.
We first predict a new node and then predict the set of relationships between the newly predicted node and previous nodes in the graph.
We conduct extensive experiments on Visual Genome and VRD datasets to evaluate the expanded scene graphs.
arXiv Detail & Related papers (2022-07-08T07:41:28Z)
- Unconditional Scene Graph Generation [72.53624470737712]
We develop a deep auto-regressive model called SceneGraphGen which can learn the probability distribution over labelled and directed graphs.
We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes.
arXiv Detail & Related papers (2021-08-12T17:57:16Z)
- Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization [77.21951145754065]
We propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.
Our CSMGAN is able to effectively capture high-order interactions between the two modalities, thus enabling more precise localization.
arXiv Detail & Related papers (2020-08-04T08:25:24Z)
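CSMGAN, summarized above, treats localization as iterative message passing over a joint graph whose nodes come from both modalities. The sketch below shows a generic attention-weighted message-passing iteration over such a joint graph; it simplifies away the paper's separate cross-modal and self-modal relations.

```python
# Illustrative attention-weighted message passing over a joint graph whose
# nodes mix both modalities (e.g., video-clip and query-word features).
# The adjacency matrix is assumed to include self-loops so every node has
# at least one neighbour; the real CSMGAN distinguishes cross-modal and
# self-modal relations, which this sketch collapses into one graph.
import numpy as np

def message_passing(node_feats: np.ndarray, adjacency: np.ndarray,
                    num_iters: int = 3) -> np.ndarray:
    """node_feats: (N, d) features; adjacency: (N, N) 0/1 joint-graph edges."""
    h = node_feats.astype(float)
    for _ in range(num_iters):
        scores = h @ h.T                                   # pairwise affinities
        scores = np.where(adjacency > 0, scores, -np.inf)  # restrict to edges
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)
        h = h + weights @ h                                # aggregate + residual
    return h
```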
- Iterative Context-Aware Graph Inference for Visual Dialog [126.016187323249]
We propose a novel Context-Aware Graph (CAG) neural network.
Each node in the graph corresponds to a joint semantic feature, including both object-based (visual) and history-related (textual) context representations.
arXiv Detail & Related papers (2020-04-05T13:09:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.