Location-Free Scene Graph Generation
- URL: http://arxiv.org/abs/2303.10944v1
- Date: Mon, 20 Mar 2023 08:57:45 GMT
- Title: Location-Free Scene Graph Generation
- Authors: Ege Özsoy, Felix Holm, Tobias Czempiel, Nassir Navab, Benjamin Busam
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene Graph Generation (SGG) is a challenging visual understanding task. It
combines the detection of entities and relationships between them in a scene.
Both previous works and existing evaluation metrics rely on bounding box
labels, even though many downstream scene graph applications do not need
location information. The need for localization labels significantly increases
the annotation cost and hampers the creation of more and larger scene graph
datasets. We suggest breaking the dependency of scene graphs on bounding box
labels by proposing location-free scene graph generation (LF-SGG). This new
task aims at predicting instances of entities, as well as their relationships,
without spatial localization. To objectively evaluate the task, the predicted
and ground truth scene graphs need to be compared. We solve this NP-hard
problem through an efficient algorithm using branching. Additionally, we design
the first LF-SGG method, Pix2SG, using autoregressive sequence modeling. Our
proposed method is evaluated on Visual Genome and 4D-OR. Although using
significantly fewer labels during training, we achieve 74.12% of the
location-supervised SOTA performance on Visual Genome and even outperform the
best method on 4D-OR.
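The evaluation step described in the abstract, matching a predicted scene graph against the ground truth without locations, is a combinatorial assignment problem that the paper solves with a branching algorithm. The paper's actual algorithm is not reproduced here; the following is only a minimal branch-and-bound sketch under assumed data structures (entities as `(label, instance)` tuples, triplets over those tuples), with all names invented for illustration.

```python
def triplet_matches(pred_triplets, gt_triplets, mapping):
    """Count predicted triplets that appear in the ground truth after
    mapping predicted entity instances to ground-truth instances."""
    gt = set(gt_triplets)
    return sum(
        1 for s, p, o in pred_triplets
        if s in mapping and o in mapping and (mapping[s], p, mapping[o]) in gt
    )

def best_matching(pred_entities, gt_entities, pred_triplets, gt_triplets):
    """Branch over injective, label-preserving entity assignments,
    pruning branches whose optimistic bound cannot beat the incumbent."""
    best = {"score": -1, "mapping": {}}

    def bound(mapping):
        # Optimistic upper bound: every triplet with an unmapped endpoint
        # is assumed to still be matchable later.
        matched = triplet_matches(pred_triplets, gt_triplets, mapping)
        open_ = sum(1 for s, _, o in pred_triplets
                    if s not in mapping or o not in mapping)
        return matched + open_

    def branch(i, mapping, used):
        if i == len(pred_entities):
            score = triplet_matches(pred_triplets, gt_triplets, mapping)
            if score > best["score"]:
                best["score"], best["mapping"] = score, dict(mapping)
            return
        if bound(mapping) <= best["score"]:
            return  # prune: this branch cannot improve the incumbent
        pe = pred_entities[i]
        for ge in gt_entities:  # only same-label, unused candidates
            if ge not in used and ge[0] == pe[0]:
                mapping[pe] = ge
                used.add(ge)
                branch(i + 1, mapping, used)
                del mapping[pe]
                used.discard(ge)
        branch(i + 1, mapping, used)  # also try leaving pe unmatched

    branch(0, {}, set())
    return best["score"], best["mapping"]
```

Without localization, instances of the same class (e.g. two horses) are interchangeable, which is exactly why the matching must search over assignments rather than pair entities by label alone.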
Related papers
- A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) [5.37125692728042]
Schema-Guided Reason-while-Retrieve (RwR) is a framework for reasoning and planning with scene graphs.
We show that our framework surpasses existing LLM-based approaches in numerical Q&A and planning tasks.
arXiv Detail & Related papers (2025-02-05T18:50:38Z)
- Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive-octree structure is developed that stores semantics and depicts the occupancy of an object adjustably according to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z)
- Joint Generative Modeling of Scene Graphs and Images via Diffusion Models [37.788957749123725]
We present a novel generative task: joint scene graph - image generation.
We introduce a novel diffusion model, DiffuseSG, that jointly models the adjacency matrix along with heterogeneous node and edge attributes.
With a graph transformer being the denoiser, DiffuseSG successively denoises the scene graph representation in a continuous space and discretizes the final representation to generate the clean scene graph.
arXiv Detail & Related papers (2024-01-02T10:10:29Z)
- Fine-Grained is Too Coarse: A Novel Data-Centric Approach for Efficient Scene Graph Generation [0.7851536646859476]
We introduce the task of Efficient Scene Graph Generation (SGG), which prioritizes the generation of relevant relations.
We present a new dataset, VG150-curated, based on the annotations of the popular Visual Genome dataset.
We show through a set of experiments that this dataset contains more high-quality and diverse annotations than the one usually used in SGG.
arXiv Detail & Related papers (2023-05-30T00:55:49Z)
- Iterative Scene Graph Generation with Generative Transformers [6.243995448840211]
Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format.
Current approaches take a generation-by-classification approach where the scene graph is generated through labeling of all possible edges between objects in a scene.
This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction.
arXiv Detail & Related papers (2022-11-30T00:05:44Z)
- Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training [112.94542676251133]
We propose to learn scene graph embeddings by directly optimizing their alignment with images.
Specifically, we pre-train an encoder to extract both global and local information from scene graphs.
The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections.
arXiv Detail & Related papers (2022-11-21T01:11:19Z)
- Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing incremental structure expanding (ISE).
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
arXiv Detail & Related papers (2022-09-15T16:26:14Z)
- Iterative Scene Graph Generation [55.893695946885174]
Scene graph generation involves identifying object entities and their corresponding interaction predicates in a given image (or video).
Existing approaches to scene graph generation assume a certain factorization of the joint distribution to make the estimation feasible.
We propose a novel framework that addresses this limitation, as well as introduces dynamic conditioning on the image.
arXiv Detail & Related papers (2022-07-27T10:37:29Z)
- Segmentation-grounded Scene Graph Generation [47.34166260639392]
We propose a framework for pixel-level segmentation-grounded scene graph generation.
Our framework is agnostic to the underlying scene graph generation method.
It is learned in a multi-task manner with both target and auxiliary datasets.
arXiv Detail & Related papers (2021-04-29T08:54:08Z)
- Fully Convolutional Scene Graph Generation [30.194961716870186]
This paper presents a fully convolutional scene graph generation (FCSGG) model that detects objects and relations simultaneously.
FCSGG encodes objects as bounding box center points and relationships as 2D vector fields called Relation Affinity Fields (RAFs).
FCSGG achieves highly competitive results on recall and zero-shot recall with significantly reduced inference time.
arXiv Detail & Related papers (2021-03-30T05:25:38Z)
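The FCSGG entry above encodes relationships as 2D vector fields (RAFs). FCSGG's actual training and inference are not reproduced here; the sketch below only illustrates the general affinity-field idea (in the spirit of part affinity fields): rasterize a unit vector field along the subject-to-object segment, then score a candidate pair by how well the field agrees with that pair's own direction. All function names, shapes, and thresholds are invented for illustration.

```python
import numpy as np

def rasterize_raf(h, w, subj_center, obj_center, thickness=2.0):
    """Hypothetical Relation Affinity Field: pixels near the
    subject->object segment store the unit vector pointing from
    subject to object; all other pixels stay zero."""
    field = np.zeros((h, w, 2), dtype=np.float32)
    s = np.asarray(subj_center, dtype=np.float32)
    o = np.asarray(obj_center, dtype=np.float32)
    d = o - s
    length = np.linalg.norm(d)
    if length == 0:
        return field
    u = d / length
    ys, xs = np.mgrid[0:h, 0:w]
    p = np.stack([xs, ys], axis=-1).astype(np.float32) - s  # pixel offsets
    along = p @ u                                  # projection onto segment
    perp = np.abs(p @ np.array([-u[1], u[0]], dtype=np.float32))
    mask = (along >= 0) & (along <= length) & (perp <= thickness)
    field[mask] = u
    return field

def score_pair(field, subj_center, obj_center, n_samples=10):
    """Score a candidate (subject, object) pair by sampling the field
    along the line between their centers and averaging the dot product
    with the pair's own direction."""
    s = np.asarray(subj_center, dtype=np.float32)
    o = np.asarray(obj_center, dtype=np.float32)
    d = o - s
    length = np.linalg.norm(d)
    if length == 0:
        return 0.0
    u = d / length
    ts = np.linspace(0.0, 1.0, n_samples)
    pts = s[None, :] + ts[:, None] * d[None, :]
    xs = np.clip(pts[:, 0].round().astype(int), 0, field.shape[1] - 1)
    ys = np.clip(pts[:, 1].round().astype(int), 0, field.shape[0] - 1)
    return float(np.mean(field[ys, xs] @ u))
```

A true pair scores near 1 because every sampled vector aligns with its direction, while an unrelated pair samples zeros or orthogonal vectors and scores near 0.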
This list is automatically generated from the titles and abstracts of the papers in this site.