Location-Free Scene Graph Generation
- URL: http://arxiv.org/abs/2303.10944v1
- Date: Mon, 20 Mar 2023 08:57:45 GMT
- Title: Location-Free Scene Graph Generation
- Authors: Ege Özsoy, Felix Holm, Tobias Czempiel, Nassir Navab, Benjamin Busam
- Abstract summary: Scene Graph Generation (SGG) is a challenging visual understanding task.
It combines the detection of entities and relationships between them in a scene.
The need for localization labels significantly increases the annotation cost and hampers the creation of more and larger scene graph datasets.
We suggest breaking the dependency of scene graphs on bounding box labels by proposing location-free scene graph generation.
- Score: 43.68679886516574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene Graph Generation (SGG) is a challenging visual understanding task. It
combines the detection of entities and relationships between them in a scene.
Both previous works and existing evaluation metrics rely on bounding box
labels, even though many downstream scene graph applications do not need
location information. The need for localization labels significantly increases
the annotation cost and hampers the creation of more and larger scene graph
datasets. We suggest breaking the dependency of scene graphs on bounding box
labels by proposing location-free scene graph generation (LF-SGG). This new
task aims at predicting instances of entities, as well as their relationships,
without spatial localization. To objectively evaluate the task, the predicted
and ground truth scene graphs need to be compared. We solve this NP-hard
problem through an efficient algorithm using branching. Additionally, we design
the first LF-SGG method, Pix2SG, using autoregressive sequence modeling. Our
proposed method is evaluated on Visual Genome and 4D-OR. Although using
significantly fewer labels during training, we achieve 74.12% of the
location-supervised SOTA performance on Visual Genome and even outperform the
best method on 4D-OR.
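Comparing a predicted scene graph to the ground truth without locations requires finding the best correspondence between entity instances, which is NP-hard in general. A brute-force branching search illustrates the problem (a minimal sketch only; the paper's actual algorithm is far more efficient, while this exhaustive version is exponential in the number of entities):

```python
def best_match(pred_entities, gt_entities, pred_triplets, gt_triplets):
    """Branch over injective assignments of predicted to ground-truth entity
    instances, keeping the mapping that matches the most (subject, predicate,
    object) triplets. Illustrative sketch, not the paper's algorithm."""
    gt_set = set(gt_triplets)
    best = {"score": -1, "mapping": None}

    def score_of(mapping):
        # Count predicted triplets that land on a ground-truth triplet.
        return sum((mapping[s], p, mapping[o]) in gt_set
                   for s, p, o in pred_triplets
                   if s in mapping and o in mapping)

    def branch(i, mapping, used):
        if i == len(pred_entities):
            s = score_of(mapping)
            if s > best["score"]:
                best["score"], best["mapping"] = s, dict(mapping)
            return
        e = pred_entities[i]
        for g in gt_entities:          # branch: assign e to each free GT entity
            if g not in used:
                mapping[e] = g
                used.add(g)
                branch(i + 1, mapping, used)
                used.discard(g)
                del mapping[e]
        branch(i + 1, mapping, used)   # branch: leave e unmatched

    branch(0, {}, set())
    return best["mapping"], best["score"]
```

The key point is that the metric is defined purely over matched triplets, so no bounding boxes are needed to score a prediction.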
Related papers
- Fine-Grained is Too Coarse: A Novel Data-Centric Approach for Efficient Scene Graph Generation [0.7851536646859476]
We introduce the task of Efficient Scene Graph Generation (SGG) that prioritizes the generation of relevant relations.
We present a new dataset, VG150-curated, based on the annotations of the popular Visual Genome dataset.
We show through a set of experiments that this dataset contains more high-quality and diverse annotations than the ones usually used in SGG.
arXiv Detail & Related papers (2023-05-30T00:55:49Z)
- Iterative Scene Graph Generation with Generative Transformers [6.243995448840211]
Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format.
Current approaches take a generation-by-classification approach where the scene graph is generated through labeling of all possible edges between objects in a scene.
This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction.
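The generation-by-classification baseline described above can be sketched as exhaustive pair enumeration (an illustration of the general scheme, not any specific paper's model; `classify_predicate` is a hypothetical stand-in for a learned predicate classifier):

```python
def scene_graph_by_classification(objects, classify_predicate):
    """Generation-by-classification: score every ordered pair of detected
    objects and keep pairs whose predicted predicate is not background.
    This scales quadratically with the number of detected objects."""
    triplets = []
    for i, subj in enumerate(objects):
        for j, obj in enumerate(objects):
            if i == j:
                continue  # no self-relations
            pred = classify_predicate(subj, obj)  # e.g. "on", "holding", or None
            if pred is not None:
                triplets.append((subj, pred, obj))
    return triplets
```

Generative approaches instead emit the graph sequentially, avoiding classification over all O(n²) candidate edges.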
arXiv Detail & Related papers (2022-11-30T00:05:44Z)
- Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training [112.94542676251133]
We propose to learn scene graph embeddings by directly optimizing their alignment with images.
Specifically, we pre-train an encoder to extract both global and local information from scene graphs.
The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections.
arXiv Detail & Related papers (2022-11-21T01:11:19Z)
- Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing incremental structure expanding (ISE).
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
arXiv Detail & Related papers (2022-09-15T16:26:14Z)
- SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection [26.0630601028093]
Domain Adaptive Object Detection (DAOD) leverages a labeled domain to learn an object detector generalizing to a novel domain free of annotations.
Recent advances align class-conditional distributions by narrowing down cross-domain prototypes (class centers).
We propose a novel SemantIc-complete Graph MAtching framework for DAOD, which completes mismatched semantics and reformulates the adaptation with graph matching.
arXiv Detail & Related papers (2022-03-12T10:14:17Z)
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown a powerful capacity for modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graph.
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
- Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking [58.30147362745852]
Data association across frames is at the core of Multiple Object Tracking (MOT) task.
Existing methods mostly ignore the context information among tracklets and intra-frame detections.
We propose a novel learnable graph matching method to address these issues.
arXiv Detail & Related papers (2021-03-30T08:58:45Z)
- Fully Convolutional Scene Graph Generation [30.194961716870186]
This paper presents a fully convolutional scene graph generation (FCSGG) model that detects objects and relations simultaneously.
FCSGG encodes objects as bounding-box center points and relationships as 2D vector fields called Relation Affinity Fields (RAFs).
FCSGG achieves highly competitive results on recall and zero-shot recall with significantly reduced inference time.
arXiv Detail & Related papers (2021-03-30T05:25:38Z)
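The vector-field idea behind RAFs can be illustrated with a simplified rasterization: pixels near the line from a subject's center to an object's center store the unit direction vector between them (a rough sketch under assumed conventions, not FCSGG's actual implementation):

```python
import numpy as np

def relation_affinity_field(h, w, subj_center, obj_center, radius=2.0):
    """Rasterize one relation as an (h, w, 2) vector field: pixels within
    `radius` of the segment from subject center to object center store the
    unit vector pointing from subject to object. Centers are (x, y)."""
    field = np.zeros((h, w, 2), dtype=np.float32)
    s = np.asarray(subj_center, dtype=np.float32)
    o = np.asarray(obj_center, dtype=np.float32)
    d = o - s
    length = np.linalg.norm(d)
    if length == 0:
        return field  # degenerate relation: subject and object coincide
    u = d / length
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys], axis=-1).astype(np.float32)  # pts[y, x] = (x, y)
    rel = pts - s
    proj = rel @ u                                 # distance along the segment
    perp = np.abs(rel @ np.array([-u[1], u[0]], dtype=np.float32))  # off-axis
    mask = (proj >= 0) & (proj <= length) & (perp <= radius)
    field[mask] = u
    return field
```

Decoding then amounts to following these vectors between detected center points, which is how a one-stage model can read off subject-object pairs without a second relation-classification pass.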
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.