Segmentation-grounded Scene Graph Generation
- URL: http://arxiv.org/abs/2104.14207v1
- Date: Thu, 29 Apr 2021 08:54:08 GMT
- Title: Segmentation-grounded Scene Graph Generation
- Authors: Siddhesh Khandelwal, Mohammed Suhail, Leonid Sigal
- Abstract summary: We propose a framework for pixel-level segmentation-grounded scene graph generation.
Our framework is agnostic to the underlying scene graph generation method.
It is learned in a multi-task manner with both target and auxiliary datasets.
- Score: 47.34166260639392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene graph generation has emerged as an important problem in computer
vision. While scene graphs provide a grounded representation of objects, their
locations and relations in an image, they do so only at the granularity of
proposal bounding boxes. In this work, we propose the first, to our knowledge,
framework for pixel-level segmentation-grounded scene graph generation. Our
framework is agnostic to the underlying scene graph generation method and
address the lack of segmentation annotations in target scene graph datasets
(e.g., Visual Genome) through transfer and multi-task learning from, and with,
an auxiliary dataset (e.g., MS COCO). Specifically, each target object being
detected is endowed with a segmentation mask, which is expressed as a
lingual-similarity weighted linear combination over categories that have
annotations present in an auxiliary dataset. These inferred masks, along with a
novel Gaussian attention mechanism which grounds the relations at a pixel-level
within the image, allow for improved relation prediction. The entire framework
is end-to-end trainable and is learned in a multi-task manner with both target
and auxiliary datasets.
Related papers
- Fine-Grained is Too Coarse: A Novel Data-Centric Approach for Efficient
Scene Graph Generation [0.7851536646859476]
We introduce the task of Efficient Scene Graph Generation (SGG) that prioritizes the generation of relevant relations.
We present a new dataset, VG150-curated, based on the annotations of the popular Visual Genome dataset.
We show through a set of experiments that this dataset contains more high-quality and diverse annotations than the one usually use in SGG.
arXiv Detail & Related papers (2023-05-30T00:55:49Z) - Location-Free Scene Graph Generation [45.366540803729386]
Scene Graph Generation (SGG) is a visual understanding task, aiming to describe a scene as a graph of entities and their relationships with each other.
Existing works rely on location labels in form of bounding boxes or segmentation masks, increasing annotation costs and limiting dataset expansion.
We break this dependency and introduce location-free scene graph generation (LF-SGG)
This new task aims at predicting instances of entities, as well as their relationships, without the explicit calculation of their spatial localization.
arXiv Detail & Related papers (2023-03-20T08:57:45Z) - Iterative Scene Graph Generation with Generative Transformers [6.243995448840211]
Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format.
Current approaches take a generation-by-classification approach where the scene graph is generated through labeling of all possible edges between objects in a scene.
This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction.
arXiv Detail & Related papers (2022-11-30T00:05:44Z) - Diffusion-Based Scene Graph to Image Generation with Masked Contrastive
Pre-Training [112.94542676251133]
We propose to learn scene graph embeddings by directly optimizing their alignment with images.
Specifically, we pre-train an encoder to extract both global and local information from scene graphs.
The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections.
arXiv Detail & Related papers (2022-11-21T01:11:19Z) - Grounding Scene Graphs on Natural Images via Visio-Lingual Message
Passing [17.63475613154152]
This paper presents a framework for jointly grounding objects that follow certain semantic relationship constraints in a scene graph.
A scene graph is an efficient and structured way to represent all the objects and their semantic relationships in the image.
arXiv Detail & Related papers (2022-11-03T16:46:46Z) - Image Semantic Relation Generation [0.76146285961466]
Scene graphs can distil complex image information and correct the bias of visual models using semantic-level relations.
In this work, we introduce image semantic relation generation (ISRG), a simple but effective image-to-text model.
arXiv Detail & Related papers (2022-10-19T16:15:19Z) - Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing the incremental structure expanding (ISE)
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
arXiv Detail & Related papers (2022-09-15T16:26:14Z) - Iterative Scene Graph Generation [55.893695946885174]
Scene graph generation involves identifying object entities and their corresponding interaction predicates in a given image (or video)
Existing approaches to scene graph generation assume certain factorization of the joint distribution to make the estimation iteration feasible.
We propose a novel framework that addresses this limitation, as well as introduces dynamic conditioning on the image.
arXiv Detail & Related papers (2022-07-27T10:37:29Z) - Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graph.
arXiv Detail & Related papers (2021-09-06T03:38:52Z) - Unconditional Scene Graph Generation [72.53624470737712]
We develop a deep auto-regressive model called SceneGraphGen which can learn the probability distribution over labelled and directed graphs.
We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes.
arXiv Detail & Related papers (2021-08-12T17:57:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.