Image Semantic Relation Generation
- URL: http://arxiv.org/abs/2210.11253v1
- Date: Wed, 19 Oct 2022 16:15:19 GMT
- Title: Image Semantic Relation Generation
- Authors: Mingzhe Du
- Abstract summary: Scene graphs can distil complex image information and correct the bias of visual models using semantic-level relations.
In this work, we introduce image semantic relation generation (ISRG), a simple but effective image-to-text model.
- Score: 0.76146285961466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene graphs provide structured semantic understanding beyond images. For
downstream tasks, such as image retrieval, visual question answering, visual
relationship detection, and even autonomous vehicle technology, scene graphs
can not only distil complex image information but also correct the bias of
visual models using semantic-level relations, which has broad application
prospects. However, the heavy labour cost of constructing graph annotations may
hinder the application of PSG in practical scenarios. Inspired by the
observation that people usually identify the subject and object first and then
determine the relationship between them, we proposed to decouple the scene
graphs generation task into two sub-tasks: 1) an image segmentation task to
pick up the qualified objects. 2) a restricted auto-regressive text generation
task to generate the relation between given objects. Therefore, in this work,
we introduce image semantic relation generation (ISRG), a simple but effective
image-to-text model, which achieved 31 points on the OpenPSG dataset and
outperforms strong baselines respectively by 16 points (ResNet-50) and 5 points
(CLIP).
Related papers
- Composing Object Relations and Attributes for Image-Text Matching [70.47747937665987]
This work introduces a dual-encoder image-text matching model, leveraging a scene graph to represent captions with nodes for objects and attributes interconnected by relational edges.
Our model efficiently encodes object-attribute and object-object semantic relations, resulting in a robust and fast-performing system.
arXiv Detail & Related papers (2024-06-17T17:56:01Z) - Fine-Grained is Too Coarse: A Novel Data-Centric Approach for Efficient
Scene Graph Generation [0.7851536646859476]
We introduce the task of Efficient Scene Graph Generation (SGG) that prioritizes the generation of relevant relations.
We present a new dataset, VG150-curated, based on the annotations of the popular Visual Genome dataset.
We show through a set of experiments that this dataset contains more high-quality and diverse annotations than the one usually use in SGG.
arXiv Detail & Related papers (2023-05-30T00:55:49Z) - Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for
Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z) - Grounding Scene Graphs on Natural Images via Visio-Lingual Message
Passing [17.63475613154152]
This paper presents a framework for jointly grounding objects that follow certain semantic relationship constraints in a scene graph.
A scene graph is an efficient and structured way to represent all the objects and their semantic relationships in the image.
arXiv Detail & Related papers (2022-11-03T16:46:46Z) - SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense
Reasoning [61.57887011165744]
multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning.
We propose a Scene Graph Enhanced Image-Text Learning framework to incorporate visual scene graphs in commonsense reasoning.
arXiv Detail & Related papers (2021-12-16T03:16:30Z) - Unconditional Scene Graph Generation [72.53624470737712]
We develop a deep auto-regressive model called SceneGraphGen which can learn the probability distribution over labelled and directed graphs.
We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes.
arXiv Detail & Related papers (2021-08-12T17:57:16Z) - A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval [4.159666152160874]
Scene graph presentation is a suitable method for the image-text matching challenge.
We introduce the Local and Global Scene Graph Matching (LGSGM) model that enhances the state-of-the-art method.
Our enhancement with the combination of levels can improve the performance of the baseline method by increasing the recall by more than 10% on the Flickr30k dataset.
arXiv Detail & Related papers (2021-06-04T10:33:14Z) - Segmentation-grounded Scene Graph Generation [47.34166260639392]
We propose a framework for pixel-level segmentation-grounded scene graph generation.
Our framework is agnostic to the underlying scene graph generation method.
It is learned in a multi-task manner with both target and auxiliary datasets.
arXiv Detail & Related papers (2021-04-29T08:54:08Z) - Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.