Structured Query-Based Image Retrieval Using Scene Graphs
- URL: http://arxiv.org/abs/2005.06653v1
- Date: Wed, 13 May 2020 22:40:32 GMT
- Title: Structured Query-Based Image Retrieval Using Scene Graphs
- Authors: Brigit Schroeder, Subarna Tripathi
- Abstract summary: We present a method which uses scene graph embeddings as the basis for an approach to image retrieval.
We are able to achieve high recall even on low to medium frequency objects found in the long-tailed COCO-Stuff dataset.
- Score: 10.475553340127394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A structured query can capture the complexity of object interactions (e.g.
'woman rides motorcycle') unlike single objects (e.g. 'woman' or 'motorcycle').
Retrieval using structured queries is therefore much more useful than single-object
retrieval, but also a much more challenging problem. In this paper we present
a method which uses scene graph embeddings as the basis for an approach to
image retrieval. We examine how visual relationships, derived from scene
graphs, can be used as structured queries. The visual relationships are
directed subgraphs of the scene graph with a subject and object as nodes
connected by a predicate relationship. Notably, we are able to achieve high
recall even on low to medium frequency objects found in the long-tailed
COCO-Stuff dataset, and find that adding a visual relationship-inspired loss
boosts our recall by 10% in the best case.
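The abstract describes the retrieval interface (a subject-predicate-object subgraph as the structured query, ranked against scene graph embeddings of gallery images) but not its implementation. The sketch below only illustrates that interface under stated assumptions: `VisualRelationship`, `embed_subgraph`, and `retrieve` are hypothetical names, the encoder is a deterministic placeholder rather than the paper's learned embedding model, and cosine similarity is assumed as the ranking function.

```python
# Minimal sketch of structured-query image retrieval over scene-graph embeddings.
# NOTE: `embed_subgraph` is a hypothetical placeholder, not the paper's model.
import hashlib
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class VisualRelationship:
    """A directed scene-graph subgraph: subject --predicate--> object."""
    subject: str    # e.g. "woman"
    predicate: str  # e.g. "rides"
    obj: str        # e.g. "motorcycle"


def embed_subgraph(rel: VisualRelationship, dim: int = 128) -> np.ndarray:
    """Placeholder for a trained scene-graph embedding network.

    A deterministic unit vector stands in for the learned embedding so the
    retrieval interface below can be exercised end to end.
    """
    key = f"{rel.subject}|{rel.predicate}|{rel.obj}".encode()
    seed = int.from_bytes(hashlib.sha256(key).digest()[:4], "little")
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)


def retrieve(query: VisualRelationship,
             gallery: dict[str, np.ndarray],
             k: int = 5) -> list[tuple[str, float]]:
    """Rank gallery images by cosine similarity between their precomputed,
    unit-normalized scene-graph embeddings and the structured query embedding."""
    q = embed_subgraph(query)
    scores = {image_id: float(q @ emb) for image_id, emb in gallery.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    # Gallery embeddings would normally be computed offline from each image's scene graph.
    gallery = {
        "img_001": embed_subgraph(VisualRelationship("woman", "rides", "motorcycle")),
        "img_002": embed_subgraph(VisualRelationship("man", "walks", "dog")),
        "img_003": embed_subgraph(VisualRelationship("person", "sits on", "bench")),
    }
    print(retrieve(VisualRelationship("woman", "rides", "motorcycle"), gallery, k=2))
```

In practice the placeholder encoder would be replaced by the learned graph embedding network and the gallery embeddings indexed offline; the sketch only shows how a visual-relationship query plugs into embedding-based ranking.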
Related papers
- Composing Object Relations and Attributes for Image-Text Matching [70.47747937665987]
This work introduces a dual-encoder image-text matching model, leveraging a scene graph to represent captions with nodes for objects and attributes interconnected by relational edges.
Our model efficiently encodes object-attribute and object-object semantic relations, resulting in a robust and fast-performing system.
arXiv Detail & Related papers (2024-06-17T17:56:01Z)
- DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation [13.058196732927135]
Scene graph generation aims to capture detailed spatial and semantic relationships between objects in an image.
Existing Transformer-based methods either employ distinct queries for objects and predicates or utilize holistic queries for relation triplets.
We present a new Transformer-based method, called DSGG, that views scene graph detection as a direct graph prediction problem.
arXiv Detail & Related papers (2024-03-21T23:43:30Z)
- Image Semantic Relation Generation [0.76146285961466]
Scene graphs can distil complex image information and correct the bias of visual models using semantic-level relations.
In this work, we introduce image semantic relation generation (ISRG), a simple but effective image-to-text model.
arXiv Detail & Related papers (2022-10-19T16:15:19Z)
- ViRel: Unsupervised Visual Relations Discovery with Graph-level Analogy [65.5580334698777]
ViRel is a method for unsupervised discovery and learning of Visual Relations with graph-level analogy.
We show that our method achieves above 95% accuracy in relation classification.
Our method further generalizes to unseen tasks with more complicated relational structures.
arXiv Detail & Related papers (2022-07-04T16:56:45Z)
- Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z)
- One-shot Scene Graph Generation [130.57405850346836]
We propose Multiple Structured Knowledge (Relational Knowledge and Commonsense Knowledge) for the one-shot scene graph generation task.
Our method significantly outperforms existing state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-02-22T11:32:59Z)
- Learning to Compose Visual Relations [100.45138490076866]
We propose to represent each relation as an unnormalized density (an energy-based model).
We show that such a factorized decomposition allows the model to both generate and edit scenes with multiple sets of relations more faithfully.
arXiv Detail & Related papers (2021-11-17T18:51:29Z)
- Few-shot Visual Relationship Co-localization [1.4130726713527195]
Given a small bag of images, each containing a common but latent predicate, we are interested in localizing visual subject-object pairs connected via the common predicate in each of the images.
We propose an optimization framework to select a common visual relationship in each image of the bag.
We extensively evaluate our proposed framework on variations of bag sizes obtained from two challenging public datasets.
arXiv Detail & Related papers (2021-08-26T07:19:57Z)
- Scenes and Surroundings: Scene Graph Generation using Relation Transformer [13.146732454123326]
This work proposes a novel local-context aware architecture named relation transformer.
Our hierarchical multi-head attention-based approach efficiently captures contextual dependencies between objects and predicts their relationships.
In comparison to state-of-the-art approaches, we have achieved an overall mean improvement of 4.85%.
arXiv Detail & Related papers (2021-07-12T14:22:20Z)
- ORD: Object Relationship Discovery for Visual Dialogue Generation [60.471670447176656]
We propose an object relationship discovery (ORD) framework to preserve the object interactions for visual dialogue generation.
A hierarchical graph convolutional network (HierGCN) is proposed to retain the object nodes and neighbour relationships locally, and then refines the object-object connections globally.
Experiments show that the proposed method significantly improves dialogue quality by utilising the contextual information of visual relationships.
arXiv Detail & Related papers (2020-06-15T12:25:40Z)