Image-to-Image Retrieval by Learning Similarity between Scene Graphs
- URL: http://arxiv.org/abs/2012.14700v1
- Date: Tue, 29 Dec 2020 10:45:20 GMT
- Title: Image-to-Image Retrieval by Learning Similarity between Scene Graphs
- Authors: Sangwoong Yoon, Woo Young Kang, Sungwook Jeon, SeongEun Lee, Changjin Han, Jonghun Park, Eun-Sol Kim
- Abstract summary: We propose a novel approach for image-to-image retrieval using scene graph similarity measured by graph neural networks.
In our approach, graph neural networks are trained to predict the proxy image relevance measure, computed from human-annotated captions.
- Score: 5.284353899197193
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As a scene graph compactly summarizes the high-level content of an image in a
structured and symbolic manner, the similarity between scene graphs of two
images reflects the relevance of their contents. Based on this idea, we propose
a novel approach for image-to-image retrieval using scene graph similarity
measured by graph neural networks. In our approach, graph neural networks are
trained to predict the proxy image relevance measure, computed from
human-annotated captions using a pre-trained sentence similarity model. We
collect and publish the dataset for image relevance measured by human
annotators to evaluate retrieval algorithms. The collected dataset shows that
our method agrees better with the human perception of image similarity than other
competitive baselines.
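The training setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes caption embeddings and graph embeddings are already available as plain vectors, uses cosine similarity for both the proxy relevance and the predicted graph similarity, and uses a mean squared error objective. The function names and toy values are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def proxy_relevance(caption_embeddings):
    """Proxy image relevance from caption embeddings (assumed precomputed
    by some sentence similarity model; cosine similarity is an assumption)."""
    return cosine_similarity_matrix(caption_embeddings)

def regression_loss(graph_embeddings, target_relevance):
    """Mean squared error between predicted graph similarity and the proxy."""
    predicted = cosine_similarity_matrix(graph_embeddings)
    return float(np.mean((predicted - target_relevance) ** 2))

def retrieve(query_index, graph_embeddings, k=2):
    """Indices of the k images most similar to the query, excluding itself."""
    scores = cosine_similarity_matrix(graph_embeddings)[query_index].copy()
    scores[query_index] = -np.inf
    return [int(i) for i in np.argsort(-scores)[:k]]

# Toy data: three images, each with one caption embedding and one
# graph embedding (stand-ins for the outputs of a graph neural network).
caption_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
graph_embeddings = np.array([[0.8, 0.2], [1.0, 0.0], [0.1, 0.9]])

target = proxy_relevance(caption_embeddings)
loss = regression_loss(graph_embeddings, target)
top = retrieve(0, graph_embeddings, k=2)
```

In a real system the graph embeddings would come from a trainable graph neural network and the loss would be minimized by gradient descent; the sketch only shows how the target and the objective fit together.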
Related papers
- Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding [112.0878081944858]
Quantifying the degree of similarity between images is a key copyright issue for image-based machine learning.
We seek to define and compute a notion of "conceptual similarity" among images that captures high-level relations.
Two highly dissimilar images can be discriminated early in their description, whereas conceptually similar ones will need more detail to be distinguished.
arXiv Detail & Related papers (2024-02-14T03:31:17Z)
- Patch-wise Graph Contrastive Learning for Image Translation [69.85040887753729]
We exploit the graph neural network to capture the topology-aware features.
We construct the graph based on the patch-wise similarity from a pretrained encoder.
In order to capture the hierarchical semantic structure, we propose the graph pooling.
arXiv Detail & Related papers (2023-12-13T15:45:19Z)
- Learning an Adaptation Function to Assess Image Visual Similarities [0.0]
We focus here on the specific task of learning visual image similarities when analogy matters.
We propose to compare different supervised, semi-supervised and self-supervised networks, pre-trained on distinct scales and contents datasets.
Our experiments conducted on the Totally Looks Like image dataset highlight the interest of our method, by increasing the retrieval scores of the best model @1 by 2.25x.
arXiv Detail & Related papers (2022-06-03T07:15:00Z)
- Image Keypoint Matching using Graph Neural Networks [22.33342295278866]
We propose a graph neural network for the problem of image matching.
The proposed method first generates initial soft correspondences between keypoints using localized node embeddings.
We evaluate our method on natural image datasets with keypoint annotations and show that, in comparison to a state-of-the-art model, our method speeds up inference times without sacrificing prediction accuracy.
arXiv Detail & Related papers (2022-05-27T23:38:44Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graph.
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
- Enhancing Social Relation Inference with Concise Interaction Graph and Discriminative Scene Representation [56.25878966006678]
We propose an approach of PRactical Inference in Social rElation (PRISE).
It concisely learns interactive features of persons and discriminative features of holistic scenes.
PRISE achieves a 6.8% improvement for domain classification on the PIPA dataset.
arXiv Detail & Related papers (2021-07-30T04:20:13Z)
- A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval [4.159666152160874]
Scene graph presentation is a suitable method for the image-text matching challenge.
We introduce the Local and Global Scene Graph Matching (LGSGM) model that enhances the state-of-the-art method.
Our enhancement with the combination of levels can improve the performance of the baseline method by increasing the recall by more than 10% on the Flickr30k dataset.
arXiv Detail & Related papers (2021-06-04T10:33:14Z)
- A Novel Graph-Theoretic Deep Representation Learning Method for Multi-Label Remote Sensing Image Retrieval [0.0]
This paper presents a novel graph-theoretic deep representation learning method in the framework of multi-label remote sensing (RS) image retrieval problems.
The proposed method aims to extract and exploit multi-label co-occurrence relationships associated to each RS image in the archive.
The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/GT-DRL-CBIR.
arXiv Detail & Related papers (2021-06-01T14:11:08Z)
- Scene Graph Embeddings Using Relative Similarity Supervision [4.137464623395376]
We employ a graph convolutional network to exploit structure in scene graphs and produce image embeddings useful for semantic image retrieval.
We propose a novel loss function that operates on pairs of similar and dissimilar images and imposes relative ordering between them in embedding space.
We demonstrate that this Ranking loss, coupled with an intuitive triple sampling strategy, leads to robust representations that outperform well-known contrastive losses on the retrieval task.
arXiv Detail & Related papers (2021-04-06T09:13:05Z)
- Using Text to Teach Image Retrieval [47.72498265721957]
We build on the concept of image manifold to represent the feature space of images, learned via neural networks, as a graph.
We augment the manifold samples with geometrically aligned text, thereby using a plethora of sentences to teach us about images.
The experimental results show that the joint embedding manifold is a robust representation, allowing it to be a better basis to perform image retrieval.
arXiv Detail & Related papers (2020-11-19T16:09:14Z)
- Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation [98.34909905511061]
We argue that a desirable scene graph should be hierarchically constructed, and introduce a new scheme for modeling scene graph.
To generate a scene graph based on the Hierarchical Entity Tree (HET), we parse the HET with a Hybrid Long Short-Term Memory (Hybrid-LSTM), which specifically encodes hierarchy and sibling context.
To further prioritize key relations in the scene graph, we devise a Relation Ranking Module (RRM) to dynamically adjust their rankings.
arXiv Detail & Related papers (2020-07-17T05:12:13Z)