Related papers: Integrating Visual and Semantic Similarity Using Hierarchies for Image Retrieval

URL: http://arxiv.org/abs/2308.08431v1
Date: Wed, 16 Aug 2023 15:23:14 GMT
Title: Integrating Visual and Semantic Similarity Using Hierarchies for Image Retrieval
Authors: Aishwarya Venkataramanan and Martin Laviale and Cédric Pradalier
Abstract summary: We propose a method for CBIR that captures both visual and semantic similarity using a visual hierarchy. The hierarchy is constructed by merging classes with overlapping features in the latent space of a deep neural network trained for classification. Our method achieves superior performance compared to the existing methods on image retrieval.
Score: 0.46040036610482665
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Most of the research in content-based image retrieval (CBIR) focuses on developing robust feature representations that can effectively retrieve instances from a database of images that are visually similar to a query. However, the retrieved images sometimes contain results that are not semantically related to the query. To address this, we propose a method for CBIR that captures both visual and semantic similarity using a visual hierarchy. The hierarchy is constructed by merging classes with overlapping features in the latent space of a deep neural network trained for classification, assuming that overlapping classes share high visual and semantic similarities. Finally, the constructed hierarchy is integrated into the distance calculation metric for similarity search. Experiments on the standard datasets CUB-200-2011 and CIFAR100, and on a real-life use case using diatom microscopy images, show that our method achieves superior performance compared to existing methods on image retrieval.
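
The abstract names two steps: building a hierarchy by merging classes that overlap in latent space, and folding that hierarchy into the retrieval distance. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that overlap can be approximated by the Euclidean distance between class centroids, that merging is greedy and pairwise, and that the final distance is a weighted sum of cosine distance and a normalized merge level. The function names and the weight alpha are hypothetical.

```python
import numpy as np


def class_centroids(features, labels):
    """Mean latent vector per class (features: N x D array, labels: N ints)."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def build_hierarchy(centroids):
    """Greedily merge the two closest clusters until one remains.

    Returns an n x n matrix whose entry (i, j) is the normalized merge level
    at which classes i and j first share a cluster (0 on the diagonal).
    Assumption: centroid distance stands in for feature overlap.
    """
    n = len(centroids)
    clusters = {i: [i] for i in range(n)}
    means = {i: centroids[i].astype(float) for i in range(n)}
    height = np.zeros((n, n))
    level = 0
    while len(clusters) > 1:
        level += 1
        keys = list(clusters)
        pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
        a, b = min(pairs, key=lambda p: np.linalg.norm(means[p[0]] - means[p[1]]))
        # Record the level at which every cross pair of classes is joined.
        for i in clusters[a]:
            for j in clusters[b]:
                height[i, j] = height[j, i] = level
        clusters[a] = clusters[a] + clusters.pop(b)
        means[a] = centroids[clusters[a]].mean(axis=0)
        del means[b]
    return height / max(level, 1)


def combined_distance(query_feat, query_cls, db_feats, db_cls, height, alpha=0.5):
    """Weighted sum of cosine distance and hierarchy distance.

    query_cls and db_cls are class indices (for example, predicted by the
    classifier). alpha is a hypothetical trade-off weight. Feature vectors
    are assumed to be non-zero.
    """
    norms = np.linalg.norm(db_feats, axis=1) * np.linalg.norm(query_feat)
    visual = 1.0 - (db_feats @ query_feat) / norms
    semantic = height[query_cls, db_cls]
    return alpha * visual + (1.0 - alpha) * semantic
```

Sorting database images by the value returned from combined_distance ranks images from classes merged early with the query's class ahead of images that are only visually close, which is the behaviour the abstract describes.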

Related papers

Learning Visual Hierarchies with Hyperbolic Embeddings [28.35250955426006]
We introduce a learning paradigm that can encode user-defined multi-level visual hierarchies in hyperbolic space without requiring explicit hierarchical labels. We show significant improvements in hierarchical retrieval tasks, demonstrating the capability of our model in capturing visual hierarchies.
arXiv Detail & Related papers (2024-11-26T14:58:06Z)
Advancing Image Retrieval with Few-Shot Learning and Relevance Feedback [5.770351255180495]
Image Retrieval with Relevance Feedback (IRRF) involves iterative human interaction during the retrieval process. We propose a new scheme based on a hyper-network that is tailored to the task and facilitates swift adjustment to user feedback. We show that our method can attain SoTA results in few-shot one-class classification and reach comparable results in the binary classification task of few-shot open-set recognition.
arXiv Detail & Related papers (2023-12-18T10:20:28Z)
Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects. In this paper, we propose to alleviate the impact of an unbalanced distribution via Virtual Image Learning (VIL). A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
Self-supervised Multi-view Disentanglement for Expansion of Visual Collections [6.944742823561]
We consider the setting where a query for similar images is derived from a collection of images. For visual search, the similarity measurements may be made along multiple axes, or views, such as style and color. Our objective is to design a retrieval algorithm that effectively combines similarities computed over representations from multiple views.
arXiv Detail & Related papers (2023-02-04T22:09:17Z)
HIRL: A General Framework for Hierarchical Image Representation Learning [54.12773508883117]
We propose a general framework for Hierarchical Image Representation Learning (HIRL). This framework aims to learn multiple semantic representations for each image, and these representations are structured to encode image semantics from fine-grained to coarse-grained. Based on a probabilistic factorization, HIRL learns the most fine-grained semantics by an off-the-shelf image SSL approach and learns multiple coarse-grained semantics by a novel semantic path discrimination scheme.
arXiv Detail & Related papers (2022-05-26T05:13:26Z)
Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images. We follow a retrieval-based strategy and prevent the network from learning object-specific features. Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
Cross-Modality Sub-Image Retrieval using Contrastive Multimodal Image Representations [3.3754780158324564]
Cross-modality image retrieval is challenging, since images of similar (or even the same) content captured by different modalities might share few common structures. We propose a new application-independent content-based image retrieval system for reverse (sub-)image search across modalities.
arXiv Detail & Related papers (2022-01-10T19:04:28Z)
Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention. We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match. The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
Adaptive Semantic-Visual Tree for Hierarchical Embeddings [67.01307058209709]
We propose a hierarchical adaptive semantic-visual tree to depict the architecture of merchandise categories. The tree evaluates semantic similarities between different semantic levels and visual similarities within the same semantic class simultaneously. At each level, we set different margins based on the semantic hierarchy and incorporate them as prior information to learn a fine-grained feature embedding.
arXiv Detail & Related papers (2020-03-08T03:36:42Z)
CBIR using features derived by Deep Learning [0.0]
In a Content-Based Image Retrieval (CBIR) system, the task is to retrieve similar images from a large database given a query image. We propose to use features derived from pre-trained deep convolutional network models trained for a large image classification problem.
arXiv Detail & Related papers (2020-02-13T21:26:32Z)
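
Several entries above, including the last one, rely on the same baseline: represent every image by a feature vector from a pre-trained network and return the database images nearest to the query. A minimal sketch of that ranking step follows. It assumes the feature vectors have already been extracted and are non-zero, and it uses cosine similarity, which is one common choice rather than the one any listed paper necessarily uses. The function name retrieve is hypothetical.

```python
import numpy as np


def retrieve(query, database, k=5):
    """Return indices and cosine similarities of the k most similar rows.

    query: D-dimensional feature vector; database: N x D feature matrix.
    Feature extraction itself is outside this sketch.
    """
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = db @ q
    order = np.argsort(-sims)[:k]  # highest similarity first
    return order, sims[order]
```

For large databases the exhaustive matrix product is usually replaced by an approximate nearest-neighbour index, but the ranking criterion stays the same.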

This list is automatically generated from the titles and abstracts of the papers on this site.