Integrating Visual and Semantic Similarity Using Hierarchies for Image
Retrieval
- URL: http://arxiv.org/abs/2308.08431v1
- Date: Wed, 16 Aug 2023 15:23:14 GMT
- Authors: Aishwarya Venkataramanan, Martin Laviale, and Cédric Pradalier
- Abstract summary: We propose a method for CBIR that captures both visual and semantic similarity using a visual hierarchy.
The hierarchy is constructed by merging classes with overlapping features in the latent space of a deep neural network trained for classification.
Our method achieves superior performance compared to the existing methods on image retrieval.
- Score: 0.46040036610482665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most of the research in content-based image retrieval (CBIR) focuses on
developing robust feature representations that can effectively retrieve
instances from a database of images that are visually similar to a query.
However, the retrieved images sometimes contain results that are not
semantically related to the query. To address this, we propose a method for
CBIR that captures both visual and semantic similarity using a visual
hierarchy. The hierarchy is constructed by merging classes with overlapping
features in the latent space of a deep neural network trained for
classification, assuming that overlapping classes share high visual and
semantic similarities. Finally, the constructed hierarchy is integrated into
the distance calculation metric for similarity search. Experiments on two standard
datasets, CUB-200-2011 and CIFAR100, and on a real-life use case using diatom
microscopy images show that our method achieves superior performance compared
to existing image retrieval methods.
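The two steps described in the abstract (merging classes that lie close together in latent space, then folding the resulting hierarchy into the retrieval distance) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that class overlap can be approximated by the Euclidean distance between class centroids, that the hierarchy distance between two classes is the normalized height of their lowest common ancestor, and that a weight `alpha` blends it with cosine distance. The function names, the toy vectors, and `alpha` are all hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def build_hierarchy(centroids):
    """Greedily merge the two closest clusters until one root remains.

    Assumption: centroid distance stands in for "overlapping features".
    Returns (paths, root_height), where paths[cls] lists the ancestors of
    cls bottom-up as (node_id, height) pairs.
    """
    clusters = {name: list(vec) for name, vec in centroids.items()}
    sizes = {name: 1 for name in centroids}
    members = {name: [name] for name in centroids}
    paths = {name: [] for name in centroids}
    step = 0
    while len(clusters) > 1:
        step += 1
        a, b = min(combinations(clusters, 2),
                   key=lambda p: euclidean(clusters[p[0]], clusters[p[1]]))
        node = f"node{step}"  # hypothetical id for the merged cluster
        na, nb = sizes[a], sizes[b]
        merged = [(na * x + nb * y) / (na + nb)
                  for x, y in zip(clusters[a], clusters[b])]
        members[node] = members.pop(a) + members.pop(b)
        for leaf in members[node]:
            paths[leaf].append((node, step))  # merge order serves as height
        del clusters[a], clusters[b]
        clusters[node] = merged
        sizes[node] = na + nb
    return paths, step


def class_distance(paths, root_height, c1, c2):
    """Normalized height of the lowest common ancestor (0 for the same class)."""
    if c1 == c2:
        return 0.0
    ancestors_c2 = {node for node, _ in paths[c2]}
    for node, height in paths[c1]:
        if node in ancestors_c2:
            return height / root_height
    raise ValueError("classes do not share a root")


def retrieve(query_vec, query_cls, database, paths, root_height, alpha=0.5):
    """Rank (name, vector, class) items by a blend of visual and hierarchy distance."""
    def score(item):
        _, vec, cls = item
        visual = cosine_distance(query_vec, vec)
        semantic = class_distance(paths, root_height, query_cls, cls)
        return (1 - alpha) * visual + alpha * semantic
    return [name for name, _, _ in sorted(database, key=score)]


# Toy data: two bird classes that sit close together, one unrelated class.
centroids = {"sparrow": [1.0, 0.1], "finch": [0.9, 0.2], "truck": [0.0, 1.0]}
paths, root_height = build_hierarchy(centroids)
database = [("img_finch", [0.6, 0.8], "finch"),
            ("img_truck", [0.8, 0.6], "truck")]
print(retrieve([1.0, 0.5], "sparrow", database, paths, root_height, alpha=0.0))
print(retrieve([1.0, 0.5], "sparrow", database, paths, root_height, alpha=0.5))
```

With `alpha=0.0` the ranking is purely visual and the truck image comes first because its embedding is closer to the query. With `alpha=0.5` the finch image moves ahead because its class was merged with the query class early in the hierarchy. The sketch assumes class labels for the query and database items are available, for example as classifier predictions.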
Related papers
- Advancing Image Retrieval with Few-Shot Learning and Relevance Feedback [5.770351255180495]
Image Retrieval with Relevance Feedback (IRRF) involves iterative human interaction during the retrieval process.
We propose a new scheme based on a hyper-network, that is tailored to the task and facilitates swift adjustment to user feedback.
We show that our method can attain SoTA results in few-shot one-class classification and reach comparable results in the binary classification task of few-shot open-set recognition.
arXiv Detail & Related papers (2023-12-18T10:20:28Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Self-supervised Multi-view Disentanglement for Expansion of Visual Collections [6.944742823561]
We consider the setting where a query for similar images is derived from a collection of images.
For visual search, the similarity measurements may be made along multiple axes, or views, such as style and color.
Our objective is to design a retrieval algorithm that effectively combines similarities computed over representations from multiple views.
arXiv Detail & Related papers (2023-02-04T22:09:17Z)
- HIRL: A General Framework for Hierarchical Image Representation Learning [54.12773508883117]
We propose a general framework for Hierarchical Image Representation Learning (HIRL).
This framework aims to learn multiple semantic representations for each image, and these representations are structured to encode image semantics from fine-grained to coarse-grained.
Based on a probabilistic factorization, HIRL learns the most fine-grained semantics by an off-the-shelf image SSL approach and learns multiple coarse-grained semantics by a novel semantic path discrimination scheme.
arXiv Detail & Related papers (2022-05-26T05:13:26Z)
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
- Cross-Modality Sub-Image Retrieval using Contrastive Multimodal Image Representations [3.3754780158324564]
Cross-modality image retrieval is challenging, since images of similar (or even the same) content captured by different modalities might share few common structures.
We propose a new application-independent content-based image retrieval system for reverse (sub-)image search across modalities.
arXiv Detail & Related papers (2022-01-10T19:04:28Z)
- Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
- MosAIc: Finding Artistic Connections across Culture with Conditional Image Retrieval [27.549695661396274]
We introduce Conditional Image Retrieval (CIR), which combines visual similarity search with user-supplied filters or "conditions".
CIR allows one to find pairs of similar images that span distinct subsets of the image corpus.
We show that our CIR data-structures can identify "blind spots" in Generative Adversarial Networks (GAN) where they fail to properly model the true data distribution.
arXiv Detail & Related papers (2020-07-14T16:50:29Z)
- Adaptive Semantic-Visual Tree for Hierarchical Embeddings [67.01307058209709]
We propose a hierarchical adaptive semantic-visual tree to depict the architecture of merchandise categories.
The tree evaluates semantic similarities between different semantic levels and visual similarities within the same semantic class simultaneously.
At each level, we set different margins based on the semantic hierarchy and incorporate them as prior information to learn a fine-grained feature embedding.
arXiv Detail & Related papers (2020-03-08T03:36:42Z)
- CBIR using features derived by Deep Learning [0.0]
In a Content Based Image Retrieval (CBIR) System, the task is to retrieve similar images from a large database given a query image.
We propose to use features derived from pre-trained network models from a deep-learning convolution network trained for a large image classification problem.
arXiv Detail & Related papers (2020-02-13T21:26:32Z)