Content-based Image Retrieval and the Semantic Gap in the Deep Learning
Era
- URL: http://arxiv.org/abs/2011.06490v1
- Date: Thu, 12 Nov 2020 17:00:08 GMT
- Title: Content-based Image Retrieval and the Semantic Gap in the Deep Learning
Era
- Authors: Björn Barz, Joachim Denzler
- Abstract summary: Content-based image retrieval has seen astonishing progress over the past decade, especially for the task of retrieving images of the same object.
This gives rise to the question: Do the recent advances in instance retrieval transfer to more generic image retrieval scenarios?
We first provide a brief overview of the most relevant milestones of instance retrieval. We then apply them to a semantic image retrieval task and find that they perform worse than much less sophisticated and more generic methods.
We conclude that the key problem for the further advancement of semantic image retrieval lies in the lack of a standardized task definition and an appropriate benchmark dataset.
- Score: 9.59805804476193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Content-based image retrieval has seen astonishing progress over the past
decade, especially for the task of retrieving images of the same object that is
depicted in the query image. This scenario is called instance or object
retrieval and requires matching fine-grained visual patterns between images.
Semantics, however, do not play a crucial role. This gives rise to the
question: Do the recent advances in instance retrieval transfer to more generic
image retrieval scenarios? To answer this question, we first provide a brief
overview of the most relevant milestones of instance retrieval. We then apply
them to a semantic image retrieval task and find that they perform worse than
much less sophisticated and more generic methods in a setting that requires
image understanding. Following this, we review existing approaches to closing
this so-called semantic gap by integrating prior world knowledge. We conclude
that the key problem for the further advancement of semantic image retrieval
lies in the lack of a standardized task definition and an appropriate benchmark
dataset.
Related papers
- Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation [90.71613903956451]
Text-to-image retrieval is a fundamental task in multimedia processing.
We propose an autoregressive voken generation method, named AVG.
We show that AVG achieves superior results in both effectiveness and efficiency.
arXiv Detail & Related papers (2024-07-24T13:39:51Z)
- Integrating Visual and Semantic Similarity Using Hierarchies for Image
Retrieval [0.46040036610482665]
We propose a method for CBIR that captures both visual and semantic similarity using a visual hierarchy.
The hierarchy is constructed by merging classes with overlapping features in the latent space of a deep neural network trained for classification.
Our method achieves superior performance compared to the existing methods on image retrieval.
arXiv Detail & Related papers (2023-08-16T15:23:14Z)
- Vocabulary-free Image Classification [75.38039557783414]
We formalize a novel task, termed Vocabulary-free Image Classification (VIC).
VIC aims to assign to an input image a class that resides in an unconstrained language-induced semantic space, without the prerequisite of a known vocabulary.
CaSED is a method that exploits a pre-trained vision-language model and an external vision-language database to address VIC in a training-free manner.
arXiv Detail & Related papers (2023-06-01T17:19:43Z)
- Image-text Retrieval via Preserving Main Semantics of Vision [5.376441473801597]
This paper presents a semantic optimization approach, implemented as a Visual Semantic Loss (VSL).
We leverage the annotated texts corresponding to an image to assist the model in capturing the main content of the image.
Experiments on two benchmark datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2023-04-20T12:23:29Z)
- Bridging the Gap between Local Semantic Concepts and Bag of Visual Words
for Natural Scene Image Retrieval [0.0]
A typical content-based image retrieval system deals with the query image and images in the dataset as a collection of low-level features.
Top ranked images in the retrieved list, which have high similarities to the query image, may be different from the query image in terms of the semantic interpretation of the user.
This paper investigates how natural scene retrieval can be performed using the bag of visual word model and the distribution of local semantic concepts.
arXiv Detail & Related papers (2022-10-17T09:10:50Z)
- BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid
Counterfactual Training for Robust Content-based Image Retrieval [61.803481264081036]
Content-Based Image Retrieval (CIR) aims to search for a target image by concurrently comprehending the composition of an example image and a complementary text.
We tackle this task with a novel Bottom-up crOss-modal Semantic compoSition (BOSS) with Hybrid Counterfactual Training framework.
arXiv Detail & Related papers (2022-07-09T07:14:44Z)
- Progressive Learning for Image Retrieval with Hybrid-Modality Queries [48.79599320198615]
Image retrieval with hybrid-modality queries is also known as composing text and image for image retrieval (CTI-IR).
We decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge for image retrieval with hybrid-modality queries.
Our proposed model significantly outperforms state-of-the-art methods in the mean of Recall@K by 24.9% and 9.5% on the Fashion-IQ and Shoes benchmark datasets, respectively.
arXiv Detail & Related papers (2022-04-24T08:10:06Z)
- ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and
Implicit Similarity [16.550790981646276]
Current approaches combine the features of each of the two elements of the query into a single representation.
Our work aims at shedding new light on the task by looking at it through the prism of two familiar and related frameworks: text-to-image and image-to-image retrieval.
arXiv Detail & Related papers (2022-03-15T17:29:20Z)
- Context-Aware Image Inpainting with Learned Semantic Priors [100.99543516733341]
We introduce pretext tasks that are semantically meaningful for estimating the missing contents.
We propose a context-aware image inpainting model, which adaptively integrates global semantics and local features.
arXiv Detail & Related papers (2021-06-14T08:09:43Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image
Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)