Bridging the Gap between Local Semantic Concepts and Bag of Visual Words
for Natural Scene Image Retrieval
- URL: http://arxiv.org/abs/2210.08875v1
- Date: Mon, 17 Oct 2022 09:10:50 GMT
- Title: Bridging the Gap between Local Semantic Concepts and Bag of Visual Words
for Natural Scene Image Retrieval
- Authors: Yousef Alqasrawi
- Abstract summary: A typical content-based image retrieval system deals with the query image and images in the dataset as a collection of low-level features.
Top-ranked images in the retrieved list, despite their high feature similarity to the query image, may differ from it in terms of the user's semantic interpretation.
This paper investigates how natural scene retrieval can be performed using the bag of visual words model and the distribution of local semantic concepts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the problem of semantic-based image retrieval of natural
scenes. A typical content-based image retrieval system deals with the query
image and images in the dataset as a collection of low-level features and
retrieves a ranked list of images based on the similarities between features of
the query image and features of images in the dataset. However,
top-ranked images in the retrieved list, despite their high feature similarity
to the query image, may differ from it in terms of the user's semantic
interpretation, a discrepancy known as the semantic gap. In order to
reduce the semantic gap, this paper investigates how natural scene retrieval
can be performed using the bag of visual words model and the distribution of
local semantic concepts. The paper studies the efficiency of different
approaches to representing the semantic information depicted in natural scene
images for image retrieval. Extensive experimental work has been conducted
to study the efficiency of using semantic information as well as the bag of
visual words model for natural and urban scene image retrieval.
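To make the retrieval pipeline concrete, the following is a minimal bag-of-visual-words sketch under assumed tooling (OpenCV SIFT descriptors, a scikit-learn k-means codebook, histogram-intersection ranking, k=200); it illustrates the general model the paper builds on, not the author's exact implementation or parameters.

```python
# Minimal bag-of-visual-words retrieval sketch (illustrative only; not the
# paper's exact pipeline). Assumes OpenCV for SIFT and scikit-learn for the
# k-means codebook; k=200 and histogram intersection are arbitrary choices.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_descriptors(image_paths):
    """Collect 128-d SIFT descriptors for each image."""
    sift = cv2.SIFT_create()
    per_image = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if gray is None:  # unreadable file: keep an empty descriptor set
            per_image.append(np.zeros((0, 128), np.float32))
            continue
        _, desc = sift.detectAndCompute(gray, None)
        per_image.append(desc if desc is not None
                         else np.zeros((0, 128), np.float32))
    return per_image

def build_codebook(per_image_desc, k=200):
    """Cluster all descriptors into k visual words."""
    all_desc = np.vstack([d for d in per_image_desc if len(d)])
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_desc)

def bovw_histogram(desc, codebook):
    """Quantize descriptors against the codebook; L1-normalized histogram."""
    k = codebook.n_clusters
    if len(desc) == 0:
        return np.zeros(k)
    words = codebook.predict(desc)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()

def rank_by_similarity(query_hist, dataset_hists):
    """Rank dataset images by histogram intersection with the query."""
    sims = [np.minimum(query_hist, h).sum() for h in dataset_hists]
    return np.argsort(sims)[::-1]  # indices, most similar first
```

The local-semantic-concept representation studied in the paper fits the same template: label each image region with a concept such as sky, grass, or water, build a concept-occurrence histogram instead of (or alongside) the visual-word histogram, and rank with the same similarity step.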
Related papers
- Vocabulary-free Image Classification and Semantic Segmentation [71.78089106671581]
We introduce the Vocabulary-free Image Classification (VIC) task, which aims to assign a class from an unconstrained language-induced semantic space to an input image without needing a known vocabulary.
VIC is challenging due to the vastness of the semantic space, which contains millions of concepts, including fine-grained categories.
We propose Category Search from External Databases (CaSED), a training-free method that leverages a pre-trained vision-language model and an external database.
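As a rough, hedged sketch in the spirit of the retrieve-then-score idea described above (not CaSED's actual procedure): embed the image with a pre-trained vision-language model, pull the nearest captions from an external database, extract candidate category names from them, and score the candidates against the image. The helper name `embed_text` and the naive token-based candidate extraction are illustrative assumptions.

```python
# Hedged sketch of training-free, vocabulary-free classification via
# caption retrieval from an external database; all inputs are assumed to
# live in one shared, unit-normalized vision-language embedding space.
import numpy as np

def classify_vocabulary_free(image_vec, caption_vecs, captions,
                             embed_text, top_k=10):
    """image_vec: unit-norm image embedding from a pre-trained VLM.
    caption_vecs: unit-norm embeddings of an external caption database.
    embed_text: maps a string to a unit-norm text embedding (assumption)."""
    # 1. Retrieve the captions most similar to the image.
    sims = caption_vecs @ image_vec
    nearest = np.argsort(sims)[::-1][:top_k]
    # 2. Extract candidate class names (naively: longer caption tokens).
    candidates = {w.strip(".,").lower()
                  for i in nearest for w in captions[i].split() if len(w) > 3}
    # 3. Pick the candidate whose text embedding best matches the image.
    return max(candidates, key=lambda c: float(embed_text(c) @ image_vec))
```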
arXiv Detail & Related papers (2024-04-16T19:27:21Z) - Vocabulary-free Image Classification [75.38039557783414]
We formalize a novel task, termed Vocabulary-free Image Classification (VIC).
VIC aims to assign to an input image a class that resides in an unconstrained language-induced semantic space, without the prerequisite of a known vocabulary.
CaSED is a method that exploits a pre-trained vision-language model and an external vision-language database to address VIC in a training-free manner.
arXiv Detail & Related papers (2023-06-01T17:19:43Z) - Revising Image-Text Retrieval via Multi-Modal Entailment [25.988058843564335]
The many-to-many matching phenomenon is quite common in widely used image-text retrieval datasets.
We propose a multi-modal entailment classifier to determine whether a sentence is entailed by an image plus its linked captions.
arXiv Detail & Related papers (2022-08-22T07:58:54Z) - Knowledge Mining with Scene Text for Fine-Grained Recognition [53.74297368412834]
We propose an end-to-end trainable network that mines the implicit contextual knowledge behind scene text images.
We employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification.
Our method outperforms the state of the art by 3.72% and 5.39% mAP on two benchmarks, respectively.
arXiv Detail & Related papers (2022-03-27T05:54:00Z) - Scene Graph Embeddings Using Relative Similarity Supervision [4.137464623395376]
We employ a graph convolutional network to exploit structure in scene graphs and produce image embeddings useful for semantic image retrieval.
We propose a novel loss function that operates on pairs of similar and dissimilar images and imposes relative ordering between them in embedding space.
We demonstrate that this Ranking loss, coupled with an intuitive triple sampling strategy, leads to robust representations that outperform well-known contrastive losses on the retrieval task.
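As an illustrative aside, a minimal PyTorch version of such a pairwise ranking loss might look like the following; the margin value and the use of cosine similarity are assumptions, and the paper's exact formulation may differ.

```python
# Illustrative margin-based ranking loss over (query, more-similar,
# less-similar) embedding triples; margin=0.2 and cosine similarity are
# assumptions, not necessarily the paper's exact formulation.
import torch
import torch.nn.functional as F

def ranking_loss(query, similar, dissimilar, margin=0.2):
    """Require the similar image to score higher than the dissimilar one,
    relative to the query, by at least `margin`."""
    sim_pos = F.cosine_similarity(query, similar, dim=-1)
    sim_neg = F.cosine_similarity(query, dissimilar, dim=-1)
    return F.relu(margin - (sim_pos - sim_neg)).mean()
```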
arXiv Detail & Related papers (2021-04-06T09:13:05Z) - Telling the What while Pointing the Where: Fine-grained Mouse Trace and
Language Supervision for Improved Image Retrieval [60.24860627782486]
Fine-grained image retrieval often requires the ability to also express where in the image the desired content is located.
In this paper, we describe an image retrieval setup where the user simultaneously describes an image using both spoken natural language (the "what") and mouse traces over an empty canvas (the "where").
Our model is capable of taking this spatial guidance into account, and provides more accurate retrieval results compared to text-only equivalent systems.
arXiv Detail & Related papers (2021-02-09T17:54:34Z) - Using Text to Teach Image Retrieval [47.72498265721957]
We build on the concept of an image manifold to represent the feature space of images, learned via neural networks, as a graph.
We augment the manifold samples with geometrically aligned text, thereby using a plethora of sentences to teach us about images.
The experimental results show that the joint embedding manifold is a robust representation, allowing it to be a better basis to perform image retrieval.
arXiv Detail & Related papers (2020-11-19T16:09:14Z) - Cross-domain Correspondence Learning for Exemplar-based Image
Translation [59.35767271091425]
We present a framework for exemplar-based image translation, which synthesizes a photo-realistic image from the input in a distinct domain.
The output has a style (e.g., color, texture) consistent with the semantically corresponding objects in the exemplar.
We show that our method significantly outperforms state-of-the-art methods in terms of image quality.
arXiv Detail & Related papers (2020-04-12T09:10:57Z) - Fine-grained Image Classification and Retrieval by Combining Visual and
Locally Pooled Textual Features [8.317191999275536]
In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks.
In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities.
arXiv Detail & Related papers (2020-01-14T12:06:12Z)