GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval
- URL: http://arxiv.org/abs/2111.13122v1
- Date: Thu, 25 Nov 2021 15:19:21 GMT
- Title: GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval
- Authors: Konstantin Schall, Kai Uwe Barthel, Nico Hezel, Klaus Jung
- Abstract summary: We show that large-scale pretraining significantly improves retrieval performance and present experiments on how to further increase these properties by appropriate fine-tuning.
With these promising results, we hope to increase interest in the research topic of general-purpose CBIR.
- Score: 2.421459418045937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Even though it has extensively been shown that retrieval-specific training of
deep neural networks is beneficial for nearest neighbor image search quality,
most of these models are trained and tested in the domain of landmark images.
However, some applications use images from various other domains and therefore
need a network with good generalization properties - a general-purpose CBIR
model. To the best of our knowledge, no testing protocol has so far been
introduced to benchmark models with respect to general image retrieval quality.
After analyzing popular image retrieval test sets, we decided to manually curate
GPR1200, an easy-to-use and accessible but challenging benchmark dataset with a
broad range of image categories. This benchmark is subsequently used to
evaluate various pretrained models of different architectures on their
generalization qualities. We show that large-scale pretraining significantly
improves retrieval performance and present experiments on how to further
increase these properties by appropriate fine-tuning. With these promising
results, we hope to increase interest in the research topic of general-purpose
CBIR.
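The nearest neighbor evaluation described in the abstract can be illustrated with a minimal sketch. It assumes each image is already represented by an embedding vector and a category label, ranks all other images by cosine similarity to a query, and scores the ranking with mean average precision (mAP). The abstract above does not name the metric or the similarity function, so both are assumptions here, and every function name and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_neighbors(query_idx: int, embeddings: List[Sequence[float]]) -> List[int]:
    """Return the indices of all other images, most similar first."""
    scored = [
        (cosine_similarity(embeddings[query_idx], emb), idx)
        for idx, emb in enumerate(embeddings)
        if idx != query_idx  # the query itself is excluded from its result list
    ]
    # Sort by descending similarity; ties are broken by index for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]


def average_precision(ranked: List[int], labels: List[str], query_label: str) -> float:
    """Average of precision@k over the ranks k where a relevant image appears."""
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(ranked, start=1):
        if labels[idx] == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(embeddings: List[Sequence[float]], labels: List[str]) -> float:
    """Use every image once as a query and average the per-query AP values."""
    aps = [
        average_precision(rank_neighbors(q, embeddings), labels, labels[q])
        for q in range(len(embeddings))
    ]
    return sum(aps) / len(aps)


# Hypothetical toy data: four 2-D embeddings in two categories.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["cat", "cat", "car", "car"]
score = mean_average_precision(embeddings, labels)
print(score)
```

In practice the embeddings would come from the pretrained or fine-tuned network under evaluation; the sketch only covers the ranking and scoring step.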
Related papers
- Few-Shot Anomaly Detection via Category-Agnostic Registration Learning [65.64252994254268]
Most existing anomaly detection methods require a dedicated model for each category.
This article proposes a novel few-shot AD (FSAD) framework.
It is the first FSAD method that requires no model fine-tuning for novel categories.
arXiv Detail & Related papers (2024-06-13T05:01:13Z)
- Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z)
- Rethinking Benchmarks for Cross-modal Image-text Retrieval [44.31783230767321]
Cross-modal semantic understanding and matching is a major challenge in image-text retrieval.
In this paper, we review the two common benchmarks and observe that they are insufficient to assess the true capability of models on fine-grained cross-modal semantic matching.
We propose a novel semi-automatic renovation approach to refine coarse-grained sentences into finer-grained ones with little human effort.
The results show that even the state-of-the-art models have much room for improvement in fine-grained semantic understanding.
arXiv Detail & Related papers (2023-04-21T09:07:57Z)
- A ResNet is All You Need? Modeling A Strong Baseline for Detecting Referable Diabetic Retinopathy in Fundus Images [0.0]
We model a strong baseline for this task based on a simple and standard ResNet-18 architecture.
Our model achieved an AUC = 0.955 on a combined test set of 61007 test images from different public datasets.
arXiv Detail & Related papers (2022-10-06T19:40:56Z)
- Re-Imagen: Retrieval-Augmented Text-to-Image Generator [58.60472701831404]
Retrieval-Augmented Text-to-Image Generator (Re-Imagen)
arXiv Detail & Related papers (2022-09-29T00:57:28Z)
- Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
- Cross-Modal Retrieval Augmentation for Multi-Modal Classification [61.5253261560224]
We explore the use of unstructured external knowledge sources of images and their corresponding captions for improving visual question answering.
First, we train a novel alignment model for embedding images and captions in the same space, which achieves substantial improvement on image-caption retrieval.
Second, we show that retrieval-augmented multi-modal transformers using the trained alignment model improve results on VQA over strong baselines.
arXiv Detail & Related papers (2021-04-16T13:27:45Z)
- A Decade Survey of Content Based Image Retrieval using Deep Learning [13.778851745408133]
This paper presents a comprehensive survey of deep learning based developments in the past decade for content based image retrieval.
The similarity between the representative features of the query image and dataset images is used to rank the images for retrieval.
Over the past decade, deep learning has emerged as a dominant alternative to hand-designed feature engineering.
arXiv Detail & Related papers (2020-11-23T02:12:30Z)
- On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances and Million-AID [57.71601467271486]
This article discusses the problem of how to efficiently prepare a suitable benchmark dataset for RS image interpretation.
We first analyze the current challenges of developing intelligent algorithms for RS image interpretation with bibliometric investigations.
Following the presented guidance, we also provide an example of building an RS image dataset, i.e., Million-AID, a new large-scale benchmark dataset.
arXiv Detail & Related papers (2020-06-22T17:59:00Z)
- CBIR using features derived by Deep Learning [0.0]
In a Content Based Image Retrieval (CBIR) System, the task is to retrieve similar images from a large database given a query image.
We propose to use features derived from pre-trained deep convolutional network models trained for a large image classification problem.
arXiv Detail & Related papers (2020-02-13T21:26:32Z)