Instance-level Image Retrieval using Reranking Transformers
- URL: http://arxiv.org/abs/2103.12236v1
- Date: Mon, 22 Mar 2021 23:58:38 GMT
- Title: Instance-level Image Retrieval using Reranking Transformers
- Authors: Fuwen Tan, Jiangbo Yuan, Vicente Ordonez
- Abstract summary: Instance-level image retrieval is the task of searching in a large database for images that match an object in a query image.
We propose Reranking Transformers (RRTs) as a general model to incorporate both local and global features to rerank the matching images.
RRTs are lightweight and can be easily parallelized so that reranking a set of top matching results can be performed in a single forward-pass.
- Score: 18.304597755595697
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instance-level image retrieval is the task of searching in a large database
for images that match an object in a query image. To address this task, systems
usually rely on a retrieval step that uses global image descriptors, and a
subsequent step that performs domain-specific refinements or reranking by
leveraging operations such as geometric verification based on local features.
In this work, we propose Reranking Transformers (RRTs) as a general model to
incorporate both local and global features to rerank the matching images in a
supervised fashion and thus replace the relatively expensive process of
geometric verification. RRTs are lightweight and can be easily parallelized so
that reranking a set of top matching results can be performed in a single
forward-pass. We perform extensive experiments on the Revisited Oxford and
Paris datasets, and the Google Landmark v2 dataset, showing that RRTs
outperform previous reranking approaches while using far fewer local
descriptors. Moreover, we demonstrate that, unlike existing approaches, RRTs
can be optimized jointly with the feature extractor, which can lead to feature
representations tailored to downstream tasks and further accuracy improvements.
Training code and pretrained models will be made public.
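For intuition, a minimal PyTorch sketch of this kind of transformer-based reranker is given below. The token layout, dimensions, and scoring head are illustrative assumptions rather than the exact RRT architecture; the sketch only shows how a set of top matches can be rescored in one batched forward pass.

```python
# Minimal sketch of a transformer-based reranker in the spirit of RRTs.
# Dimensions, token layout, and the scoring head are illustrative
# assumptions, not the exact architecture from the paper.
import torch
import torch.nn as nn

class RerankingTransformer(nn.Module):
    def __init__(self, dim=128, num_layers=4, num_heads=4):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))   # learned [CLS] token
        self.seg = nn.Embedding(2, dim)                    # query vs. candidate segment
        layer = nn.TransformerEncoderLayer(dim, num_heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(dim, 1)                      # matching score from [CLS]

    def forward(self, q_global, q_locals, c_global, c_locals):
        # q_global/c_global: (B, D); q_locals/c_locals: (B, L, D)
        q = torch.cat([q_global.unsqueeze(1), q_locals], dim=1) + self.seg.weight[0]
        c = torch.cat([c_global.unsqueeze(1), c_locals], dim=1) + self.seg.weight[1]
        tokens = torch.cat([self.cls.expand(q.size(0), -1, -1), q, c], dim=1)
        out = self.encoder(tokens)
        return self.head(out[:, 0]).squeeze(-1)           # one score per pair

# Reranking the top-k candidates of one query in a single batched forward pass:
model = RerankingTransformer()
k, D, L = 10, 128, 100
q_g, q_l = torch.randn(1, D), torch.randn(1, L, D)
c_g, c_l = torch.randn(k, D), torch.randn(k, L, D)
scores = model(q_g.expand(k, -1), q_l.expand(k, -1, -1), c_g, c_l)
order = scores.argsort(descending=True)                   # new ranking of the top-k
```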
Related papers
- Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval [92.13664084464514]
The task of composed image retrieval (CIR) aims to retrieve images based on a query image and text describing the user's intent.
Existing methods have made great progress by leveraging advanced large vision-language (VL) models for CIR; however, they generally suffer from two main issues: a lack of labeled triplets for model training and the difficulty of deployment in resource-restricted environments.
We propose Image2Sentence based Asymmetric zero-shot composed image retrieval (ISA), which takes advantage of the VL model and only relies on unlabeled images for composition learning.
arXiv Detail & Related papers (2024-03-03T07:58:03Z)
- Graph Convolution Based Efficient Re-Ranking for Visual Retrieval [29.804582207550478]
We present an efficient re-ranking method which refines initial retrieval results by updating features.
Specifically, we reformulate re-ranking based on Graph Convolution Networks (GCN) and propose a novel Graph Convolution based Re-ranking (GCR) for visual retrieval tasks via feature propagation.
In particular, the plain GCR is extended for cross-camera retrieval and an improved feature propagation formulation is presented to leverage affinity relationships across different cameras.
arXiv Detail & Related papers (2023-06-15T00:28:08Z)
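A minimal sketch of the feature-propagation idea summarized in the entry above might look as follows; the cosine-similarity affinities, the softmax temperature, and the single propagation step with mixing weight alpha are assumptions for illustration, not the exact GCR formulation.

```python
# Sketch of re-ranking by propagating features over a similarity graph.
# The affinity construction and the single propagation step are
# illustrative assumptions, not the exact GCR formulation.
import torch
import torch.nn.functional as F

def graph_rerank(query_feat, gallery_feats, alpha=0.5):
    # query_feat: (D,); gallery_feats: (N, D) initial top-N retrieval results
    x = F.normalize(torch.cat([query_feat.unsqueeze(0), gallery_feats]), dim=1)
    sim = x @ x.t()                          # pairwise cosine affinities
    adj = F.softmax(sim / 0.1, dim=1)        # row-normalized graph weights
    x = alpha * x + (1 - alpha) * adj @ x    # one step of feature propagation
    x = F.normalize(x, dim=1)
    scores = x[1:] @ x[0]                    # updated query-gallery similarities
    return scores.argsort(descending=True)   # refined ranking

new_order = graph_rerank(torch.randn(128), torch.randn(50, 128))
```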
- Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR).
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, thereby extending the user's ability to express search intent.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
- $R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition [92.56937383283397]
We propose a unified place recognition framework that handles both retrieval and reranking.
The proposed reranking module takes feature correlation, attention values, and xy coordinates into account.
$R^{2}$Former significantly outperforms state-of-the-art methods on major VPR datasets.
arXiv Detail & Related papers (2023-04-06T23:19:32Z)
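A hedged sketch of a reranking module of the kind summarized above, whose input tokens carry the correlation, attention values, and xy coordinates of tentative local-feature matches, is shown below; the token layout and pooling are assumptions, not the exact $R^{2}$Former design.

```python
# Sketch of a reranking module whose input tokens carry the correlation,
# attention values, and xy coordinates of tentative local-feature matches.
# Token layout and pooling are assumptions, not the exact R2Former design.
import torch
import torch.nn as nn

class PairReranker(nn.Module):
    def __init__(self, dim=64, num_layers=2, num_heads=4):
        super().__init__()
        # token = [correlation, query attention, ref attention, query xy, ref xy]
        self.embed = nn.Linear(1 + 2 + 4, dim)
        layer = nn.TransformerEncoderLayer(dim, num_heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(dim, 1)

    def forward(self, corr, attn, xy):
        # corr: (B, M, 1), attn: (B, M, 2), xy: (B, M, 4) for M matched pairs
        tokens = self.embed(torch.cat([corr, attn, xy], dim=-1))
        out = self.encoder(tokens).mean(dim=1)   # pool over matches
        return self.head(out).squeeze(-1)        # one score per image pair

model = PairReranker()
score = model(torch.rand(8, 50, 1), torch.rand(8, 50, 2), torch.rand(8, 50, 4))
```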
- Recursive Generalization Transformer for Image Super-Resolution [108.67898547357127]
We propose the Recursive Generalization Transformer (RGT) for image SR, which can capture global spatial information and is suitable for high-resolution images.
We combine the proposed recursive-generalization self-attention (RG-SA) with local self-attention to better exploit the global context.
Our RGT outperforms recent state-of-the-art methods quantitatively and qualitatively.
arXiv Detail & Related papers (2023-03-11T10:44:44Z)
- SImProv: Scalable Image Provenance Framework for Robust Content Attribution [80.25476792081403]
We present SImProv, a framework to match a query image back to a trusted database of originals.
SImProv consists of three stages: a scalable search stage for retrieving the top-k most similar images; a re-ranking and near-duplicate detection stage for identifying the original among the candidates; and a manipulation detection stage.
We demonstrate effective retrieval and manipulation detection over a dataset of 100 million images.
arXiv Detail & Related papers (2022-06-28T18:42:36Z)
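The retrieve-then-verify pattern behind the pipeline above can be sketched as follows; the global descriptors, the hypothetical fine_pair_score placeholder, and the candidate count are illustrative assumptions, not the paper's actual models.

```python
# Sketch of a retrieve-then-verify provenance pipeline: a cheap global
# search narrows the database to top-k candidates, and a finer (here
# hypothetical) pairwise scorer picks the likely original among them.
import torch
import torch.nn.functional as F

def fine_pair_score(query_img_feat, cand_img_feat):
    # Placeholder for a learned re-ranking / near-duplicate model.
    return F.cosine_similarity(query_img_feat, cand_img_feat, dim=0)

def provenance_search(query_feat, db_feats, k=5):
    # Stage 1: scalable search with global descriptors (inner product).
    coarse = F.normalize(db_feats, dim=1) @ F.normalize(query_feat, dim=0)
    topk = coarse.topk(k).indices
    # Stage 2: re-rank the candidates with the finer pairwise score.
    fine = torch.stack([fine_pair_score(query_feat, db_feats[i]) for i in topk])
    return topk[fine.argmax()]                # index of the likely original

db = torch.randn(100_000, 256)
best = provenance_search(torch.randn(256), db)
```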
- Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information [15.32353270625554]
Cross-modal remote sensing text-image retrieval (RSCTIR) has recently attracted intense research interest because it enables fast and flexible information extraction from remote sensing (RS) images.
We first propose a novel RSCTIR framework based on global and local information (GaLR), and design a multi-level information dynamic fusion (MIDF) module to effectively integrate features of different levels.
Experiments on public datasets demonstrate the state-of-the-art performance of GaLR on the RSCTIR task.
arXiv Detail & Related papers (2022-04-21T03:18:09Z)
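One plausible reading of dynamically fusing global and local features, as in the MIDF module above, is a learned gate over the two streams; the sketch below uses mean-pooled local features and a sigmoid gate purely as assumptions, not the exact GaLR design.

```python
# Sketch of dynamically fusing a global feature with an aggregated local
# feature via a learned gate; the gating design is an illustrative
# assumption, not the exact MIDF module.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, global_feat, local_feats):
        # global_feat: (B, D); local_feats: (B, N, D) region/object features
        local_agg = local_feats.mean(dim=1)               # simple aggregation
        g = self.gate(torch.cat([global_feat, local_agg], dim=-1))
        return g * global_feat + (1 - g) * local_agg      # dynamic mixture

fused = GatedFusion()(torch.randn(4, 512), torch.randn(4, 36, 512))
```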
- Reuse your features: unifying retrieval and feature-metric alignment [3.845387441054033]
DRAN is the first network able to produce features for all three steps of visual localization.
It achieves competitive performance in terms of robustness and accuracy under challenging conditions on public benchmarks.
arXiv Detail & Related papers (2022-04-13T10:42:00Z)
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
- LoFTR: Detector-Free Local Feature Matching with Transformers [40.754990768677295]
Instead of performing image feature detection, description, and matching sequentially, we propose to first establish pixel-wise dense matches at a coarse level.
In contrast to dense methods that use a cost volume to search correspondences, we use self and cross attention layers in Transformer to obtain feature descriptors that are conditioned on both images.
The experiments on indoor and outdoor datasets show that LoFTR outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-04-01T17:59:42Z)
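The coarse matching stage described above can be sketched as interleaved self- and cross-attention followed by a dual-softmax readout; the layer counts, weight sharing, and matching rule below are assumptions, not the exact LoFTR implementation.

```python
# Sketch of conditioning two images' coarse features on each other with
# interleaved self- and cross-attention, then reading off dense matches
# with a dual-softmax; layer counts and the matching rule are assumptions.
import torch
import torch.nn as nn

class CoarseMatcher(nn.Module):
    def __init__(self, dim=256, num_heads=8, num_layers=2):
        super().__init__()
        self.self_attn = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_layers))
        self.cross_attn = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_layers))

    def forward(self, feats_a, feats_b):
        # feats_a: (B, Na, D), feats_b: (B, Nb, D) flattened coarse grids
        for sa, ca in zip(self.self_attn, self.cross_attn):
            feats_a = feats_a + sa(feats_a, feats_a, feats_a)[0]
            feats_b = feats_b + sa(feats_b, feats_b, feats_b)[0]
            feats_a, feats_b = (feats_a + ca(feats_a, feats_b, feats_b)[0],
                                feats_b + ca(feats_b, feats_a, feats_a)[0])
        sim = feats_a @ feats_b.transpose(1, 2) / feats_a.shape[-1] ** 0.5
        # Dual-softmax: a pair matches only if it wins in both directions.
        return sim.softmax(dim=2) * sim.softmax(dim=1)    # (B, Na, Nb) confidences

conf = CoarseMatcher()(torch.randn(1, 300, 256), torch.randn(1, 300, 256))
```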
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.