Global Features are All You Need for Image Retrieval and Reranking
- URL: http://arxiv.org/abs/2308.06954v2
- Date: Sat, 19 Aug 2023 06:15:43 GMT
- Title: Global Features are All You Need for Image Retrieval and Reranking
- Authors: Shihao Shao, Kaifeng Chen, Arjun Karpur, Qinghua Cui, Andre Araujo,
and Bingyi Cao
- Abstract summary: SuperGlobal is a novel approach that exclusively employs global features for both stages, improving efficiency without sacrificing accuracy.
Our experiments demonstrate substantial improvements compared to the state of the art in standard benchmarks.
Our two-stage system surpasses the current single-stage state-of-the-art by 16.3%, offering a scalable, accurate alternative for high-performing image retrieval systems.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image retrieval systems conventionally use a two-stage paradigm, leveraging
global features for initial retrieval and local features for reranking.
However, the scalability of this method is often limited due to the significant
storage and computation cost incurred by local feature matching in the
reranking stage. In this paper, we present SuperGlobal, a novel approach that
exclusively employs global features for both stages, improving efficiency
without sacrificing accuracy. SuperGlobal introduces key enhancements to the
retrieval system, specifically focusing on the global feature extraction and
reranking processes. For extraction, we identify sub-optimal performance when
the widely-used ArcFace loss and Generalized Mean (GeM) pooling methods are
combined and propose several new modules to improve GeM pooling. In the
reranking stage, we introduce a novel method to update the global features of
the query and top-ranked images by only considering feature refinement with a
small set of images, thus being very compute and memory efficient. Our
experiments demonstrate substantial improvements compared to the state of the
art in standard benchmarks. Notably, on the Revisited Oxford+1M Hard dataset,
our single-stage results improve by 7.1%, while our two-stage gain reaches 3.7%
with a strong 64,865x speedup. Our two-stage system surpasses the current
single-stage state-of-the-art by 16.3%, offering a scalable, accurate
alternative for high-performing image retrieval systems with minimal time
overhead. Code: https://github.com/ShihaoShao-GH/SuperGlobal.
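The two core ideas in the abstract, GeM pooling for the global descriptor and reranking by refining global features against a small set of neighbors, can be sketched roughly as follows. This is an illustrative sketch, not the paper's actual modules: `gem_pool` is the standard GeM formula, while `rerank_by_refinement` is a simple query-expansion-style update assumed here for illustration; the actual SuperGlobal refinement rule differs.

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized Mean (GeM) pooling over the spatial dimensions.

    feature_map: (C, H, W) array of activations; returns a (C,) descriptor.
    p = 1 recovers average pooling; p -> infinity approaches max pooling.
    """
    x = np.clip(feature_map, eps, None)  # keep bases positive for the power
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)

def rerank_by_refinement(query, db, top_k=10):
    """Hypothetical reranking with global features only: refine the query
    by aggregating its top-k neighbors' descriptors (a query-expansion-style
    update over a small image set), then re-score the whole database."""
    sims = db @ query                        # cosine sims (rows L2-normalized)
    top = np.argsort(-sims)[:top_k]          # small set of top-ranked images
    refined = query + db[top].sum(axis=0)    # cheap feature refinement
    refined /= np.linalg.norm(refined)
    return np.argsort(-(db @ refined))       # new ranking of all db images

# Toy usage: 100 unit-norm 128-D database descriptors and one noisy query.
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 128))
db /= np.linalg.norm(db, axis=1, keepdims=True)
q = db[0] + 0.1 * rng.normal(size=128)
q /= np.linalg.norm(q)
ranking = rerank_by_refinement(q, db)
```

Because the refinement step touches only the query and a handful of top-ranked descriptors, it avoids the per-image local-feature matching that dominates the cost of conventional reranking.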
Related papers
- Global Structure-from-Motion Revisited [57.30100303979393]
We propose GLOMAP as a new general-purpose system that outperforms the state of the art in global SfM.
In terms of accuracy and robustness, we achieve results on par with or superior to COLMAP, the most widely used incremental SfM system.
We share our system as an open-source implementation.
arXiv Detail & Related papers (2024-07-29T17:54:24Z)
- Any Image Restoration with Efficient Automatic Degradation Adaptation [132.81912195537433]
We propose a unified manner to achieve joint embedding by leveraging the inherent similarities across various degradations for efficient and comprehensive restoration.
Our network sets new SOTA records while reducing model complexity by approximately 82% in trainable parameters and 85% in FLOPs.
arXiv Detail & Related papers (2024-07-18T10:26:53Z)
- Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We present the first attempt to integrate the Vision State Space Model (Mamba) into remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with a Frequency Selection Module (FSM), a Vision State Space Module (VSSM), and a Hybrid Gate Module (HGM).
arXiv Detail & Related papers (2024-05-08T11:09:24Z)
- Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval [11.696941841000985]
Two-stage methods following the retrieve-and-rerank paradigm have achieved excellent performance, but their separate local and global modules are inefficient for real-world applications.
We propose a mechanism which attentively selects prominent local descriptors and infuses fine-grained semantic relations into the global representation.
Our method achieves state-of-the-art single-stage image retrieval performance on benchmarks such as Revisited Oxford and Revisited Paris.
arXiv Detail & Related papers (2023-08-08T03:06:10Z)
- Recursive Generalization Transformer for Image Super-Resolution [108.67898547357127]
We propose the Recursive Generalization Transformer (RGT) for image SR, which can capture global spatial information and is suitable for high-resolution images.
We combine the RG-SA with local self-attention to enhance the exploitation of the global context.
Our RGT outperforms recent state-of-the-art methods quantitatively and qualitatively.
arXiv Detail & Related papers (2023-03-11T10:44:44Z)
- Cross-modal Local Shortest Path and Global Enhancement for Visible-Thermal Person Re-Identification [2.294635424666456]
We propose the Cross-modal Local Shortest Path and Global Enhancement (CM-LSP-GE) modules, a two-stream network based on joint learning of local and global features.
The experimental results on two typical datasets show that our model is clearly superior to most state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T10:27:22Z)
- Revisiting Global Statistics Aggregation for Improving Image Restoration [8.803962179239385]
The Test-time Local Statistics Converter (TLSC) significantly improves the performance of image restorers.
By extending SE with TLSC in state-of-the-art models, MPRNet improves by 0.65 dB in PSNR on the GoPro dataset, achieving 33.31 dB and exceeding the previous best result by 0.6 dB.
arXiv Detail & Related papers (2021-12-08T12:52:14Z)
- Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers [115.90778814368703]
Our objective is language-based search of large-scale image and video datasets.
For this task, the approach of independently mapping text and vision to a joint embedding space, a.k.a. dual encoders, is attractive because it scales well with retrieval.
An alternative approach of using vision-text transformers with cross-attention gives considerable improvements in accuracy over the joint embeddings.
arXiv Detail & Related papers (2021-03-30T17:57:08Z)
- CooGAN: A Memory-Efficient Framework for High-Resolution Facial Attribute Editing [84.92009553462384]
We propose a novel pixel translation framework called Cooperative GAN (CooGAN) for HR facial image editing.
This framework features a local path for fine-grained local facial patch generation (i.e., patch-level HR, low memory) and a global path for global low-resolution (LR) facial structure monitoring (i.e., image-level LR, low memory).
In addition, we propose a lighter selective transfer unit for more efficient multi-scale feature fusion, yielding higher-fidelity facial attribute manipulation.
arXiv Detail & Related papers (2020-11-03T08:40:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.