Coarse-to-Fine: Learning Compact Discriminative Representation for
Single-Stage Image Retrieval
- URL: http://arxiv.org/abs/2308.04008v1
- Date: Tue, 8 Aug 2023 03:06:10 GMT
- Title: Coarse-to-Fine: Learning Compact Discriminative Representation for
Single-Stage Image Retrieval
- Authors: Yunquan Zhu, Xinkai Gao, Bo Ke, Ruizhi Qiao, Xing Sun
- Abstract summary: Two-stage methods following retrieve-and-rerank paradigm have achieved excellent performance, but their separate local and global modules are inefficient to real-world applications.
We propose a mechanism which attentively selects prominent local descriptors and infuse fine-grained semantic relations into the global representation.
Our method achieves state-of-the-art single-stage image retrieval performance on benchmarks such as Revisited Oxford and Revisited Paris.
- Score: 11.696941841000985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image retrieval targets to find images from a database that are visually
similar to the query image. Two-stage methods following retrieve-and-rerank
paradigm have achieved excellent performance, but their separate local and
global modules are inefficient to real-world applications. To better trade-off
retrieval efficiency and accuracy, some approaches fuse global and local
feature into a joint representation to perform single-stage image retrieval.
However, they are still challenging due to various situations to tackle,
$e.g.$, background, occlusion and viewpoint. In this work, we design a
Coarse-to-Fine framework to learn Compact Discriminative representation (CFCD)
for end-to-end single-stage image retrieval-requiring only image-level labels.
Specifically, we first design a novel adaptive softmax-based loss which
dynamically tunes its scale and margin within each mini-batch and increases
them progressively to strengthen supervision during training and intra-class
compactness. Furthermore, we propose a mechanism which attentively selects
prominent local descriptors and infuse fine-grained semantic relations into the
global representation by a hard negative sampling strategy to optimize
inter-class distinctiveness at a global scale. Extensive experimental results
have demonstrated the effectiveness of our method, which achieves
state-of-the-art single-stage image retrieval performance on benchmarks such as
Revisited Oxford and Revisited Paris. Code is available at
https://github.com/bassyess/CFCD.
Related papers
- Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models [44.437693135170576]
We propose a new framework, LMM with Sophisticated Tasks, Local image compression, and Mixture of global Experts (SliME)
We extract contextual information from the global view using a mixture of adapters, based on the observation that different adapters excel at different tasks.
The proposed method achieves leading performance across various benchmarks with only 2 million training data.
arXiv Detail & Related papers (2024-06-12T17:59:49Z) - Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval [92.13664084464514]
The task of composed image retrieval (CIR) aims to retrieve images based on the query image and the text describing the users' intent.
Existing methods have made great progress with the advanced large vision-language (VL) model in CIR task, however, they generally suffer from two main issues: lack of labeled triplets for model training and difficulty of deployment on resource-restricted environments.
We propose Image2Sentence based Asymmetric zero-shot composed image retrieval (ISA), which takes advantage of the VL model and only relies on unlabeled images for composition learning.
arXiv Detail & Related papers (2024-03-03T07:58:03Z) - Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Leaning (VIL)
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z) - Towards Effective Image Manipulation Detection with Proposal Contrastive
Learning [61.5469708038966]
We propose Proposal Contrastive Learning (PCL) for effective image manipulation detection.
Our PCL consists of a two-stream architecture by extracting two types of global features from RGB and noise views respectively.
Our PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features.
arXiv Detail & Related papers (2022-10-16T13:30:13Z) - High-Quality Pluralistic Image Completion via Code Shared VQGAN [51.7805154545948]
We present a novel framework for pluralistic image completion that can achieve both high quality and diversity at much faster inference speed.
Our framework is able to learn semantically-rich discrete codes efficiently and robustly, resulting in much better image reconstruction quality.
arXiv Detail & Related papers (2022-04-05T01:47:35Z) - ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based
Image Retrieval [28.022137537238425]
We propose an textbfApproaching-and-textbfCentralizing textbfNetwork (termed textbfACNet'') to jointly optimize sketch-to-photo synthesis and the image retrieval.
The retrieval module guides the synthesis module to generate large amounts of diverse photo-like images which gradually approach the photo domain.
Our approach achieves state-of-the-art performance on two widely used ZS-SBIR datasets and surpasses previous methods by a large margin.
arXiv Detail & Related papers (2021-11-24T19:36:10Z) - Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z) - DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local
and Global Features [42.62089148690047]
We propose a Deep Orthogonal Local and Global (DOLG) information fusion framework for end-to-end image retrieval.
It attentively extracts representative local information with multi-atrous convolutions and self-attention at first.
The whole framework is end-to-end differentiable and can be trained with image-level labels.
arXiv Detail & Related papers (2021-08-06T03:14:09Z) - Reconciliation of Statistical and Spatial Sparsity For Robust Image and
Image-Set Classification [27.319334479994787]
We propose a novel Joint Statistical and Spatial Sparse representation, dubbed textitJ3S, to model the image or image-set data for classification.
We propose to solve the joint sparse coding problem based on the J3S model, by coupling the local and global image representations using joint sparsity.
Experiments show that the proposed J3S-based image classification scheme outperforms the popular or state-of-the-art competing methods over FMD, UIUC, ETH-80 and YTC databases.
arXiv Detail & Related papers (2021-06-01T06:33:24Z) - Weakly-supervised Object Localization for Few-shot Learning and
Fine-grained Few-shot Learning [0.5156484100374058]
Few-shot learning aims to learn novel visual categories from very few samples.
We propose a Self-Attention Based Complementary Module (SAC Module) to fulfill the weakly-supervised object localization.
We also produce the activated masks for selecting discriminative deep descriptors for few-shot classification.
arXiv Detail & Related papers (2020-03-02T14:07:05Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, except for frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.