Unifying Deep Local and Global Features for Image Search
- URL: http://arxiv.org/abs/2001.05027v4
- Date: Tue, 15 Sep 2020 18:21:56 GMT
- Title: Unifying Deep Local and Global Features for Image Search
- Authors: Bingyi Cao, Andre Araujo, Jack Sim
- Abstract summary: We unify global and local image features into a single deep model, enabling accurate retrieval with efficient feature extraction.
Our model achieves state-of-the-art image retrieval on the Revisited Oxford and Paris datasets, and state-of-the-art single-model instance-level recognition on the Google Landmarks dataset v2.
- Score: 9.614694312155798
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image retrieval is the problem of searching an image database for items that
are similar to a query image. To address this task, two main types of image
representations have been studied: global and local image features. In this
work, our key contribution is to unify global and local features into a single
deep model, enabling accurate retrieval with efficient feature extraction. We
refer to the new model as DELG, standing for DEep Local and Global features. We
leverage lessons from recent feature learning work and propose a model that
combines generalized mean pooling for global features and attentive selection
for local features. The entire network can be learned end-to-end by carefully
balancing the gradient flow between two heads -- requiring only image-level
labels. We also introduce an autoencoder-based dimensionality reduction
technique for local features, which is integrated into the model, improving
training efficiency and matching performance. Comprehensive experiments show
that our model achieves state-of-the-art image retrieval on the Revisited
Oxford and Paris datasets, and state-of-the-art single-model instance-level
recognition on the Google Landmarks dataset v2. Code and models are available
at https://github.com/tensorflow/models/tree/master/research/delf .
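The abstract names generalized mean (GeM) pooling as the way the global feature is aggregated. As a rough illustration only, and not the authors' implementation, the sketch below pools a feature map with the standard GeM formula, (mean of x^p)^(1/p) per channel, and L2-normalizes the result. The function name `gem_pool`, the (H, W, C) layout, and the default `p = 3.0` are assumptions made for this example; the abstract does not state them.

```python
import numpy as np


def gem_pool(feature_map: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized mean (GeM) pooling over the spatial dimensions.

    feature_map: array of shape (H, W, C), assumed non-negative
                 (e.g. post-ReLU activations). Layout is an assumption.
    p:           pooling exponent; p = 1 gives average pooling and
                 large p approaches max pooling. 3.0 is illustrative.
    Returns an L2-normalized global descriptor of shape (C,).
    """
    # Clamp to a small positive value so fractional powers stay well defined.
    x = np.maximum(feature_map, eps)
    # Per channel: mean of x^p over all H*W locations, then the p-th root.
    pooled = np.mean(x ** p, axis=(0, 1)) ** (1.0 / p)
    # L2-normalize so that a dot product between descriptors is a cosine similarity.
    return pooled / np.linalg.norm(pooled)


def rank_database(query_desc: np.ndarray, database_descs: np.ndarray) -> np.ndarray:
    """Return database indices sorted by descending cosine similarity.

    database_descs: array of shape (N, C) holding L2-normalized descriptors.
    """
    scores = database_descs @ query_desc
    return np.argsort(-scores)
```

In a retrieve-and-rerank setup, a ranking like the one from `rank_database` would typically produce a shortlist that local-feature matching then reorders; that second stage is not shown here.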
Related papers
- Siamese Transformer Networks for Few-shot Image Classification [9.55588609556447]
Humans exhibit remarkable proficiency in visual classification tasks, accurately recognizing and classifying new images with minimal examples.
Existing few-shot image classification methods often emphasize either global features or local features, with few studies considering the integration of both.
We propose a novel approach based on the Siamese Transformer Network (STN).
Our strategy effectively harnesses the potential of global and local features in few-shot image classification, circumventing the need for complex feature adaptation modules.
arXiv Detail & Related papers (2024-07-16T14:27:23Z)
- Composing Object Relations and Attributes for Image-Text Matching [70.47747937665987]
This work introduces a dual-encoder image-text matching model, leveraging a scene graph to represent captions with nodes for objects and attributes interconnected by relational edges.
Our model efficiently encodes object-attribute and object-object semantic relations, resulting in a robust and fast-performing system.
arXiv Detail & Related papers (2024-06-17T17:56:01Z)
- Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval [11.696941841000985]
Two-stage methods following the retrieve-and-rerank paradigm have achieved excellent performance, but their separate local and global modules are inefficient for real-world applications.
We propose a mechanism which attentively selects prominent local descriptors and infuses fine-grained semantic relations into the global representation.
Our method achieves state-of-the-art single-stage image retrieval performance on benchmarks such as Revisited Oxford and Revisited Paris.
arXiv Detail & Related papers (2023-08-08T03:06:10Z)
- Efficient and Explicit Modelling of Image Hierarchies for Image Restoration [120.35246456398738]
We propose a mechanism to efficiently and explicitly model image hierarchies in the global, regional, and local range for image restoration.
Inspired by that, we propose the anchored stripe self-attention which achieves a good balance between the space and time complexity of self-attention.
Then we propose a new network architecture dubbed GRL to explicitly model image hierarchies in the Global, Regional, and Local range.
arXiv Detail & Related papers (2023-03-01T18:59:29Z)
- Deep Learning Model with GA based Feature Selection and Context Integration [2.3472688456025756]
We propose a novel three-layered deep learning model that assimilates or learns independently global and local contextual information alongside visual features.
The novelty of the proposed model is that One-vs-All binary class-based learners are introduced to learn Genetic Algorithm (GA) optimized features in the visual layer.
Optimized visual features with global and local contextual information play a significant role in improving accuracy and producing stable predictions comparable to state-of-the-art deep CNN models.
arXiv Detail & Related papers (2022-04-13T06:28:41Z)
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e. zero-shot and few-shot, image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z)
- Local and Global GANs with Semantic-Aware Upsampling for Image Generation [201.39323496042527]
We consider generating images using local context.
We propose a class-specific generative network using semantic maps as guidance.
Lastly, we propose a novel semantic-aware upsampling method.
arXiv Detail & Related papers (2022-02-28T19:24:25Z)
- Learning Super-Features for Image Retrieval [34.22539650643026]
We propose a novel architecture for deep image retrieval based solely on mid-level features that we call Super-features.
Experiments on common landmark retrieval benchmarks validate that Super-features substantially outperform state-of-the-art methods when using the same number of features.
arXiv Detail & Related papers (2022-01-31T12:48:42Z)
- DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features [42.62089148690047]
We propose a Deep Orthogonal Local and Global (DOLG) information fusion framework for end-to-end image retrieval.
It first attentively extracts representative local information with multi-atrous convolutions and self-attention.
The whole framework is end-to-end differentiable and can be trained with image-level labels.
arXiv Detail & Related papers (2021-08-06T03:14:09Z)
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves a Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)