Leveraging EfficientNet and Contrastive Learning for Accurate
Global-scale Location Estimation
- URL: http://arxiv.org/abs/2105.07645v1
- Date: Mon, 17 May 2021 07:18:43 GMT
- Title: Leveraging EfficientNet and Contrastive Learning for Accurate
Global-scale Location Estimation
- Authors: Giorgos Kordopatis-Zilos, Panagiotis Galopoulos, Symeon Papadopoulos,
Ioannis Kompatsiaris
- Abstract summary: We propose a mixed classification-retrieval scheme for global-scale image geolocation.
Our approach demonstrates very competitive performance on four public datasets.
- Score: 15.633461635276337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the problem of global-scale image geolocation,
proposing a mixed classification-retrieval scheme. Unlike other methods that
strictly tackle the problem as a classification or retrieval task, we combine
the two practices in a unified solution leveraging the advantages of each
approach with two different modules. The first leverages the EfficientNet
architecture to assign images to a specific geographic cell in a robust way.
The second introduces a new residual architecture that is trained with
contrastive learning to map input images to an embedding space that minimizes
the pairwise geodesic distance of same-location images. For the final location
estimation, the two modules are combined with a search-within-cell scheme,
where the locations of most similar images from the predicted geographic cell
are aggregated based on a spatial clustering scheme. Our approach demonstrates
very competitive performance on four public datasets, achieving new
state-of-the-art performance in fine granularity scales, i.e., 15.0% at 1km
range on Im2GPS3k.
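The search-within-cell step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cell classifier is assumed to have already produced `predicted_cell`, the embeddings are plain Python lists, and the paper's spatial clustering scheme (not specified in the abstract) is replaced by a simple stand-in that picks the largest group of retrieved locations lying within `radius_km` of one another. All names here (`haversine_km`, `search_within_cell`, `index`, `k`, `radius_km`) are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def search_within_cell(query_emb, predicted_cell, index, k=3, radius_km=1.0):
    """Estimate a location from the most similar images in one cell.

    index maps a cell id to a list of (embedding, (lat, lon)) pairs.
    """
    candidates = index[predicted_cell]
    # Retrieval: keep the k candidates most similar to the query embedding.
    top = sorted(candidates, key=lambda c: cosine(query_emb, c[0]), reverse=True)[:k]
    points = [loc for _, loc in top]
    # Stand-in clustering (an assumption): for each retrieved point, collect
    # the retrieved points within radius_km of it, then keep the largest group.
    groups = [[q for q in points if haversine_km(p, q) <= radius_km] for p in points]
    best = max(groups, key=len)
    # Naive centroid; adequate for tight clusters away from the antimeridian.
    lat = sum(p[0] for p in best) / len(best)
    lon = sum(p[1] for p in best) / len(best)
    return (lat, lon)


# Toy index: two nearby images, one distant outlier, one dissimilar image.
index = {
    "cell_a": [
        ([1.0, 0.0], (37.9715, 23.7267)),
        ([0.9, 0.1], (37.9720, 23.7270)),
        ([0.8, 0.2], (48.8584, 2.2945)),
        ([0.0, 1.0], (40.0, 20.0)),
    ]
}
estimate = search_within_cell([1.0, 0.05], "cell_a", index)
print(estimate)
```

In this toy example the distant outlier is retrieved among the top three matches but is discarded by the grouping step, so the estimate is the centroid of the two nearby images. The same `haversine_km` helper could be used to check whether an estimate falls within a given range (for example 1 km) of the ground-truth location.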
Related papers
- Siamese Transformer Networks for Few-shot Image Classification [9.55588609556447]
Humans exhibit remarkable proficiency in visual classification tasks, accurately recognizing and classifying new images with minimal examples.
Existing few-shot image classification methods often emphasize either global features or local features, with few studies considering the integration of both.
We propose a novel approach based on the Siamese Transformer Network (STN).
Our strategy effectively harnesses the potential of global and local features in few-shot image classification, circumventing the need for complex feature adaptation modules.
arXiv Detail & Related papers (2024-07-16T14:27:23Z)
- Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation [9.161203553842787]
We present Img2Loc, a novel system that redefines image geolocalization as a text generation task.
Img2Loc first employs CLIP-based representations to generate an image-based coordinate query database.
It then uniquely combines the query results with the images themselves, forming elaborate prompts customized for LMMs.
When tested on benchmark datasets such as Im2GPS3k and YFCC4k, Img2Loc not only surpasses the performance of previous state-of-the-art models but does so without any model training.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
- Global-and-Local Collaborative Learning for Co-Salient Object Detection [162.62642867056385]
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images.
We propose a global-and-local collaborative learning architecture, which includes a global correspondence modeling (GCM) and a local correspondence modeling (LCM).
The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms eleven state-of-the-art competitors trained on some large datasets (about 8k-200k images).
arXiv Detail & Related papers (2022-04-19T14:32:41Z)
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2022-04-10T19:16:58Z)
- Interpretable Semantic Photo Geolocalization [4.286838964398275]
We present two contributions in order to improve the interpretability of a geolocalization model.
We propose a novel, semantic partitioning method which intuitively leads to an improved understanding of the predictions.
We also introduce a novel metric to assess the importance of semantic visual concepts for a certain prediction.
arXiv Detail & Related papers (2021-04-30T13:28:18Z)
- Scale Aware Adaptation for Land-Cover Classification in Remote Sensing Imagery [4.793219747021116]
Land-cover classification using remote sensing imagery is an important Earth observation task.
The benchmark datasets available for training deep segmentation models in remote sensing imagery tend to be small.
We propose a scale aware adversarial learning framework to perform joint cross-location and cross-scale land-cover classification.
arXiv Detail & Related papers (2020-12-08T05:15:43Z)
- Domain Adaptive Person Re-Identification via Coupling Optimization [58.567492812339566]
Domain adaptive person Re-Identification (ReID) is challenging owing to the domain gap and shortage of annotations on target scenarios.
This paper proposes a coupling optimization method including the Domain-Invariant Mapping (DIM) method and the Global-Local distance Optimization (GLO).
GLO is designed to train the ReID model with unsupervised setting on the target domain.
arXiv Detail & Related papers (2020-11-06T14:01:03Z)
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves the Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.