Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation
- URL: http://arxiv.org/abs/2303.11851v2
- Date: Tue, 29 Aug 2023 07:57:20 GMT
- Title: Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation
- Authors: Fabian Deuser, Konrad Habel, Norbert Oswald
- Abstract summary: We present a simplified but effective architecture based on contrastive learning with symmetric InfoNCE loss.
Our framework consists of a narrow training pipeline that eliminates the need for aggregation modules.
Our work shows excellent performance on common cross-view datasets like CVUSA, CVACT, University-1652 and VIGOR.
- Score: 2.3020018305241337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-View Geo-Localisation is still a challenging task where additional
modules, specific pre-processing or zooming strategies are necessary to
determine accurate positions of images. Since different views have different
geometries, pre-processing like polar transformation helps to merge them.
However, this results in distorted images which then have to be rectified.
Adding hard negatives to the training batch could improve the overall
performance but with the default loss functions in geo-localisation it is
difficult to include them. In this article, we present a simplified but
effective architecture based on contrastive learning with symmetric InfoNCE
loss that outperforms current state-of-the-art results. Our framework consists
of a narrow training pipeline that eliminates the need for aggregation
modules, avoids further pre-processing steps and even increases the
generalisation capability of the model to unknown regions. We introduce two
types of sampling strategies for hard negatives. The first explicitly exploits
geographically neighboring locations to provide a good starting point. The
second leverages the visual similarity between the image embeddings in order to
mine hard negative samples. Our work shows excellent performance on common
cross-view datasets like CVUSA, CVACT, University-1652 and VIGOR. A comparison
between cross-area and same-area settings demonstrates the good generalisation
capability of our model.
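
As a rough illustration of the two ideas named in the abstract, the sketch below computes a symmetric InfoNCE loss over a batch of paired ground and aerial embeddings, and mines hard negatives by visual (cosine) similarity. It is not the authors' implementation: the function names, the temperature of 0.1, the toy embeddings, and the pure-Python style are all assumptions made for a self-contained example.

```python
import math


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    def normalise(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a_n = [normalise(v) for v in a]
    b_n = [normalise(v) for v in b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b_n] for u in a_n]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(ground, aerial, temperature=0.1):
    """Average of the ground->aerial and aerial->ground InfoNCE terms.

    ground[i] and aerial[i] are assumed to be embeddings of the same location;
    every other pairing in the batch acts as a negative.
    """
    sims = cosine_similarity_matrix(ground, aerial)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


def mine_hard_negatives(ground, aerial, k=1):
    """For each ground embedding, return the k most similar non-matching aerial indices."""
    sims = cosine_similarity_matrix(ground, aerial)
    hard = []
    for i, row in enumerate(sims):
        candidates = [j for j in range(len(row)) if j != i]
        candidates.sort(key=lambda j: row[j], reverse=True)
        hard.append(candidates[:k])
    return hard


# Toy embeddings (hypothetical values, for illustration only).
ground = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aerial = [[1.0, 0.1], [0.1, 1.0], [1.0, 0.9]]

loss_aligned = symmetric_info_nce(ground, aerial)
loss_shuffled = symmetric_info_nce(ground, aerial[::-1])
hard_negatives = mine_hard_negatives(ground, aerial, k=1)
```

In this toy batch the correctly paired embeddings give a lower loss than the shuffled pairing. The mined indices point to the aerial embeddings most easily confused with each ground view, which is the kind of sample the abstract proposes adding to the training batch. The geographic-neighbour strategy would follow the same pattern with distances between coordinates in place of embedding similarity.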
Related papers
- GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement [20.346145927174373]
Cross-View Geo-Localization (CVGL) estimates the location of a ground image by matching it to a geo-tagged aerial image in a database.
Existing methods still suffer from poor performance in cross-area evaluation, in which the training and testing data are captured from completely distinct areas.
We attribute this deficiency to the lack of ability to extract the geometric layout of visual features and models' overfitting to low-level details.
In this work, we propose GeoDTR+ with an enhanced GLE module that better models the correlations among visual features.
arXiv Detail & Related papers (2023-08-18T15:32:01Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts model performance, with a 10-34% relative improvement across various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
- Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization [9.687328460113832]
We propose a new backbone network, named Simple Attention-based Image Geo-localization network (SAIG).
The proposed SAIG effectively represents long-range interactions among patches as well as cross-view correspondence with multi-head self-attention layers.
Our SAIG achieves state-of-the-art results on cross-view geo-localization, while being far simpler than previous works.
arXiv Detail & Related papers (2023-02-03T06:50:51Z)
- Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence [11.823147814005411]
Cross-view geo-localization aims to estimate the location of a query ground image by matching it to a reference database of geo-tagged aerial images.
Recent works achieve outstanding progress on cross-view geo-localization benchmarks.
However, existing methods still suffer from poor performance on the cross-area benchmarks.
arXiv Detail & Related papers (2022-12-08T04:54:01Z)
- Towards Effective Image Manipulation Detection with Proposal Contrastive Learning [61.5469708038966]
We propose Proposal Contrastive Learning (PCL) for effective image manipulation detection.
Our PCL uses a two-stream architecture that extracts two types of global features, from the RGB and noise views respectively.
Our PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features.
arXiv Detail & Related papers (2022-10-16T13:30:13Z)
- Viewpoint Invariant Dense Matching for Visual Geolocalization [15.8038460597256]
We propose a novel method for image matching based on dense local features and tailored for visual geolocalization.
Our method, called GeoWarp, directly embeds invariance to viewpoint shifts in the process of extracting dense features.
GeoWarp is implemented efficiently as a re-ranking method that can be easily embedded into pre-existing visual geolocalization pipelines.
arXiv Detail & Related papers (2021-09-20T20:17:38Z)
- Leveraging EfficientNet and Contrastive Learning for Accurate Global-scale Location Estimation [15.633461635276337]
We propose a mixed classification-retrieval scheme for global-scale image geolocation.
Our approach demonstrates very competitive performance on four public datasets.
arXiv Detail & Related papers (2021-05-17T07:18:43Z)
- Spatial-spectral Hyperspectral Image Classification via Multiple Random Anchor Graphs Ensemble Learning [88.60285937702304]
This paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE).
Firstly, the local binary pattern is adopted to extract more descriptive features on each selected band, which preserves local structures and subtle changes of a region.
Secondly, adaptive neighbors assignment is introduced in the construction of the anchor graph to reduce the computational complexity.
arXiv Detail & Related papers (2021-03-25T09:31:41Z)
- Region Similarity Representation Learning [94.88055458257081]
Region Similarity Representation Learning (ReSim) is a new approach to self-supervised representation learning for localization-based tasks.
ReSim learns both regional representations for localization as well as semantic image-level representations.
We show how ReSim learns representations which significantly improve the localization and classification performance compared to a competitive MoCo-v2 baseline.
arXiv Detail & Related papers (2021-03-24T00:42:37Z)
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)