Cross-view Geo-localization via Learning Disentangled Geometric Layout
Correspondence
- URL: http://arxiv.org/abs/2212.04074v3
- Date: Fri, 16 Jun 2023 17:57:45 GMT
- Title: Cross-view Geo-localization via Learning Disentangled Geometric Layout
Correspondence
- Authors: Xiaohan Zhang, Xingyu Li, Waqas Sultani, Yi Zhou, Safwan Wshah
- Abstract summary: Cross-view geo-localization aims to estimate the location of a query ground image by matching it to a reference geo-tagged aerial images database.
Recent works achieve outstanding progress on cross-view geo-localization benchmarks.
However, existing methods still suffer from poor performance on the cross-area benchmarks.
- Score: 11.823147814005411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-view geo-localization aims to estimate the location of a query ground
image by matching it to a reference geo-tagged aerial images database. As an
extremely challenging task, its difficulties root in the drastic view changes
and different capturing time between two views. Despite these difficulties,
recent works achieve outstanding progress on cross-view geo-localization
benchmarks. However, existing methods still suffer from poor performance on the
cross-area benchmarks, in which the training and testing data are captured from
two different regions. We attribute this deficiency to the lack of ability to
extract the spatial configuration of visual feature layouts and models'
overfitting on low-level details from the training set. In this paper, we
propose GeoDTR which explicitly disentangles geometric information from raw
features and learns the spatial correlations among visual features from aerial
and ground pairs with a novel geometric layout extractor module. This module
generates a set of geometric layout descriptors, modulating the raw features
and producing high-quality latent representations. In addition, we elaborate on
two categories of data augmentations, (i) Layout simulation, which varies the
spatial configuration while keeping the low-level details intact. (ii) Semantic
augmentation, which alters the low-level details and encourages the model to
capture spatial configurations. These augmentations help to improve the
performance of the cross-view geo-localization models, especially on the
cross-area benchmarks. Moreover, we propose a counterfactual-based learning
process to benefit the geometric layout extractor in exploring spatial
information. Extensive experiments show that GeoDTR not only achieves
state-of-the-art results but also significantly boosts the performance on
same-area and cross-area benchmarks.
Related papers
- Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z) - ConGeo: Robust Cross-view Geo-localization across Ground View Variations [34.192775134189965]
Cross-view geo-localization aims at localizing a ground-level query image by matching it to its corresponding geo-referenced aerial view.
Existing learning pipelines are orientation-specific or FoV-specific, demanding separate model training for different ground view variations.
We propose ConGeo, a single- and cross-view Contrastive method for Geo-localization.
arXiv Detail & Related papers (2024-03-20T20:37:13Z) - CurriculumLoc: Enhancing Cross-Domain Geolocalization through
Multi-Stage Refinement [11.108860387261508]
Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images taken at some unknown location, to a set of geo-tagged reference images.
We develop CurriculumLoc, a novel keypoint detection and description with global semantic awareness and a local geometric verification.
We achieve new high recall@1 scores of 62.6% and 94.5% on ALTO, with two different distances metrics, respectively.
arXiv Detail & Related papers (2023-11-20T08:40:01Z) - GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement [20.346145927174373]
Cross-View Geo-Localization (CVGL) estimates the location of a ground image by matching it to a geo-tagged aerial image in a database.
Existing methods still suffer from poor performance in cross-area evaluation, in which the training and testing data are captured from completely distinct areas.
We attribute this deficiency to the lack of ability to extract the geometric layout of visual features and models' overfitting to low-level details.
In this work, we propose GeoDTR+ with an enhanced GLE module that better models the correlations among visual features.
arXiv Detail & Related papers (2023-08-18T15:32:01Z) - Cross-View Visual Geo-Localization for Outdoor Augmented Reality [11.214903134756888]
We address the problem of geo-pose estimation by cross-view matching of query ground images to a geo-referenced aerial satellite image database.
We propose a new transformer neural network-based model and a modified triplet ranking loss for joint location and orientation estimation.
Experiments on several benchmark cross-view geo-localization datasets show that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-03-28T01:58:03Z) - Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation [2.3020018305241337]
We present a simplified but effective architecture based on contrastive learning with symmetric InfoNCE loss.
Our framework consists of a narrow training pipeline that eliminates the need of using aggregation modules.
Our work shows excellent performance on common cross-view datasets like CVUSA, CVACT, University-1652 and VIGOR.
arXiv Detail & Related papers (2023-03-21T13:49:49Z) - Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image
Matching [102.39635336450262]
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images.
Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image.
arXiv Detail & Related papers (2022-03-26T20:10:38Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - Spatial-spectral Hyperspectral Image Classification via Multiple Random
Anchor Graphs Ensemble Learning [88.60285937702304]
This paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE)
Firstly, the local binary pattern is adopted to extract the more descriptive features on each selected band, which preserves local structures and subtle changes of a region.
Secondly, the adaptive neighbors assignment is introduced in the construction of anchor graph, to reduce the computational complexity.
arXiv Detail & Related papers (2021-03-25T09:31:41Z) - Multi-Level Graph Convolutional Network with Automatic Graph Learning
for Hyperspectral Image Classification [63.56018768401328]
We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification.
By employing attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions.
Our MGCN-AGL encodes the long range dependencies among image regions based on the expressive representations that have been produced at local level.
arXiv Detail & Related papers (2020-09-19T09:26:20Z) - Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization is to spot images of the same geographic target from different platforms.
Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center.
We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.