CurriculumLoc: Enhancing Cross-Domain Geolocalization through
Multi-Stage Refinement
- URL: http://arxiv.org/abs/2311.11604v1
- Date: Mon, 20 Nov 2023 08:40:01 GMT
- Title: CurriculumLoc: Enhancing Cross-Domain Geolocalization through
Multi-Stage Refinement
- Authors: Boni Hu, Lin Chen, Runjian Chen, Shuhui Bu, Pengcheng Han, Haowei Li
- Abstract summary: Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images, taken at some unknown location, to a set of geo-tagged reference images.
We develop CurriculumLoc, a novel keypoint detection and description method with global semantic awareness and local geometric verification.
We achieve new high recall@1 scores of 62.6% and 94.5% on ALTO, with two different distance metrics, respectively.
- Score: 11.108860387261508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual geolocalization is a cost-effective and scalable task that involves
matching one or more query images, taken at some unknown location, to a set of
geo-tagged reference images. Existing methods, devoted to semantic feature
representation, are evolving towards robustness to a wide variety of differences
between query and reference images, including illumination and viewpoint
changes, as well as scale and seasonal variations. However, practical visual
geolocalization approaches need to be robust to appearance changes and extreme
viewpoint variations while providing accurate global location estimates.
Therefore, inspired by curriculum design, in which humans learn general
knowledge first and then delve into professional expertise, we first recognize
the semantic scene and then measure its geometric structure. Our approach,
termed CurriculumLoc, involves a carefully designed multi-stage refinement
pipeline and a novel keypoint detection and description method with global
semantic awareness and local geometric verification. We rerank candidates and
solve a particular cross-domain perspective-n-point (PnP) problem based on
these keypoints and their corresponding descriptors, so that position
refinement occurs incrementally. Extensive experimental results on our
collected dataset, TerraTrack, and a benchmark dataset, ALTO, demonstrate that
our approach has the aforementioned desirable characteristics of a practical
visual geolocalization solution. Additionally, we achieve new high recall@1
scores of 62.6% and 94.5% on ALTO, with two different distance metrics,
respectively. Dataset, code and trained models are publicly available
on https://github.com/npupilab/CurriculumLoc.
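
The recall@1 figures quoted in the abstract can be read as the fraction of queries whose top-ranked reference lies close enough to the true location. The following is a minimal sketch of that idea, not the authors' evaluation code: the haversine distance, the 25 m threshold, the function names, and the toy coordinates are all assumptions made for illustration, and the abstract does not say which two distance metrics were used.

```python
import math

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))

def recall_at_k(ranked_refs, query_gps, ref_gps, k=1, threshold_m=25.0):
    """Fraction of queries with at least one top-k reference within threshold_m.

    ranked_refs[q] is a list of reference indices for query q, best match first.
    The threshold is a hypothetical value chosen for this example.
    """
    hits = 0
    for q, ranking in enumerate(ranked_refs):
        if any(haversine_m(query_gps[q], ref_gps[r]) <= threshold_m
               for r in ranking[:k]):
            hits += 1
    return hits / len(ranked_refs)

# Toy data (hypothetical coordinates): two queries, three references.
query_gps = [(34.0000, 108.0000), (34.0100, 108.0100)]
ref_gps = [(34.0001, 108.0000), (34.0500, 108.0500), (34.0100, 108.0101)]
ranked_refs = [[0, 1, 2], [1, 2, 0]]  # per-query retrieval order

r1 = recall_at_k(ranked_refs, query_gps, ref_gps, k=1)
r2 = recall_at_k(ranked_refs, query_gps, ref_gps, k=2)
print(f"recall@1 = {r1:.2f}, recall@2 = {r2:.2f}")
```

In this toy setup the first query's top match is about 11 m away and counts as a hit, while the second query only finds a nearby reference at rank 2, so recall@1 is 0.5 and recall@2 is 1.0. A reranking stage such as the one the abstract describes would aim to move that rank-2 reference to rank 1.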
Related papers
- Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
Through inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
- AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization [57.34659640776723]
We propose an end-to-end framework named AddressCLIP to solve the problem with more semantics.
We have built three datasets from Pittsburgh and San Francisco on different scales specifically for the IAL problem.
arXiv Detail & Related papers (2024-07-11T03:18:53Z)
- TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning [36.725822223732635]
We propose TorchSpatial, a learning framework and benchmark for location (point) encoding.
TorchSpatial contains three key components: 1) a unified location encoding framework that consolidates 15 commonly recognized location encoders; 2) the LocBench benchmark tasks encompassing 7 geo-aware image classification and 4 geo-aware image regression datasets; and 3) a comprehensive suite of evaluation metrics to quantify geo-aware models' overall performance as well as their geographic bias, with a novel Geo-Bias Score metric.
arXiv Detail & Related papers (2024-06-21T21:33:16Z)
- ConGeo: Robust Cross-view Geo-localization across Ground View Variations [34.192775134189965]
Cross-view geo-localization aims at localizing a ground-level query image by matching it to its corresponding geo-referenced aerial view.
Existing learning pipelines are orientation-specific or FoV-specific, demanding separate model training for different ground view variations.
We propose ConGeo, a single- and cross-view Contrastive method for Geo-localization.
arXiv Detail & Related papers (2024-03-20T20:37:13Z)
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
- Cross-View Visual Geo-Localization for Outdoor Augmented Reality [11.214903134756888]
We address the problem of geo-pose estimation by cross-view matching of query ground images to a geo-referenced aerial satellite image database.
We propose a new transformer neural network-based model and a modified triplet ranking loss for joint location and orientation estimation.
Experiments on several benchmark cross-view geo-localization datasets show that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-03-28T01:58:03Z)
- GeoNet: Benchmarking Unsupervised Adaptation across Geographies [71.23141626803287]
We study the problem of geographic robustness and make three main contributions.
First, we introduce a large-scale dataset GeoNet for geographic adaptation.
Second, we hypothesize that the major source of domain shifts arises from significant variations in scene context.
Third, we conduct an extensive evaluation of several state-of-the-art unsupervised domain adaptation algorithms and architectures.
arXiv Detail & Related papers (2023-03-27T17:59:34Z)
- Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence [11.823147814005411]
Cross-view geo-localization aims to estimate the location of a query ground image by matching it to a reference database of geo-tagged aerial images.
Recent works achieve outstanding progress on cross-view geo-localization benchmarks.
However, existing methods still suffer from poor performance on the cross-area benchmarks.
arXiv Detail & Related papers (2022-12-08T04:54:01Z)
- Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching [102.39635336450262]
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images.
Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image.
arXiv Detail & Related papers (2022-03-26T20:10:38Z)
- SIRI: Spatial Relation Induced Network For Spatial Description Resolution [64.38872296406211]
We propose a novel relationship induced (SIRI) network for language-guided localization.
We show that our method is around 24% better than the state-of-the-art method in terms of accuracy, measured by an 80-pixel radius.
Our method also generalizes well on our proposed extended dataset collected using the same settings as Touchdown.
arXiv Detail & Related papers (2020-10-27T14:04:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.