GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement
- URL: http://arxiv.org/abs/2308.09624v2
- Date: Tue, 13 Aug 2024 20:34:00 GMT
- Title: GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement
- Authors: Xiaohan Zhang, Xingyu Li, Waqas Sultani, Chen Chen, Safwan Wshah,
- Abstract summary: Cross-View Geo-Localization (CVGL) estimates the location of a ground image by matching it to a geo-tagged aerial image in a database.
Existing methods still suffer from poor performance in cross-area evaluation, in which the training and testing data are captured from completely distinct areas.
We attribute this deficiency to the lack of ability to extract the geometric layout of visual features and models' overfitting to low-level details.
In this work, we propose GeoDTR+ with an enhanced GLE module that better models the correlations among visual features.
- Score: 20.346145927174373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-View Geo-Localization (CVGL) estimates the location of a ground image by matching it to a geo-tagged aerial image in a database. Recent works achieve outstanding progress on CVGL benchmarks. However, existing methods still suffer from poor performance in cross-area evaluation, in which the training and testing data are captured from completely distinct areas. We attribute this deficiency to the lack of ability to extract the geometric layout of visual features and models' overfitting to low-level details. Our preliminary work introduced a Geometric Layout Extractor (GLE) to capture the geometric layout from input features. However, the previous GLE does not fully exploit information in the input feature. In this work, we propose GeoDTR+ with an enhanced GLE module that better models the correlations among visual features. To fully explore the LS techniques from our preliminary work, we further propose Contrastive Hard Samples Generation (CHSG) to facilitate model training. Extensive experiments show that GeoDTR+ achieves state-of-the-art (SOTA) results in cross-area evaluation on CVUSA, CVACT, and VIGOR by a large margin ($16.44\%$, $22.71\%$, and $13.66\%$ without polar transformation) while keeping the same-area performance comparable to existing SOTA. Moreover, we provide detailed analyses of GeoDTR+. Our code will be available at https://gitlab.com/vail-uvm/geodtr plus.
Related papers
- Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework [59.42946541163632]
We introduce a comprehensive geolocation framework with three key components.
GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric.
We demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
arXiv Detail & Related papers (2025-02-19T14:21:25Z) - AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities [5.767156832161819]
We propose AnySat, a multimodal model based on joint embedding predictive architecture (JEPA) and resolution-adaptive spatial encoders.
To demonstrate the advantages of this unified approach, we compile GeoPlex, a collection of $5$ multimodal datasets.
We then train a single powerful model on these diverse datasets simultaneously.
arXiv Detail & Related papers (2024-12-18T18:11:53Z) - GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [84.86699025256705]
We present GEOBench-VLM, a benchmark specifically designed to evaluate Vision-Language Models (VLMs) on geospatial tasks.
Our benchmark features over 10,000 manually verified instructions and covers a diverse set of variations in visual conditions, object type, and scale.
We evaluate several state-of-the-art VLMs to assess their accuracy within the geospatial context.
arXiv Detail & Related papers (2024-11-28T18:59:56Z) - GeoFormer: Learning Point Cloud Completion with Tri-Plane Integrated Transformer [41.26276375114911]
Point cloud completion aims to recover accurate global geometry and preserve fine-grained local details from partial point clouds.
Conventional methods typically predict unseen points directly from 3D point cloud coordinates or use self-projected multi-view depth maps.
We introduce a GeoFormer that simultaneously enhances the global geometric structure of the points and improves the local details.
arXiv Detail & Related papers (2024-08-13T03:15:36Z) - Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models.
We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods.
Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z) - GeoCLIP: Clip-Inspired Alignment between Locations and Images for
Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z) - Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation [2.3020018305241337]
We present a simplified but effective architecture based on contrastive learning with symmetric InfoNCE loss.
Our framework consists of a narrow training pipeline that eliminates the need of using aggregation modules.
Our work shows excellent performance on common cross-view datasets like CVUSA, CVACT, University-1652 and VIGOR.
arXiv Detail & Related papers (2023-03-21T13:49:49Z) - Cross-view Geo-localization via Learning Disentangled Geometric Layout
Correspondence [11.823147814005411]
Cross-view geo-localization aims to estimate the location of a query ground image by matching it to a reference geo-tagged aerial images database.
Recent works achieve outstanding progress on cross-view geo-localization benchmarks.
However, existing methods still suffer from poor performance on the cross-area benchmarks.
arXiv Detail & Related papers (2022-12-08T04:54:01Z) - Viewpoint Invariant Dense Matching for Visual Geolocalization [15.8038460597256]
We propose a novel method for image matching based on dense local features and tailored for visual geolocalization.
Our method, called GeoWarp, directly embeds invariance to viewpoint shifts in the process of extracting dense features.
GeoWarp is implemented efficiently as a re-ranking method that can be easily embedded into pre-existing visual geolocalization pipelines.
arXiv Detail & Related papers (2021-09-20T20:17:38Z) - Mix Dimension in Poincar\'{e} Geometry for 3D Skeleton-based Action
Recognition [57.98278794950759]
Graph Convolutional Networks (GCNs) have already demonstrated their powerful ability to model the irregular data.
We present a novel spatial-temporal GCN architecture which is defined via the Poincar'e geometry.
We evaluate our method on two current largest scale 3D datasets.
arXiv Detail & Related papers (2020-07-30T18:23:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.