Related papers: GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement

GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement

URL: http://arxiv.org/abs/2308.09624v1
Date: Fri, 18 Aug 2023 15:32:01 GMT
Title: GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement
Authors: Xiaohan Zhang, Xingyu Li, Waqas Sultani, Chen Chen, and Safwan Wshah
Abstract summary: Cross-View Geo-Localization (CVGL) estimates the location of a ground image by matching it to a geo-tagged aerial image in a database. Recent works achieve outstanding progress on CVGL benchmarks. Existing methods still suffer from poor performance in cross-area evaluation. We attribute this deficiency to the lack of ability to extract the geometric layout of visual features and models' overfitting to low-level details.
Score: 20.346145927174373
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cross-View Geo-Localization (CVGL) estimates the location of a ground image by matching it to a geo-tagged aerial image in a database. Recent works achieve outstanding progress on CVGL benchmarks. However, existing methods still suffer from poor performance in cross-area evaluation, in which the training and testing data are captured from completely distinct areas. We attribute this deficiency to the lack of ability to extract the geometric layout of visual features and models' overfitting to low-level details. Our preliminary work introduced a Geometric Layout Extractor (GLE) to capture the geometric layout from input features. However, the previous GLE does not fully exploit information in the input feature. In this work, we propose GeoDTR+ with an enhanced GLE module that better models the correlations among visual features. To fully explore the LS techniques from our preliminary work, we further propose Contrastive Hard Samples Generation (CHSG) to facilitate model training. Extensive experiments show that GeoDTR+ achieves state-of-the-art (SOTA) results in cross-area evaluation on CVUSA, CVACT, and VIGOR by a large margin ($16.44\%$, $22.71\%$, and $17.02\%$ without polar transformation) while keeping the same-area performance comparable to existing SOTA. Moreover, we provide detailed analyses of GeoDTR+.

Related papers

RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings [7.431269929582643]
We propose a novel retrieval-augmented strategy called RANGE. We build our method on the intuition that the visual features of a location can be estimated by combining the visual features from multiple similar-looking locations. Our results show that RANGE outperforms the existing state-of-the-art models with significant margins in most tasks.
arXiv Detail & Related papers (2025-02-27T05:45:51Z)
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework [59.42946541163632]
We introduce a comprehensive geolocation framework with three key components. GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric. We demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
arXiv Detail & Related papers (2025-02-19T14:21:25Z)
GeoFormer: Learning Point Cloud Completion with Tri-Plane Integrated Transformer [41.26276375114911]
Point cloud completion aims to recover accurate global geometry and preserve fine-grained local details from partial point clouds. Conventional methods typically predict unseen points directly from 3D point cloud coordinates or use self-projected multi-view depth maps. We introduce a GeoFormer that simultaneously enhances the global geometric structure of the points and improves the local details.
arXiv Detail & Related papers (2024-08-13T03:15:36Z)
Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries. We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images. Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images. GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z)
CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement [11.108860387261508]
Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images taken at some unknown location, to a set of geo-tagged reference images. We develop CurriculumLoc, a novel keypoint detection and description with global semantic awareness and a local geometric verification. We achieve new high recall@1 scores of 62.6% and 94.5% on ALTO, with two different distances metrics, respectively.
arXiv Detail & Related papers (2023-11-20T08:40:01Z)
GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models. We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods. Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z)
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation [2.3020018305241337]
We present a simplified but effective architecture based on contrastive learning with symmetric InfoNCE loss. Our framework consists of a narrow training pipeline that eliminates the need of using aggregation modules. Our work shows excellent performance on common cross-view datasets like CVUSA, CVACT, University-1652 and VIGOR.
arXiv Detail & Related papers (2023-03-21T13:49:49Z)
Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence [11.823147814005411]
Cross-view geo-localization aims to estimate the location of a query ground image by matching it to a reference geo-tagged aerial images database. Recent works achieve outstanding progress on cross-view geo-localization benchmarks. However, existing methods still suffer from poor performance on the cross-area benchmarks.
arXiv Detail & Related papers (2022-12-08T04:54:01Z)
Viewpoint Invariant Dense Matching for Visual Geolocalization [15.8038460597256]
We propose a novel method for image matching based on dense local features and tailored for visual geolocalization. Our method, called GeoWarp, directly embeds invariance to viewpoint shifts in the process of extracting dense features. GeoWarp is implemented efficiently as a re-ranking method that can be easily embedded into pre-existing visual geolocalization pipelines.
arXiv Detail & Related papers (2021-09-20T20:17:38Z)
Mix Dimension in Poincar\'{e} Geometry for 3D Skeleton-based Action Recognition [57.98278794950759]
Graph Convolutional Networks (GCNs) have already demonstrated their powerful ability to model the irregular data. We present a novel spatial-temporal GCN architecture which is defined via the Poincar'e geometry. We evaluate our method on two current largest scale 3D datasets.
arXiv Detail & Related papers (2020-07-30T18:23:18Z)
PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling [103.09504572409449]
We propose a novel deep neural network based method, called PUGeo-Net, to generate uniform dense point clouds. Thanks to its geometry-centric nature, PUGeo-Net works well for both CAD models with sharp features and scanned models with rich geometric details.
arXiv Detail & Related papers (2020-02-24T14:13:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.