Related papers: OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata

URL: http://arxiv.org/abs/2509.18350v2
Date: Tue, 30 Sep 2025 09:45:00 GMT
Title: OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata
Authors: Oussema Dhaouadi, Riccardo Marin, Johannes Meier, Jacques Kaiser, Daniel Cremers
Abstract summary: We propose OrthoLoC, the first large-scale dataset comprising 16,425 UAV images from Germany and the United States with multiple modalities. The dataset addresses domain shifts between UAV imagery and geospatial data. We introduce a refinement technique called AdHoP, which can be integrated with any feature matcher, improving matching by up to 95% and reducing translation error by up to 63%.
Score: 45.03897051444244
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate visual localization from aerial views is a fundamental problem with applications in mapping, large-area inspection, and search-and-rescue operations. In many scenarios, these systems require high-precision localization while operating with limited resources (e.g., no internet connection or GNSS/GPS support), making large image databases or heavy 3D models impractical. Surprisingly, little attention has been given to leveraging orthographic geodata as an alternative paradigm, which is lightweight and increasingly available through free releases by governmental authorities (e.g., the European Union). To fill this gap, we propose OrthoLoC, the first large-scale dataset comprising 16,425 UAV images from Germany and the United States with multiple modalities. The dataset addresses domain shifts between UAV imagery and geospatial data. Its paired structure enables fair benchmarking of existing solutions by decoupling image retrieval from feature matching, allowing isolated evaluation of localization and calibration performance. Through comprehensive evaluation, we examine the impact of domain shifts, data resolutions, and covisibility on localization accuracy. Finally, we introduce a refinement technique called AdHoP, which can be integrated with any feature matcher, improving matching by up to 95% and reducing translation error by up to 63%. The dataset and code are available at: https://deepscenario.github.io/OrthoLoC.

Related papers

Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization [17.908597896653045]
This paper presents a cross-view UAV localization framework that performs map matching via object detection. In typical pipelines, UAV visual localization is formulated as an image-retrieval problem. Our method achieves strong retrieval and localization performance using a fine-grained, graph-based node-similarity metric.
arXiv Detail & Related papers (2025-11-04T11:25:31Z)
Scaling Image Geo-Localization to Continent Level [48.7766435870634]
This paper introduces a hybrid approach that achieves fine-grained geo-localization across a large geographic expanse the size of a continent. We leverage a proxy classification task during training to learn rich feature representations that implicitly encode precise location information. Our evaluation demonstrates that our approach can localize within 200m more than 68% of queries of a dataset covering a large part of Europe.
arXiv Detail & Related papers (2025-10-30T17:59:35Z)
R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization [66.87005863868181]
We introduce a covisibility graph-based global encoding learning and data augmentation strategy. We revisit the network architecture and local feature extraction module. Our method achieves state-of-the-art on challenging large-scale datasets without relying on network ensembles or 3D supervision.
arXiv Detail & Related papers (2025-01-02T18:59:08Z)
OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Fused Geometric and Semantic Guidance [20.043977909592115]
OSMLoc is a brain-inspired visual localization approach that localizes first-person-view images against OpenStreetMap maps. It integrates semantic and geometric guidance to significantly improve accuracy, robustness, and generalization capability.
arXiv Detail & Related papers (2024-11-13T14:59:00Z)
Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP. VOP proceeds co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone. Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
UAVD4L: A Large-Scale Dataset for UAV 6-DoF Localization [14.87295056434887]
We introduce a large-scale 6-DoF UAV dataset for localization (UAVD4L). We develop a two-stage 6-DoF localization pipeline (UAVLoc), which consists of offline synthetic data generation and online visual localization. Results on the new dataset demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-01-11T15:19:21Z)
Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images. We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z)
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
Visual Cross-View Metric Localization with Dense Uncertainty Estimates [11.76638109321532]
This work addresses visual cross-view metric localization for outdoor robotics. Given a ground-level color image and a satellite patch that contains the local surroundings, the task is to identify the location of the ground camera within the satellite patch. We devise a novel network architecture with denser satellite descriptors, similarity matching at the bottleneck, and a dense spatial distribution as output to capture multi-modal localization ambiguities.
arXiv Detail & Related papers (2022-08-17T20:12:23Z)
Robust Self-Tuning Data Association for Geo-Referencing Using Lane Markings [44.4879068879732]
This paper presents a complete pipeline for resolving ambiguities during the data association. Its core is a robust self-tuning data association that adapts the search area depending on the entropy of the measurements. We evaluate our method on real data from urban and rural scenarios around the city of Karlsruhe in Germany.
arXiv Detail & Related papers (2022-07-28T12:29:39Z)
Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
Domain-invariant Similarity Activation Map Contrastive Learning for Retrieval-based Long-term Visual Localization [30.203072945001136]
In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation. A novel gradient-weighted similarity activation mapping loss (Grad-SAM) is then incorporated for finer localization with high accuracy. Extensive experiments have been conducted to validate the effectiveness of the proposed approach on the CMUSeasons dataset. Our performance is on par with or even outperforms the state-of-the-art image-based localization baselines in medium or high precision.
arXiv Detail & Related papers (2020-09-16T14:43:22Z)
Robust Image Retrieval-based Visual Localization using Kapture [10.249293519246478]
We present a versatile pipeline for visual localization that facilitates the use of different local and global features. We evaluate our methods on eight public datasets where they rank top on all and first on many of them. To foster future research, we release code, models, and all datasets used in this paper in the kapture format open source under a permissive BSD license.
arXiv Detail & Related papers (2020-07-27T21:10:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.