GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization
- URL: http://arxiv.org/abs/2507.10935v1
- Date: Tue, 15 Jul 2025 03:00:15 GMT
- Title: GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization
- Authors: Shaowen Tong, Zimin Xia, Alexandre Alahi, Xuming He, Yujiao Shi
- Abstract summary: Cross-view localization is crucial for large-scale outdoor applications like autonomous navigation and augmented reality. Existing methods often rely on fully supervised learning, which requires costly ground-truth pose annotations. We propose GeoDistill, a framework that uses teacher-student learning with Field-of-View (FoV)-based masking.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-view localization, the task of estimating a camera's 3-degrees-of-freedom (3-DoF) pose by aligning ground-level images with satellite images, is crucial for large-scale outdoor applications like autonomous navigation and augmented reality. Existing methods often rely on fully supervised learning, which requires costly ground-truth pose annotations. In this work, we propose GeoDistill, a geometry-guided, weakly supervised self-distillation framework that uses teacher-student learning with Field-of-View (FoV)-based masking to enhance local feature learning for robust cross-view localization. In GeoDistill, the teacher model localizes a panoramic image, while the student model predicts locations from a limited-FoV counterpart created by FoV-based masking. By aligning the student's predictions with those of the teacher, the student focuses on key features like lane lines and ignores textureless regions, such as roads. This results in more accurate predictions and reduced uncertainty, regardless of whether the query images are panoramas or limited-FoV images. Our experiments show that GeoDistill significantly improves localization performance across different frameworks. Additionally, we introduce a novel orientation estimation network that predicts relative orientation without requiring precise planar position ground truth. GeoDistill provides a scalable and efficient solution for real-world cross-view localization challenges. Code and model can be found at https://github.com/tongshw/GeoDistill.
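The two ingredients the abstract describes can be sketched in a few lines. The helpers below are a hypothetical illustration, not the paper's implementation: `fov_mask` crops a limited-FoV slice from an equirectangular panorama, and `kl_alignment_loss` aligns the student's location probability map with the teacher's via KL divergence; GeoDistill's actual masking strategy and distillation loss may differ.

```python
import numpy as np

def fov_mask(panorama, fov_deg=90.0, center_deg=0.0):
    """Crop a limited-FoV vertical slice (with horizontal wrap-around)
    from a 360-degree equirectangular panorama of shape (H, W, C)."""
    h, w = panorama.shape[:2]
    width = int(round(w * fov_deg / 360.0))
    start_col = int(round(w * (((center_deg - fov_deg / 2.0) % 360.0) / 360.0)))
    cols = [(start_col + i) % w for i in range(width)]
    return panorama[:, cols]

def kl_alignment_loss(teacher_probs, student_probs, eps=1e-8):
    """KL(teacher || student) between two spatial location probability maps.
    Both inputs are non-negative arrays of the same shape; each is
    normalized to sum to one before the divergence is computed."""
    t = teacher_probs / (teacher_probs.sum() + eps)
    s = student_probs / (student_probs.sum() + eps)
    return float(np.sum(t * (np.log(t + eps) - np.log(s + eps))))
```

In training, the teacher would see the full panorama and the student the masked crop; minimizing the KL term pushes the student's probability map toward the teacher's, which is what encourages it to latch onto distinctive local features.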
Related papers
- Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth [56.565405280314884]
This paper focuses on improving the performance of a trained model in a new target area by leveraging only the target-area images without fine GT.
We propose a weakly supervised learning approach based on knowledge self-distillation.
Our approach is validated using two recent state-of-the-art models on two benchmarks.
arXiv Detail & Related papers (2024-06-01T15:58:35Z)
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
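The retrieval-style alignment this summary describes, as opposed to classifying discrete geographic cells, comes down to a contrastive objective over paired embeddings. Below is a minimal, hypothetical sketch of a symmetric InfoNCE loss on matched (image, GPS) embedding pairs; it is not GeoCLIP's actual code, and the temperature value is an assumption.

```python
import numpy as np

def info_nce(image_emb, gps_emb, temperature=0.07):
    """Symmetric InfoNCE over n matched (image, GPS) embedding pairs.
    Rows are embeddings; row i of image_emb matches row i of gps_emb.
    Returns the mean cross-entropy in both retrieval directions."""
    logits = image_emb @ gps_emb.T / temperature  # (n, n) similarity matrix
    n = logits.shape[0]

    def xent(mat):
        # numerically stable log-softmax per row, picking the diagonal
        mat = mat - mat.max(axis=1, keepdims=True)
        log_probs = mat - np.log(np.exp(mat).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # image -> GPS retrieval plus GPS -> image retrieval
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls each image embedding toward the embedding of its own GPS location and pushes it away from all others, so localization at test time reduces to nearest-neighbor retrieval over location embeddings.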
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
- Cross-View Visual Geo-Localization for Outdoor Augmented Reality [11.214903134756888]
We address the problem of geo-pose estimation by cross-view matching of query ground images to a geo-referenced aerial satellite image database.
We propose a new transformer neural network-based model and a modified triplet ranking loss for joint location and orientation estimation.
Experiments on several benchmark cross-view geo-localization datasets show that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-03-28T01:58:03Z)
- Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching [102.39635336450262]
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images.
Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image.
arXiv Detail & Related papers (2022-03-26T20:10:38Z)
- Learning Cross-Scale Visual Representations for Real-Time Image Geo-Localization [21.375640354558044]
State estimation approaches based on local sensors are drift-prone over long-range missions as errors accumulate.
We introduce the cross-scale dataset and a methodology to produce additional data from cross-modality sources.
We propose a framework that learns cross-scale visual representations without supervision.
arXiv Detail & Related papers (2021-09-09T08:08:54Z)
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
- Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization is to spot images of the same geographic target from different platforms.
Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center.
We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z)
- Learning Geocentric Object Pose in Oblique Monocular Images [18.15647135620892]
An object's geocentric pose, defined as the height above ground and orientation with respect to gravity, is a powerful representation of real-world structure for object detection, segmentation, and localization tasks using RGBD images.
We develop an encoding of geocentric pose to address this challenge and train a deep network to compute the representation densely, supervised by publicly available airborne lidar.
We exploit these attributes to rectify oblique images and remove observed object parallax to dramatically improve the accuracy of localization and to enable accurate alignment of multiple images taken from very different oblique viewpoints.
arXiv Detail & Related papers (2020-07-01T20:06:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.