BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation
- URL: http://arxiv.org/abs/2312.15363v1
- Date: Sat, 23 Dec 2023 22:20:45 GMT
- Title: BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation
- Authors: Tavis Shore, Simon Hadfield, Oscar Mendez
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-view image matching for geo-localisation is a challenging problem due
to the significant visual difference between aerial and ground-level
viewpoints. Cross-view matching provides localisation capabilities from geo-referenced
images, eliminating the need for external devices or costly equipment. This
enhances the capacity of agents to autonomously determine their position,
navigate, and operate effectively in environments where GPS signals are
unavailable. Current research employs a variety of techniques to reduce the
domain gap such as applying polar transforms to aerial images or synthesising
between perspectives. However, these approaches generally rely on having a
360° field of view, limiting real-world feasibility. We propose BEV-CV, an
approach which introduces two key novelties. Firstly we bring ground-level
images into a semantic Birds-Eye-View before matching embeddings, allowing for
direct comparison with aerial segmentation representations. Secondly, we
introduce the use of a Normalised Temperature-scaled Cross Entropy Loss to the
sub-field, achieving faster convergence than with the standard triplet loss.
BEV-CV achieves state-of-the-art recall accuracies, improving feature
extraction Top-1 rates by more than 300% and Top-1% rates by approximately
150% for 70° crops. For the orientation-aware application, we achieve a 35%
Top-1 accuracy increase with 70° crops.
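The NT-Xent loss named in the abstract can be sketched with a generic contrastive formulation. This is a minimal illustration of the standard loss, not the paper's exact implementation; the batch size, embedding dimension, and temperature below are made-up values.

```python
import numpy as np

def nt_xent_loss(a, b, temperature=0.1):
    """Normalised Temperature-scaled Cross Entropy (NT-Xent) loss.

    a, b: (N, D) embedding arrays; row i of `a` and row i of `b` form a
    positive (e.g. ground, aerial) pair, and all other rows act as negatives.
    """
    # L2-normalise so the dot product is cosine similarity
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    # Cross entropy with the diagonal (matched pairs) as the positive class
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob.diagonal().mean()

# Toy check: aligned pairs should incur a much lower loss than random ones
rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
loss_matched = nt_xent_loss(z, z + 0.01 * rng.standard_normal((8, 16)))
loss_random = nt_xent_loss(z, rng.standard_normal((8, 16)))
print(bool(loss_matched < loss_random))  # → True
```

Compared with a triplet loss, every other item in the batch serves as a negative, which is one common explanation for the faster convergence the abstract reports.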
Related papers
- C-BEV: Contrastive Bird's Eye View Training for Cross-View Image
Retrieval and 3-DoF Pose Estimation [27.870926763424848]
We propose a novel trainable retrieval architecture that uses bird's eye view (BEV) maps rather than vectors as embedding representation.
Our method C-BEV surpasses the state-of-the-art on the retrieval task on multiple datasets by a large margin.
arXiv Detail & Related papers (2023-12-13T11:14:57Z) - Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware
Homography Estimator [12.415973198004169]
We introduce a novel approach to fine-grained cross-view geo-localization.
Our method aligns a warped ground image with a corresponding GPS-tagged satellite image covering the same area.
Operating at 30 FPS, our method outperforms state-of-the-art techniques.
arXiv Detail & Related papers (2023-08-31T17:59:24Z) - View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z) - Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via
Geometry-Guided Cross-View Transformer [66.82008165644892]
We propose a method to increase the accuracy of a ground camera's location and orientation by estimating the relative rotation and translation between the ground-level image and its matched/retrieved satellite image.
Experimental results demonstrate that our method significantly outperforms the state-of-the-art.
arXiv Detail & Related papers (2023-07-16T11:52:27Z) - Uncertainty-aware Vision-based Metric Cross-view Geolocalization [25.87104194833264]
We present an end-to-end differentiable model that uses the ground and aerial images to predict a probability distribution over possible vehicle poses.
We improve the previous state-of-the-art by a large margin even without ground or aerial data from the test region.
arXiv Detail & Related papers (2022-11-22T10:23:20Z) - Wide-Area Geolocalization with a Limited Field of View Camera [33.34809839268686]
Cross-view geolocalization, a supplement or replacement for GPS, localizes an agent within a search area by matching images taken from a ground-view camera to overhead images taken from satellites or aircraft.
ReWAG is a neural network and particle filter system that is able to globally localize a mobile agent in a GPS-denied environment with only odometry and a 90° FOV camera.
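The combination of image matching with a particle filter, as in ReWAG, can be illustrated with a minimal 1-D sketch. Everything here is hypothetical: the corridor, the Gaussian similarity function standing in for a cross-view matching network, and the noise levels are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 1-D search area: the agent's true position, and a peaked
# "similarity" map standing in for ground-to-aerial matching scores.
true_pos = 2.0

def similarity(x):
    # Stand-in for the matching network's score at map location x
    return np.exp(-0.5 * ((x - true_pos) / 0.5) ** 2)

# Initialise particles uniformly over the search area
particles = rng.uniform(0.0, 10.0, size=500)

for step in range(10):
    # Motion update: noisy odometry (agent is stationary in this toy case)
    particles += rng.normal(0.0, 0.05, size=particles.shape)
    # Measurement update: reweight particles by cross-view similarity
    weights = similarity(particles)
    weights /= weights.sum()
    # Resample in proportion to weight (simple multinomial resampling)
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]

estimate = particles.mean()
print(f"estimated position: {estimate:.2f}")
```

After a few measurement updates the particle cloud collapses around the location whose aerial imagery best matches the ground view, which is the core of this family of global localization methods.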
arXiv Detail & Related papers (2022-09-23T20:59:26Z) - Satellite Image Based Cross-view Localization for Autonomous Vehicle [59.72040418584396]
This paper shows that by using an off-the-shelf high-definition satellite image as a ready-to-use map, we are able to achieve cross-view vehicle localization up to a satisfactory accuracy.
Our method is validated on KITTI and Ford Multi-AV Seasonal datasets as ground view and Google Maps as the satellite view.
arXiv Detail & Related papers (2022-07-27T13:16:39Z) - Where in the World is this Image? Transformer-based Geo-localization in
the Wild [48.69031054573838]
Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem.
We propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image.
We evaluate TransLocator on four benchmark datasets (Im2GPS, Im2GPS3k, YFCC4k, YFCC26k) and obtain 5.5%, 14.1%, 4.9%, and 9.9% continent-level accuracy improvements respectively.
arXiv Detail & Related papers (2022-04-29T03:27:23Z) - Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization
Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2022-04-10T19:16:58Z) - Co-visual pattern augmented generative transformer learning for
automobile geo-localization [12.449657263683337]
Cross-view geo-localization (CVGL) aims to estimate the geographical location of the ground-level camera by matching against enormous geo-tagged aerial images.
We present a novel approach using cross-view knowledge generative techniques in combination with transformers, namely mutual generative transformer learning (MGTL) for CVGL.
arXiv Detail & Related papers (2022-03-17T07:29:02Z) - Where am I looking at? Joint Location and Orientation Estimation by
Cross-View Matching [95.64702426906466]
Cross-view geo-localization matches a ground-level query image against a large-scale database of geo-tagged aerial images.
Knowing orientation between ground and aerial images can significantly reduce matching ambiguity between these two views.
We design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization.
arXiv Detail & Related papers (2020-05-08T05:21:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.