BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation
- URL: http://arxiv.org/abs/2312.15363v2
- Date: Mon, 23 Sep 2024 18:10:25 GMT
- Title: BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation
- Authors: Tavis Shore, Simon Hadfield, Oscar Mendez,
- Abstract summary: Cross-view image matching for geo-localisation is a challenging problem due to the significant visual difference between aerial and ground-level viewpoints.
We propose BEV-CV, an approach introducing two key novelties with a focus on improving the real-world viability of cross-view geo-localisation.
- Score: 15.324623975476348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-view image matching for geo-localisation is a challenging problem due to the significant visual difference between aerial and ground-level viewpoints. The method provides localisation capabilities from geo-referenced images, eliminating the need for external devices or costly equipment. This enhances the capacity of agents to autonomously determine their position, navigate, and operate effectively in GNSS-denied environments. Current research employs a variety of techniques to reduce the domain gap such as applying polar transforms to aerial images or synthesising between perspectives. However, these approaches generally rely on having a 360{\deg} field of view, limiting real-world feasibility. We propose BEV-CV, an approach introducing two key novelties with a focus on improving the real-world viability of cross-view geo-localisation. Firstly bringing ground-level images into a semantic Birds-Eye-View before matching embeddings, allowing for direct comparison with aerial image representations. Secondly, we adapt datasets into application realistic format - limited Field-of-View images aligned to vehicle direction. BEV-CV achieves state-of-the-art recall accuracies, improving Top-1 rates of 70{\deg} crops of CVUSA and CVACT by 23% and 24% respectively. Also decreasing computational requirements by reducing floating point operations to below previous works, and decreasing embedding dimensionality by 33% - together allowing for faster localisation capabilities.
Related papers
- Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network [12.692812966686066]
Cross-view geolocalization identifies the geographic location of street view images by matching them with a georeferenced satellite database.
We propose a new approach for cross-view image geo-localization, i.e., the Panorama-BEV Co-Retrieval Network.
arXiv Detail & Related papers (2024-08-10T08:03:58Z) - Cross-domain and Cross-dimension Learning for Image-to-Graph
Transformers [50.576354045312115]
Direct image-to-graph transformation is a challenging task that solves object detection and relationship prediction in a single model.
We introduce a set of methods enabling cross-domain and cross-dimension transfer learning for image-to-graph transformers.
We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we pretrain our models on 2D satellite images before applying them to vastly different target domains in 2D and 3D.
arXiv Detail & Related papers (2024-03-11T10:48:56Z) - C-BEV: Contrastive Bird's Eye View Training for Cross-View Image
Retrieval and 3-DoF Pose Estimation [27.870926763424848]
We propose a novel trainable retrieval architecture that uses bird's eye view (BEV) maps rather than vectors as embedding representation.
Our method C-BEV surpasses the state-of-the-art on the retrieval task on multiple datasets by a large margin.
arXiv Detail & Related papers (2023-12-13T11:14:57Z) - FocusTune: Tuning Visual Localization through Focus-Guided Sampling [61.79440120153917]
FocusTune is a focus-guided sampling technique to improve the performance of visual localization algorithms.
We demonstrate that FocusTune both improves or matches state-of-the-art performance whilst keeping ACE's appealing low storage and compute requirements.
This combination of high performance and low compute and storage requirements is particularly promising for applications in areas like mobile robotics and augmented reality.
arXiv Detail & Related papers (2023-11-06T04:58:47Z) - View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z) - Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z) - Uncertainty-aware Vision-based Metric Cross-view Geolocalization [25.87104194833264]
We present an end-to-end differentiable model that uses the ground and aerial images to predict a probability distribution over possible vehicle poses.
We improve the previous state-of-the-art by a large margin even without ground or aerial data from the test region.
arXiv Detail & Related papers (2022-11-22T10:23:20Z) - Wide-Area Geolocalization with a Limited Field of View Camera [33.34809839268686]
Cross-view geolocalization, a supplement or replacement for GPS, localizes an agent within a search area by matching images taken from a ground-view camera to overhead images taken from satellites or aircraft.
ReWAG is a neural network and particle filter system that is able to globally localize a mobile agent in a GPS-denied environment with only odometry and a 90 degree FOV camera.
arXiv Detail & Related papers (2022-09-23T20:59:26Z) - GitNet: Geometric Prior-based Transformation for Birds-Eye-View
Segmentation [105.19949897812494]
Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving.
We present a novel two-stage Geometry Prior-based Transformation framework named GitNet.
arXiv Detail & Related papers (2022-04-16T06:46:45Z) - City-wide Street-to-Satellite Image Geolocalization of a Mobile Ground
Agent [38.140216125792755]
Cross-view image geolocalization provides an estimate of an agent's global position by matching a local ground image to an overhead satellite image without the need for GPS.
Our approach, called Wide-Area Geolocalization (WAG), combines a neural network with a particle filter to achieve global position estimates for agents moving in GPS-denied environments.
WAG achieves position estimation accuracies on the order of 20 meters, a 98% reduction compared to a baseline training and weighting approach.
arXiv Detail & Related papers (2022-03-10T19:54:12Z) - Where am I looking at? Joint Location and Orientation Estimation by
Cross-View Matching [95.64702426906466]
Cross-view geo-localization is a problem given a large-scale database of geo-tagged aerial images.
Knowing orientation between ground and aerial images can significantly reduce matching ambiguity between these two views.
We design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization.
arXiv Detail & Related papers (2020-05-08T05:21:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.