VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval
- URL: http://arxiv.org/abs/2011.12172v2
- Date: Mon, 22 Mar 2021 04:01:54 GMT
- Title: VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval
- Authors: Sijie Zhu and Taojiannan Yang and Chen Chen
- Abstract summary: Cross-view image geo-localization aims to determine the locations of street-view query images by matching with GPS-tagged reference images from aerial view.
Recent works have achieved surprisingly high retrieval accuracy on city-scale datasets.
We propose a new large-scale benchmark -- VIGOR -- for cross-View Image Geo-localization beyond One-to-one Retrieval.
- Score: 19.239311087570318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-view image geo-localization aims to determine the locations of
street-view query images by matching with GPS-tagged reference images from
aerial view. Recent works have achieved surprisingly high retrieval accuracy on
city-scale datasets. However, these results rely on the assumption that there
exists a reference image exactly centered at the location of any query image,
which is not applicable for practical scenarios. In this paper, we redefine
this problem with a more realistic assumption that the query image can be
arbitrary in the area of interest and the reference images are captured before
the queries emerge. This assumption breaks the one-to-one retrieval setting of
existing datasets as the queries and reference images are not perfectly aligned
pairs, and there may be multiple reference images covering one query location.
To bridge the gap between this realistic setting and existing datasets, we
propose a new large-scale benchmark -- VIGOR -- for cross-View Image
Geo-localization beyond One-to-one Retrieval. We benchmark existing
state-of-the-art methods and propose a novel end-to-end framework to localize
the query in a coarse-to-fine manner. Apart from the image-level retrieval
accuracy, we also evaluate the localization accuracy in terms of the actual
distance (meters) using the raw GPS data. Extensive experiments are conducted
under different application scenarios to validate the effectiveness of the
proposed method. The results indicate that cross-view geo-localization in this
realistic setting is still challenging, fostering new research in this
direction. Our dataset and code will be released at
https://github.com/Jeff-Zilence/VIGOR.
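The two evaluation views named in the abstract, image-level retrieval accuracy when several reference images may cover one query, and localization error in meters from raw GPS, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `positives` sets of acceptable reference indices, the use of cosine similarity, and the spherical-Earth radius are all assumptions made for illustration.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in meters between two GPS points given in degrees.

    Assumes a spherical Earth with the given mean radius.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top1_hit_rate(query_emb, ref_emb, positives):
    """Fraction of queries whose most similar reference is an acceptable match.

    `positives[i]` is a (hypothetical) set of reference indices that cover
    query i, so a query can have more than one correct reference image.
    """
    hits = 0
    for q, pos in zip(query_emb, positives):
        best = max(range(len(ref_emb)), key=lambda j: cosine(q, ref_emb[j]))
        hits += best in pos
    return hits / len(query_emb)


# Toy usage: two queries, three references, made-up embeddings.
queries = [[1.0, 0.0], [0.0, 1.0]]
references = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.0]]
acceptable = [{0, 1}, {2}]
rate = top1_hit_rate(queries, references, acceptable)

# Meter-level error between a predicted and a true location (made-up points).
error_m = haversine_m(40.7128, -74.0060, 40.7130, -74.0060)
```

Counting a hit against a set of acceptable references, rather than a single paired image, is what separates this setting from one-to-one retrieval; the meter-level error complements it by measuring how far the predicted location is from the query's true position.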
Related papers
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
- Are Local Features All You Need for Cross-Domain Visual Place Recognition? [13.519413608607781]
Visual Place Recognition aims to predict the coordinates of an image based solely on visual clues.
Despite recent advances, recognizing the same place when the query comes from a significantly different distribution is still a major hurdle for state of the art retrieval methods.
In this work we explore whether re-ranking methods based on spatial verification can tackle these challenges.
arXiv Detail & Related papers (2023-04-12T14:46:57Z)
- Cross-View Visual Geo-Localization for Outdoor Augmented Reality [11.214903134756888]
We address the problem of geo-pose estimation by cross-view matching of query ground images to a geo-referenced aerial satellite image database.
We propose a new transformer neural network-based model and a modified triplet ranking loss for joint location and orientation estimation.
Experiments on several benchmark cross-view geo-localization datasets show that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-03-28T01:58:03Z)
- Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes [53.53712888703834]
We introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels.
We achieve state of the art street level accuracy on 4 standard geo-localization datasets.
arXiv Detail & Related papers (2023-03-07T21:47:58Z)
- Map-free Visual Relocalization: Metric Pose Relative to a Single Image [21.28513803531557]
We propose Map-free Relocalization, using only one photo of a scene to enable instant, metric scaled relocalization.
Existing datasets are not suitable to benchmark map-free relocalization, due to their focus on large scenes or their limited variability.
We have constructed a new dataset of 655 small places of interest, such as sculptures, murals and fountains, collected worldwide.
arXiv Detail & Related papers (2022-10-11T14:49:49Z)
- CVLNet: Cross-View Semantic Correspondence Learning for Video-based Camera Localization [89.69214577915959]
This paper tackles the problem of Cross-view Video-based camera localization.
We propose estimating the query camera's relative displacement to a satellite image before similarity matching.
Experiments have demonstrated the effectiveness of video-based localization over single image-based localization.
arXiv Detail & Related papers (2022-08-07T07:35:17Z)
- GAMa: Cross-view Video Geo-localization [68.33955764543465]
We focus on ground videos instead of images which provides contextual cues.
At clip-level, a short video clip is matched with corresponding aerial image and is later used to get video-level geo-localization of a long video.
Our proposed method achieves a Top-1 recall rate of 19.4% and 45.1% @1.0mile.
arXiv Detail & Related papers (2022-07-06T04:25:51Z)
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2022-04-10T19:16:58Z)
- Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching [102.39635336450262]
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images.
Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image.
arXiv Detail & Related papers (2022-03-26T20:10:38Z)
- Deep Metric Learning for Ground Images [4.864819846886142]
We deal with the initial localization task, in which we have no prior knowledge about the current robot positioning.
We propose a deep metric learning approach that retrieves the most similar reference images to the query image.
In contrast to existing approaches to image retrieval for ground images, our approach achieves significantly better recall performance and improves the localization performance of a state-of-the-art ground texture based localization method.
arXiv Detail & Related papers (2021-09-03T14:43:59Z)