CVLNet: Cross-View Semantic Correspondence Learning for Video-based
Camera Localization
- URL: http://arxiv.org/abs/2208.03660v1
- Date: Sun, 7 Aug 2022 07:35:17 GMT
- Title: CVLNet: Cross-View Semantic Correspondence Learning for Video-based
Camera Localization
- Authors: Yujiao Shi, Xin Yu, Shan Wang, Hongdong Li
- Abstract summary: This paper tackles the problem of Cross-view Video-based camera localization.
We propose estimating the query camera's relative displacement to a satellite image before similarity matching.
Experiments have demonstrated the effectiveness of video-based localization over single image-based localization.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles the problem of Cross-view Video-based camera Localization
(CVL). The task is to localize a query camera by leveraging information from
its past observations, i.e., a continuous sequence of images observed at
previous time stamps, and matching them to a large overhead-view satellite
image. The critical challenge of this task is to learn a powerful global
feature descriptor for the sequential ground-view images while considering its
domain alignment with reference satellite images. For this purpose, we
introduce CVLNet, which first projects the sequential ground-view images into
an overhead view by exploring the ground-and-overhead geometric correspondences
and then leverages the photo consistency among the projected images to form a
global representation. In this way, the cross-view domain differences are
bridged. Since the reference satellite images are usually pre-cropped and
regularly sampled, there is always a misalignment between the query camera
location and its matching satellite image center. Motivated by this, we propose
estimating the query camera's relative displacement to a satellite image before
similarity matching. In this displacement estimation process, we also consider
the uncertainty of the camera location. For example, a camera is unlikely to be
on top of trees. To evaluate the performance of the proposed method, we collect
satellite images from Google Maps for the KITTI dataset and construct a new
cross-view video-based localization benchmark dataset, KITTI-CVL. Extensive
experiments have demonstrated the effectiveness of video-based localization
over single image-based localization and the superiority of each proposed
module over other alternatives.
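The ground-to-overhead projection step can be illustrated with a minimal sketch. This is not CVLNet's actual projection module; it is a generic inverse-perspective-mapping warp under assumptions the paper does not spell out here: a flat ground plane, a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and known mounting height, and nearest-neighbour sampling. All names and parameters are illustrative.

```python
import numpy as np

def ground_to_overhead(img, fx, fy, cx, cy, cam_height, bev_size, meters_per_px):
    """Warp a forward-facing ground image onto an overhead (bird's-eye) grid.

    Assumes a flat ground plane and a pinhole camera looking along +Z,
    mounted cam_height meters above the ground. Nearest-neighbour sampling.
    Returns the overhead image and a validity mask (cells whose ground point
    projects inside the ground image).
    """
    H, W = img.shape[:2]
    bev = np.zeros((bev_size, bev_size) + img.shape[2:], dtype=img.dtype)
    valid = np.zeros((bev_size, bev_size), dtype=bool)
    for r in range(bev_size):
        for c in range(bev_size):
            # Overhead cell -> metric ground point in the camera frame:
            # row 0 is farthest ahead, the camera sits at the bottom center.
            Z = (bev_size - r) * meters_per_px       # forward distance (m)
            X = (c - bev_size / 2) * meters_per_px   # lateral offset (m)
            if Z <= 0:
                continue
            # Pinhole projection of the ground point (X, cam_height, Z).
            u = fx * X / Z + cx
            v = fy * cam_height / Z + cy
            ui, vi = int(round(u)), int(round(v))
            if 0 <= ui < W and 0 <= vi < H:
                bev[r, c] = img[vi, ui]
                valid[r, c] = True
    return bev, valid
```

In a retrieval pipeline like the one the abstract outlines, per-frame features warped this way could then be fused across the video sequence and compared against satellite descriptors, with the query's displacement from the satellite-image center estimated before similarity matching.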
Related papers
- Weakly-supervised Camera Localization by Ground-to-satellite Image Registration [52.54992898069471]
We propose a weakly supervised learning strategy for ground-to-satellite image registration.
It derives positive and negative satellite images for each ground image.
We also propose a self-supervision strategy for cross-view image relative rotation estimation.
arXiv Detail & Related papers (2024-09-10T12:57:16Z) - A Semantic Segmentation-guided Approach for Ground-to-Aerial Image Matching [30.324252605889356]
This work addresses the problem of matching a query ground-view image with the corresponding satellite image without GPS data.
This is done by comparing features from the ground-view image and the satellite image, additionally leveraging the satellite image's segmentation mask through a three-stream Siamese-like network.
The novelty lies in the fusion of satellite images in combination with their semantic segmentation masks, aimed at ensuring that the model can extract useful features and focus on the significant parts of the images.
arXiv Detail & Related papers (2024-04-17T12:13:18Z) - Cross-View Image Sequence Geo-localization [6.555961698070275]
Cross-view geo-localization aims to estimate the GPS location of a query ground-view image.
Recent approaches use panoramic ground-view images to increase the range of visibility.
We present the first cross-view geo-localization method that works on a sequence of limited Field-Of-View images.
arXiv Detail & Related papers (2022-10-25T19:46:18Z) - Visual Cross-View Metric Localization with Dense Uncertainty Estimates [11.76638109321532]
This work addresses visual cross-view metric localization for outdoor robotics.
Given a ground-level color image and a satellite patch that contains the local surroundings, the task is to identify the location of the ground camera within the satellite patch.
We devise a novel network architecture with denser satellite descriptors, similarity matching at the bottleneck, and a dense spatial distribution as output to capture multi-modal localization ambiguities.
arXiv Detail & Related papers (2022-08-17T20:12:23Z) - Satellite Image Based Cross-view Localization for Autonomous Vehicle [59.72040418584396]
This paper shows that by using an off-the-shelf high-definition satellite image as a ready-to-use map, we are able to achieve cross-view vehicle localization up to a satisfactory accuracy.
Our method is validated on KITTI and Ford Multi-AV Seasonal datasets as ground view and Google Maps as the satellite view.
arXiv Detail & Related papers (2022-07-27T13:16:39Z) - Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization
Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2022-04-10T19:16:58Z) - Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image
Matching [102.39635336450262]
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images.
Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image.
arXiv Detail & Related papers (2022-03-26T20:10:38Z) - Where am I looking at? Joint Location and Orientation Estimation by
Cross-View Matching [95.64702426906466]
Cross-view geo-localization localizes a query ground image by matching it against a large-scale database of geo-tagged aerial images.
Knowing orientation between ground and aerial images can significantly reduce matching ambiguity between these two views.
We design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization.
arXiv Detail & Related papers (2020-05-08T05:21:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.