Learning Dense Flow Field for Highly-accurate Cross-view Camera Localization
- URL: http://arxiv.org/abs/2309.15556v2
- Date: Wed, 27 Dec 2023 13:31:34 GMT
- Title: Learning Dense Flow Field for Highly-accurate Cross-view Camera Localization
- Authors: Zhenbo Song, Xianghui Ze, Jianfeng Lu, Yujiao Shi
- Abstract summary: This paper addresses the problem of estimating the 3-DoF camera pose for a ground-level image with respect to a satellite image.
We propose a novel end-to-end approach that leverages the learning of dense pixel-wise flow fields in pairs of ground and satellite images.
Our approach reduces the median localization error by 89%, 19%, 80% and 35% on the KITTI, Ford multi-AV, VIGOR and Oxford RobotCar datasets, respectively.
- Score: 15.89357790711828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of estimating the 3-DoF camera pose for a
ground-level image with respect to a satellite image that encompasses the local
surroundings. We propose a novel end-to-end approach that leverages the
learning of dense pixel-wise flow fields in pairs of ground and satellite
images to calculate the camera pose. Our approach differs from existing methods
by constructing the feature metric at the pixel level, enabling full-image
supervision for learning distinctive geometric configurations and visual
appearances across views. Specifically, our method employs two distinct
convolution networks for ground and satellite feature extraction. Then, we
project the ground feature map to the bird's eye view (BEV) using a fixed
camera height assumption to achieve preliminary geometric alignment. To further
establish content association between the BEV and satellite features, we
introduce a residual convolution block to refine the projected BEV feature.
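
As a concrete illustration of these two steps, the sketch below shows a ground-to-BEV warp under a fixed camera height followed by a residual refinement block. This is a minimal PyTorch sketch, not the authors' implementation: the pinhole model, the 1.65 m default height, the grid size/resolution, and the `ResidualRefine` module are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ground_to_bev(feat, K, cam_height=1.65, bev_size=128, meters_per_pix=0.2):
    """Project ground-view features (B, C, H, W) onto a BEV grid.

    Each BEV cell is treated as a point on the ground plane, a fixed
    cam_height below the camera, projected into the image with the pinhole
    intrinsics K (3, 3), and sampled bilinearly.
    """
    B, C, H, W = feat.shape
    device = feat.device
    # BEV grid in camera coordinates: x right, z forward, y down.
    xs = (torch.arange(bev_size, device=device) - bev_size / 2) * meters_per_pix
    zs = torch.arange(1, bev_size + 1, device=device) * meters_per_pix
    zz, xx = torch.meshgrid(zs.flip(0), xs, indexing="ij")  # far rows on top
    yy = torch.full_like(xx, cam_height)  # fixed-height ground-plane assumption
    pts = torch.stack([xx, yy, zz], dim=-1).reshape(-1, 3)
    uvw = pts @ K.T                      # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]
    # Normalize pixel coords to [-1, 1]; cells outside the view sample zeros.
    uv[:, 0] = uv[:, 0] / (W - 1) * 2 - 1
    uv[:, 1] = uv[:, 1] / (H - 1) * 2 - 1
    grid = uv.reshape(1, bev_size, bev_size, 2).repeat(B, 1, 1, 1)
    return F.grid_sample(feat, grid, align_corners=True, padding_mode="zeros")

class ResidualRefine(nn.Module):
    """A residual convolution block that refines the projected BEV feature."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, x):
        return F.relu(x + self.body(x))  # identity shortcut around the convs
```

The fixed-height warp is a pure geometric lookup with no learned depth; the residual block then compensates for content the ground-plane assumption cannot explain, such as vehicles and facades.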
Optical flow estimation is performed on the refined BEV feature map and the satellite feature map using flow decoder networks based on RAFT.
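
To make the flow step concrete, here is a deliberately simplified stand-in for the RAFT-based decoder. Real RAFT builds a multi-scale all-pairs correlation pyramid and refines the flow iteratively with a recurrent update operator; this toy module computes a single local cost volume and regresses the flow in one shot, so it only illustrates the data flow. The module name and channel sizes are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFlowDecoder(nn.Module):
    """One-shot local-correlation flow head (a stand-in for RAFT's decoder)."""
    def __init__(self, radius=3):
        super().__init__()
        self.radius = radius
        corr_ch = (2 * radius + 1) ** 2
        self.head = nn.Sequential(
            nn.Conv2d(corr_ch, 96, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(96, 2, 3, padding=1),  # 2 channels: (dx, dy) per pixel
        )

    def forward(self, bev_feat, sat_feat):
        # Build a local cost volume: correlate each BEV pixel with satellite
        # features in a (2r+1) x (2r+1) window around the same location.
        r, (H, W) = self.radius, bev_feat.shape[-2:]
        sat_pad = F.pad(sat_feat, (r, r, r, r))
        corrs = []
        for dy in range(2 * r + 1):
            for dx in range(2 * r + 1):
                shifted = sat_pad[:, :, dy:dy + H, dx:dx + W]
                corrs.append((bev_feat * shifted).mean(dim=1, keepdim=True))
        corr = torch.cat(corrs, dim=1)  # (B, (2r+1)^2, H, W) cost volume
        return self.head(corr)          # dense flow field (B, 2, H, W)
```

Each channel of `corr` scores one candidate displacement; the convolutional head turns those scores into a sub-pixel flow vector per BEV cell.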
After obtaining dense flow correspondences, we apply the least-squares method to filter matching inliers and regress the ground camera pose.
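
The final solve is a classical one. Given dense correspondences q_i = p_i + flow_i between BEV and satellite pixels, the 3-DoF pose (planar translation plus yaw) minimizing the weighted squared residuals has a closed form (2D Procrustes/Kabsch). The sketch below pairs that solve with a simple median-based reweighting as the inlier filter; the paper's exact weighting and filtering scheme may differ.

```python
import numpy as np

def solve_pose_2d(p, q, w):
    """Weighted least squares for a 2D rotation R and translation t
    minimizing sum_i w_i * ||R p_i + t - q_i||^2 (2D Kabsch/Procrustes)."""
    w = w / w.sum()
    mu_p, mu_q = (w[:, None] * p).sum(0), (w[:, None] * q).sum(0)
    p0, q0 = p - mu_p, q - mu_q
    U, _, Vt = np.linalg.svd((w[:, None] * p0).T @ q0)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return R, mu_q - R @ mu_p

def pose_from_flow(flow, iters=3, sigma=2.0):
    """flow: (H, W, 2) dense BEV->satellite flow, (dx, dy) per pixel.
    Returns (yaw in radians, translation in satellite-image pixels)."""
    H, W = flow.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    p = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)
    q = p + flow.reshape(-1, 2)
    w = np.ones(len(p))
    for _ in range(iters):  # alternate solving and down-weighting outliers
        R, t = solve_pose_2d(p, q, w)
        res = np.linalg.norm(p @ R.T + t - q, axis=1)
        w = np.where(res < sigma * np.median(res) + 1e-9, 1.0, 1e-6)
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return yaw, t
```

The returned translation is in satellite-image pixels; multiplying by the map's ground resolution (meters per pixel) converts it to a metric offset.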
Extensive experiments demonstrate significant improvements compared to state-of-the-art methods. Notably, our approach reduces the median localization error by 89%, 19%, 80% and 35% on the KITTI, Ford multi-AV, VIGOR and Oxford RobotCar datasets, respectively.
Related papers
- BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization [11.50186721264038]
This paper addresses the problem of weakly supervised cross-view localization.
The goal is to estimate the pose of a ground camera relative to a satellite image, given only noisy ground-truth annotations.
We propose BevSplat, a novel method that resolves height ambiguity by using feature-based Gaussian primitives.
arXiv Detail & Related papers (2025-02-13T08:54:04Z)
- Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing [28.874014617259935]
Multi-Camera 3D Object Detection (MC3D-Det) has gained prominence with the advent of bird's-eye view (BEV) approaches.
We propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections.
arXiv Detail & Related papers (2023-10-17T15:31:28Z)
- View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z)
- Enhanced Stable View Synthesis [86.69338893753886]
We introduce an approach to enhance novel view synthesis from images taken by a freely moving camera.
The introduced approach focuses on outdoor scenes where recovering an accurate geometric scaffold and camera pose is challenging.
arXiv Detail & Related papers (2023-03-30T01:53:14Z)
- Satellite Image Based Cross-view Localization for Autonomous Vehicle [59.72040418584396]
This paper shows that, by using an off-the-shelf high-definition satellite image as a ready-to-use map, we are able to achieve cross-view vehicle localization with satisfactory accuracy.
Our method is validated on KITTI and Ford Multi-AV Seasonal datasets as ground view and Google Maps as the satellite view.
arXiv Detail & Related papers (2022-07-27T13:16:39Z)
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2022-04-10T19:16:58Z)
- Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images [4.449481309681663]
We present the first end-to-end learning approach for directly predicting dense panoptic segmentation maps in the bird's-eye view (BEV).
Our architecture follows the top-down paradigm and incorporates a novel dense transformer module.
We derive a mathematical formulation for the sensitivity of the FV-BEV transformation which allows us to intelligently weight pixels in the BEV space.
arXiv Detail & Related papers (2021-08-06T17:59:11Z)
- Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching [95.64702426906466]
Cross-view geo-localization determines the location of a query ground image by matching it against a large-scale database of geo-tagged aerial images.
Knowing orientation between ground and aerial images can significantly reduce matching ambiguity between these two views.
We design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization.
arXiv Detail & Related papers (2020-05-08T05:21:16Z)
- Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.