BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization
- URL: http://arxiv.org/abs/2502.09080v1
- Date: Thu, 13 Feb 2025 08:54:04 GMT
- Title: BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization
- Authors: Qiwei Wang, Shaoxun Wu, Yujiao Shi
- Abstract summary: This paper addresses the problem of weakly supervised cross-view localization.
The goal is to estimate the pose of a ground camera relative to a satellite image with noisy ground truth annotations.
We propose BevSplat, a novel method that resolves height ambiguity by using feature-based Gaussian primitives.
- Score: 11.50186721264038
- Abstract: This paper addresses the problem of weakly supervised cross-view localization, where the goal is to estimate the pose of a ground camera relative to a satellite image with noisy ground truth annotations. A common approach to bridge the cross-view domain gap for pose estimation is Bird's-Eye View (BEV) synthesis. However, existing methods struggle with height ambiguity due to the lack of depth information in ground images and satellite height maps. Previous solutions either assume a flat ground plane or rely on complex models, such as cross-view transformers. We propose BevSplat, a novel method that resolves height ambiguity by using feature-based Gaussian primitives. Each pixel in the ground image is represented by a 3D Gaussian with semantic and spatial features, which are synthesized into a BEV feature map for relative pose estimation. Additionally, to address challenges with panoramic query images, we introduce an icosphere-based supervision strategy for the Gaussian primitives. We validate our method on the widely used KITTI and VIGOR datasets, which include both pinhole and panoramic query images. Experimental results show that BevSplat significantly improves localization accuracy over prior approaches.
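The core idea, representing each ground-image pixel as a feature-carrying 3D Gaussian and splatting the primitives into a BEV feature map, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the isotropic covariances, and the simple weighted-average accumulation are illustrative assumptions.

```python
import numpy as np

def splat_to_bev(means, features, sigmas, grid_size=64, cell_m=0.5):
    """Accumulate per-pixel Gaussian primitives into a BEV feature map.

    means:    (N, 3) Gaussian centers in camera coordinates (x lateral, y height, z forward)
    features: (N, C) per-Gaussian feature vectors
    sigmas:   (N,)   isotropic standard deviations in metres
    Returns a (grid_size, grid_size, C) BEV map indexed as [x_cell, z_cell].
    """
    C = features.shape[1]
    bev = np.zeros((grid_size, grid_size, C))
    weight = np.zeros((grid_size, grid_size)) + 1e-8  # avoid divide-by-zero

    # Splat onto the ground plane: x (lateral) and z (forward); height y is
    # marginalized out, which is exactly where the height ambiguity is absorbed.
    half = grid_size * cell_m / 2.0
    for mu, f, s in zip(means, features, sigmas):
        # Cell indices covered by +/- 2 sigma around the Gaussian center.
        lo = np.clip(((mu[[0, 2]] - 2 * s + half) / cell_m).astype(int), 0, grid_size - 1)
        hi = np.clip(((mu[[0, 2]] + 2 * s + half) / cell_m).astype(int) + 1, 1, grid_size)
        xs = (np.arange(lo[0], hi[0]) + 0.5) * cell_m - half  # cell-center x coords
        zs = (np.arange(lo[1], hi[1]) + 0.5) * cell_m - half  # cell-center z coords
        gx, gz = np.meshgrid(xs, zs, indexing="ij")
        w = np.exp(-((gx - mu[0]) ** 2 + (gz - mu[2]) ** 2) / (2 * s ** 2))
        bev[lo[0]:hi[0], lo[1]:hi[1]] += w[..., None] * f
        weight[lo[0]:hi[0], lo[1]:hi[1]] += w
    return bev / weight[..., None]
```

The resulting BEV feature map can then be correlated against satellite-image features for relative pose estimation; in the paper the Gaussian parameters and features are predicted by a network rather than given.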
Related papers
- Unsupervised Multi-view UAV Image Geo-localization via Iterative Rendering [31.716967688739036]
Unmanned Aerial Vehicle (UAV) Cross-View Geo-Localization (CVGL) presents significant challenges.
Existing methods rely on the supervision of labeled datasets to extract viewpoint-invariant features for cross-view retrieval.
We propose an unsupervised solution that lifts the scene representation into 3D space from UAV observations for satellite image generation.
arXiv Detail & Related papers (2024-11-22T09:22:39Z)
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- HeightLane: BEV Heightmap guided 3D Lane Detection [6.940660861207046]
Accurate 3D lane detection from monocular images presents significant challenges due to depth ambiguity and imperfect ground modeling.
Our study introduces HeightLane, an innovative method that predicts a height map from monocular images by creating anchors based on a multi-slope assumption.
HeightLane achieves state-of-the-art performance in terms of F-score, highlighting its potential in real-world applications.
arXiv Detail & Related papers (2024-08-15T17:14:57Z)
- C-BEV: Contrastive Bird's Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation [27.870926763424848]
We propose a novel trainable retrieval architecture that uses bird's eye view (BEV) maps rather than vectors as embedding representation.
Our method C-BEV surpasses the state-of-the-art on the retrieval task on multiple datasets by a large margin.
arXiv Detail & Related papers (2023-12-13T11:14:57Z)
- Learning Dense Flow Field for Highly-accurate Cross-view Camera Localization [15.89357790711828]
This paper addresses the problem of estimating the 3-DoF camera pose for a ground-level image with respect to a satellite image.
We propose a novel end-to-end approach that leverages the learning of dense pixel-wise flow fields in pairs of ground and satellite images.
Our approach reduces the median localization error by 89%, 19%, 80%, and 35% on the KITTI, Ford Multi-AV, VIGOR, and Oxford RobotCar datasets, respectively.
arXiv Detail & Related papers (2023-09-27T10:26:26Z)
- View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z)
- Ground Plane Matters: Picking Up Ground Plane Prior in Monocular 3D Object Detection [92.75961303269548]
The ground plane prior is a highly informative geometric cue in monocular 3D object detection (M3OD).
We propose a Ground Plane Enhanced Network (GPENet) which resolves both issues at one go.
Our GPENet outperforms other methods and achieves state-of-the-art performance, demonstrating the effectiveness and superiority of the proposed approach.
arXiv Detail & Related papers (2022-11-03T02:21:35Z)
- Satellite Image Based Cross-view Localization for Autonomous Vehicle [59.72040418584396]
This paper shows that by using an off-the-shelf high-definition satellite image as a ready-to-use map, we are able to achieve cross-view vehicle localization with satisfactory accuracy.
Our method is validated on KITTI and Ford Multi-AV Seasonal datasets as ground view and Google Maps as the satellite view.
arXiv Detail & Related papers (2022-07-27T13:16:39Z)
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2022-04-10T19:16:58Z)
- Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images [4.449481309681663]
We present the first end-to-end learning approach for directly predicting dense panoptic segmentation maps in the Bird's-Eye View (BEV).
Our architecture follows the top-down paradigm and incorporates a novel dense transformer module.
We derive a mathematical formulation for the sensitivity of the FV-BEV transformation which allows us to intelligently weight pixels in the BEV space.
arXiv Detail & Related papers (2021-08-06T17:59:11Z)
- Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching [95.64702426906466]
Cross-view geo-localization is the problem of matching a ground-level query image against a large-scale database of geo-tagged aerial images.
Knowing orientation between ground and aerial images can significantly reduce matching ambiguity between these two views.
We design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization.
arXiv Detail & Related papers (2020-05-08T05:21:16Z)
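The orientation-alignment idea behind such matching, shifting one view's features along the azimuth axis and scoring each candidate alignment, can be sketched as a circular cross-correlation. This is a generic illustration rather than the paper's Dynamic Similarity Matching network; the function name and the (channels, azimuth-bins) feature-strip layout are assumptions.

```python
import numpy as np

def best_orientation_shift(ground_strip, aerial_strip):
    """Find the azimuth shift (in bins) that best aligns two feature strips.

    Both inputs are (C, W) arrays: C feature channels over W azimuth bins.
    Circular cross-correlation is computed per channel via the FFT
    (correlation theorem) and summed over channels; the argmax is the
    shift that, applied to the ground strip, best matches the aerial strip.
    """
    corr = np.fft.ifft(
        np.fft.fft(aerial_strip, axis=-1) * np.conj(np.fft.fft(ground_strip, axis=-1)),
        axis=-1,
    ).real.sum(axis=0)  # (W,) correlation score per candidate shift
    return int(np.argmax(corr))
```

Computing all W candidate shifts at once via the FFT costs O(C W log W) instead of O(C W^2) for the naive sliding comparison, which is why this trick is common in orientation estimation over polar-transformed features.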
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.