Related papers: S-BEVLoc: BEV-based Self-supervised Framework for Large-scale LiDAR Global Localization

URL: http://arxiv.org/abs/2509.09110v1
Date: Thu, 11 Sep 2025 02:48:06 GMT
Title: S-BEVLoc: BEV-based Self-supervised Framework for Large-scale LiDAR Global Localization
Authors: Chenghao Zhang, Lun Luo, Si-Yuan Cao, Xiaokai Bai, Yuncheng Jin, Zhu Yu, Beinan Yu, Yisen Wang, Hui-Liang Shen
Abstract summary: S-BEVLoc is a novel framework based on bird's-eye view (BEV) for LiDAR global localization. We construct training triplets from single BEV images by leveraging the known geographic distances between keypoint-centered BEV patches. We show that S-BEVLoc achieves state-of-the-art performance in place recognition, loop closure, and global localization tasks.
Score: 34.79060534627474
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LiDAR-based global localization is an essential component of simultaneous localization and mapping (SLAM), which helps loop closure and re-localization. Current approaches rely on ground-truth poses obtained from GPS or SLAM odometry to supervise network training. Despite the great success of these supervised approaches, substantial cost and effort are required for high-precision ground-truth pose acquisition. In this work, we propose S-BEVLoc, a novel self-supervised framework based on bird's-eye view (BEV) for LiDAR global localization, which eliminates the need for ground-truth poses and is highly scalable. We construct training triplets from single BEV images by leveraging the known geographic distances between keypoint-centered BEV patches. A convolutional neural network (CNN) is used to extract local features, and NetVLAD is employed to aggregate them into global descriptors. Moreover, we introduce SoftCos loss to enhance learning from the generated triplets. Experimental results on the large-scale KITTI and NCLT datasets show that S-BEVLoc achieves state-of-the-art performance in place recognition, loop closure, and global localization tasks, while offering scalability that would require extra effort for supervised approaches.
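
The triplet construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name mine_triplets, the metres_per_pixel scale, and both distance thresholds are hypothetical choices made only for illustration, and the abstract does not specify how keypoints are detected or how SoftCos loss is defined, so neither appears here. The sketch only shows how known metric distances between patch centres in one BEV image could label patch pairs as positives or negatives without any ground-truth pose.

```python
import numpy as np


def mine_triplets(keypoints_px, metres_per_pixel, pos_radius_m, neg_radius_m):
    """Return (anchor, positive, negative) index triplets for one BEV image.

    All parameters are illustrative assumptions, not values from the paper.
    Requires pos_radius_m < neg_radius_m so a patch is never both labels.
    """
    if not 0 < pos_radius_m < neg_radius_m:
        raise ValueError("expected 0 < pos_radius_m < neg_radius_m")

    # Convert pixel coordinates of patch centres to metres on the ground plane.
    pts = np.asarray(keypoints_px, dtype=float) * metres_per_pixel

    # Pairwise Euclidean distances between all patch centres, shape (n, n).
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

    triplets = []
    n = len(pts)
    for a in range(n):
        # Positives: other patches centred close to the anchor.
        positives = [p for p in range(n) if p != a and dist[a, p] <= pos_radius_m]
        # Negatives: patches centred far from the anchor.
        negatives = [q for q in range(n) if dist[a, q] >= neg_radius_m]
        for p in positives:
            for q in negatives:
                triplets.append((a, p, q))
    return triplets


# Three hypothetical keypoints on one BEV image at 0.4 m per pixel.
keypoints = [(0, 0), (5, 0), (100, 0)]
triplets = mine_triplets(keypoints, metres_per_pixel=0.4,
                         pos_radius_m=5.0, neg_radius_m=20.0)
print(triplets)
```

In a full pipeline, each index would select a keypoint-centered patch whose CNN and NetVLAD descriptor feeds the training loss; those stages are left out because the abstract gives no further detail on them.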

Related papers

TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation [70.23578202012048]
Vision-Language Navigation (VLN) presents a unique challenge for Large Vision-Language Models (VLMs) due to their inherent architectural mismatch. We propose TagaVLM (Topology-Aware Global Action reasoning), an end-to-end framework that explicitly injects topological structures into the VLM backbone. To enhance topological node information, an Interleaved Navigation Prompt strengthens node-level visual-text alignment. With the embedded topological graph, the model is capable of global action reasoning, allowing for robust path correction.
arXiv Detail & Related papers (2026-03-03T13:28:07Z)
VVLoc: Prior-free 3-DoF Vehicle Visual Localization [6.151313455860856]
We propose a unified pipeline that employs a single neural network to concurrently achieve topological and metric vehicle localization using a multi-camera system. VVLoc first evaluates the geo-proximity between visual observations, then estimates their relative metric poses using a matching strategy, while also providing a confidence measure. We evaluate VVLoc not only on publicly available datasets, but also on a more challenging self-collected dataset.
arXiv Detail & Related papers (2026-01-31T16:37:30Z)
Generative MIMO Beam Map Construction for Location Recovery and Beam Tracking [67.65578956523403]
This paper proposes a generative framework to recover location labels directly from sparse channel state information (CSI) measurements. Instead of directly storing raw CSI, we learn a compact low-dimensional radio map embedding and leverage a generative model to reconstruct the high-dimensional CSI. Numerical experiments demonstrate that the proposed model can improve localization accuracy by over 30% and achieve a 20% capacity gain in non-line-of-sight (NLOS) scenarios.
arXiv Detail & Related papers (2025-11-21T07:25:49Z)
BEVDiffLoc: End-to-End LiDAR Global Localization in BEV View based on Diffusion Model [8.720833232645155]
Bird's-Eye-View (BEV) image is one of the most widely adopted data representations in autonomous driving. We propose BEVDiffLoc, a novel framework that formulates LiDAR localization as a conditional generation of poses.
arXiv Detail & Related papers (2025-03-14T13:17:43Z)
Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction [80.67150791183126]
Pre-trained vision-language models (VLMs) have demonstrated impressive zero-shot recognition capability, but still underperform in dense prediction tasks. We propose DenseVLM, a framework designed to learn unbiased region-language alignment from powerful pre-trained VLM representations. We show that DenseVLM can directly replace the original VLM in open-vocabulary object detection and image segmentation methods.
arXiv Detail & Related papers (2024-12-09T06:34:23Z)
GLRT-Based Metric Learning for Remote Sensing Object Retrieval [19.210692452537007]
Existing CBRSOR methods neglect the utilization of global statistical information during both training and test stages. Inspired by the Neyman-Pearson theorem, we propose a generalized likelihood ratio test-based metric learning (GLRTML) approach.
arXiv Detail & Related papers (2024-10-08T07:53:30Z)
RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning [20.688641105430467]
Global localization is crucial in autonomous driving and robotics applications when GPS signals are unreliable. Most approaches achieve global localization by sequential place recognition (PR) and pose estimation (PE). We introduce a new paradigm, PR-by-PE localization, which bypasses the need for separate place recognition by directly deriving it from pose estimation. We propose RING#, an end-to-end PR-by-PE localization network that operates in the bird's-eye-view (BEV) space, compatible with both vision and LiDAR sensors.
arXiv Detail & Related papers (2024-08-30T18:42:53Z)
Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization [81.32266996009575]
In federated learning (FL), the multi-step update and data heterogeneity among clients often lead to a loss landscape with sharper minima. We propose FedLESAM, a novel algorithm that locally estimates the direction of global perturbation on client side.
arXiv Detail & Related papers (2024-05-29T08:46:21Z)
Recognize Any Regions [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model. Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge Distillation at Multiple Levels [52.50670006414656]
We employ CLIP, a large-scale pre-trained vision-language model, for knowledge distillation on multiple levels. To train our model, CLIP is utilized to generate HOI scores for both global images and local union regions. The model achieves strong performance, which is even comparable with some fully-supervised and weakly-supervised methods.
arXiv Detail & Related papers (2023-09-10T16:27:54Z)
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention [100.81495948184649]
We present Perceiver-VL, a vision-and-language framework that efficiently handles high-dimensional multimodal inputs such as long videos and text. Our framework scales with linear complexity, in contrast to the quadratic complexity of self-attention used in many state-of-the-art transformer-based models.
arXiv Detail & Related papers (2022-11-21T18:22:39Z)
SphereVLAD++: Attention-based and Signal-enhanced Viewpoint Invariant Descriptor [6.326554177747699]
We develop SphereVLAD++, an attention-enhanced viewpoint invariant place recognition method. We show that SphereVLAD++ outperforms all relative state-of-the-art 3D place recognition methods under small or even totally reversed viewpoint differences.
arXiv Detail & Related papers (2022-07-06T20:32:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.