VVLoc: Prior-free 3-DoF Vehicle Visual Localization
- URL: http://arxiv.org/abs/2602.00810v1
- Date: Sat, 31 Jan 2026 16:37:30 GMT
- Title: VVLoc: Prior-free 3-DoF Vehicle Visual Localization
- Authors: Ze Huang, Zhongyang Xiao, Mingliang Song, Longan Yang, Hongyuan Yuan, Li Sun
- Abstract summary: We propose a unified pipeline that employs a single neural network to concurrently achieve topological and metric vehicle localization using a multi-camera system. VVLoc first evaluates the geo-proximity between visual observations, then estimates their relative metric poses using a matching strategy, while also providing a confidence measure. We evaluate VVLoc not only on publicly available datasets, but also on a more challenging self-collected dataset.
- Score: 6.151313455860856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Localization is a critical technology in autonomous driving, encompassing both topological localization, which identifies the most similar map keyframe to the current observation, and metric localization, which provides precise spatial coordinates. Conventional methods typically address these tasks independently, rely on single-camera setups, and often require additional 3D semantic or pose priors, while lacking mechanisms to quantify the confidence of localization results, making them less feasible for real industrial applications. In this paper, we propose VVLoc, a unified pipeline that employs a single neural network to concurrently achieve topological and metric vehicle localization using a multi-camera system. VVLoc first evaluates the geo-proximity between visual observations, then estimates their relative metric poses using a matching strategy, while also providing a confidence measure. Additionally, the training process for VVLoc is highly efficient, requiring only pairs of visual data and corresponding ground-truth poses, eliminating the need for complex supplementary data. We evaluate VVLoc not only on publicly available datasets, but also on a more challenging self-collected dataset, demonstrating its ability to deliver state-of-the-art localization accuracy across a wide range of localization tasks.
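The abstract's two-stage pipeline (topological retrieval of the nearest map keyframe, then metric refinement with a confidence score) can be sketched as follows. This is an illustrative skeleton under our own assumptions, not the paper's implementation: the function names, the cosine-similarity retrieval, and the 3-DoF (x, y, yaw) pose composition are placeholders for the learned components described in the text.

```python
import numpy as np

# Stage 1 (topological): rank map keyframes by geo-proximity of embeddings.
# Stage 2 (metric): compose the retrieved keyframe's absolute 3-DoF pose with
# an estimated relative pose. All names here are illustrative, not VVLoc's API.

def topological_localize(query_embedding, map_embeddings):
    """Return (index, score) of the most similar map keyframe by cosine similarity."""
    q = query_embedding / np.linalg.norm(query_embedding)
    m = map_embeddings / np.linalg.norm(map_embeddings, axis=1, keepdims=True)
    scores = m @ q
    best = int(np.argmax(scores))
    return best, float(scores[best])

def metric_localize(keyframe_pose, relative_pose):
    """Compose a keyframe's absolute 3-DoF pose (x, y, yaw) with an
    estimated relative pose expressed in the keyframe's frame."""
    x, y, yaw = keyframe_pose
    dx, dy, dyaw = relative_pose
    c, s = np.cos(yaw), np.sin(yaw)
    return (x + c * dx - s * dy, y + s * dx + c * dy, yaw + dyaw)
```

In this sketch the retrieval score doubles as a crude stand-in for the confidence measure the paper describes; the actual network presumably predicts confidence jointly with the relative pose.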
Related papers
- TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation [70.23578202012048]
Vision-Language Navigation (VLN) presents a unique challenge for Large Vision-Language Models (VLMs) due to their inherent architectural mismatch. We propose TagaVLM (Topology-Aware Global Action reasoning), an end-to-end framework that explicitly injects topological structures into the VLM backbone. To enhance topological node information, an Interleaved Navigation Prompt strengthens node-level visual-text alignment. With the embedded topological graph, the model is capable of global action reasoning, allowing for robust path correction.
arXiv Detail & Related papers (2026-03-03T13:28:07Z)
- Automatic Map Density Selection for Locally-Performant Visual Place Recognition [16.893170921147608]
We propose a dynamic VPR mapping approach that uses pairs of reference traverses from the target environment to automatically select an appropriate map density. We show that our system consistently achieves or exceeds the specified local recall level over at least the user-specified proportion of the environment.
arXiv Detail & Related papers (2026-02-25T01:03:34Z)
- Generative MIMO Beam Map Construction for Location Recovery and Beam Tracking [67.65578956523403]
This paper proposes a generative framework to recover location labels directly from sparse channel state information (CSI) measurements. Instead of directly storing raw CSI, we learn a compact low-dimensional radio map embedding and leverage a generative model to reconstruct the high-dimensional CSI. Numerical experiments demonstrate that the proposed model can improve localization accuracy by over 30% and achieve a 20% capacity gain in non-line-of-sight (NLOS) scenarios.
arXiv Detail & Related papers (2025-11-21T07:25:49Z)
- Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization [17.908597896653045]
This paper presents a cross-view UAV localization framework that performs map matching via object detection. In typical pipelines, UAV visual localization is formulated as an image-retrieval problem. Our method achieves strong retrieval and localization performance using a fine-grained, graph-based node-similarity metric.
arXiv Detail & Related papers (2025-11-04T11:25:31Z)
- S-BEVLoc: BEV-based Self-supervised Framework for Large-scale LiDAR Global Localization [34.79060534627474]
S-BEVLoc is a novel framework based on bird's-eye view (BEV) for LiDAR global localization. We construct training triplets from single BEV images by leveraging the known geographic distances between keypoint-centered BEV patches. We show that S-BEVLoc achieves state-of-the-art performance in place recognition, loop closure, and global localization tasks.
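The triplet-construction idea in this summary (supervision from geographic distance alone, no manual labels) can be sketched as follows. The function name, radii, and first-match selection are our own illustrative assumptions, not the paper's actual mining strategy.

```python
import numpy as np

# Illustrative sketch: build (anchor, positive, negative) triplets from
# keypoint-centered BEV patches using only the known geographic distance
# between patch centers. Radii and selection policy are placeholders.

def build_triplets(centers, pos_radius=5.0, neg_radius=20.0):
    """For each anchor patch, pick a positive within pos_radius metres and a
    negative beyond neg_radius metres of its center, when both exist."""
    centers = np.asarray(centers, dtype=float)
    # Pairwise Euclidean distances between all patch centers.
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    triplets = []
    for a in range(len(centers)):
        pos = np.where((d[a] > 0) & (d[a] <= pos_radius))[0]
        neg = np.where(d[a] >= neg_radius)[0]
        if len(pos) and len(neg):
            triplets.append((a, int(pos[0]), int(neg[0])))
    return triplets
```

A real pipeline would sample positives and negatives rather than take the first match, but the point stands: the geographic distance matrix alone determines the triplet labels.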
arXiv Detail & Related papers (2025-09-11T02:48:06Z)
- Without Paired Labeled Data: End-to-End Self-Supervised Learning for Drone-view Geo-Localization [20.603433987118837]
Drone-view Geo-Localization (DVGL) aims to achieve accurate localization of drones by retrieving the most relevant GPS-tagged satellite images. Existing methods heavily rely on strictly pre-paired drone-satellite images for supervised learning. We propose a novel end-to-end self-supervised learning method with a shallow backbone network.
arXiv Detail & Related papers (2025-02-17T02:53:08Z)
- Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction [80.67150791183126]
Pre-trained vision-language models (VLMs) have demonstrated impressive zero-shot recognition capability, but still underperform in dense prediction tasks. We propose DenseVLM, a framework designed to learn unbiased region-language alignment from powerful pre-trained VLM representations. We show that DenseVLM can directly replace the original VLM in open-vocabulary object detection and image segmentation methods.
arXiv Detail & Related papers (2024-12-09T06:34:23Z)
- Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information [68.10033984296247]
This paper explores the domain of active localization, emphasizing the importance of viewpoint selection to enhance localization accuracy.
Our contributions involve using a data-driven approach with a simple architecture designed for real-time operation, a self-supervised data training method, and the capability to consistently integrate our map into a planning framework tailored for real-world robotics applications.
arXiv Detail & Related papers (2024-07-22T12:32:09Z)
- UAVD4L: A Large-Scale Dataset for UAV 6-DoF Localization [14.87295056434887]
We introduce a large-scale 6-DoF UAV dataset for localization (UAVD4L)
We develop a two-stage 6-DoF localization pipeline (UAVLoc), which consists of offline synthetic data generation and online visual localization.
Results on the new dataset demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-01-11T15:19:21Z)
- Self-supervised Learning of Contextualized Local Visual Embeddings [0.0]
Contextualized Local Visual Embeddings (CLoVE) is a self-supervised convolutional-based method that learns representations suited for dense prediction tasks.
We benchmark CLoVE's pre-trained representations on multiple datasets.
CLoVE reaches state-of-the-art performance for CNN-based architectures in 4 dense prediction downstream tasks.
arXiv Detail & Related papers (2023-10-01T00:13:06Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Unsupervised Metric Relocalization Using Transform Consistency Loss [66.19479868638925]
Training networks to perform metric relocalization traditionally requires accurate image correspondences.
We propose a self-supervised solution, which exploits a key insight: localizing a query image within a map should yield the same absolute pose, regardless of the reference image used for registration.
We evaluate our framework on synthetic and real-world data, showing our approach outperforms other supervised methods when a limited amount of ground-truth information is available.
arXiv Detail & Related papers (2020-11-01T19:24:27Z)
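The key insight of the transform-consistency entry above (the absolute query pose should agree no matter which reference image is used for registration) can be written as a simple self-supervised loss. This is a minimal 3-DoF sketch in our own notation, not the paper's formulation: function names, the (x, y, yaw) parameterization, and the mean-squared discrepancy are illustrative choices.

```python
import numpy as np

# Sketch of a transform-consistency loss: each (reference absolute pose,
# estimated relative pose) pair implies an absolute query pose; the loss
# penalizes disagreement across references, requiring no pose labels.

def compose(T_ref, T_rel):
    """Compose a 3-DoF pose (x, y, yaw) with a relative pose in its frame."""
    x, y, th = T_ref
    dx, dy, dth = T_rel
    c, s = np.cos(th), np.sin(th)
    return np.array([x + c * dx - s * dy, y + s * dx + c * dy, th + dth])

def consistency_loss(ref_poses, rel_poses):
    """Mean squared deviation of the implied absolute query poses from
    their mean; zero iff all references agree on the query's pose."""
    implied = [compose(Tr, Td) for Tr, Td in zip(ref_poses, rel_poses)]
    center = np.mean(implied, axis=0)
    return float(np.mean([np.sum((p - center) ** 2) for p in implied]))
```

Note the design choice hedged here: the paper compares poses obtained via different references, whereas this sketch anchors on their mean, which is one of several equivalent ways to penalize pairwise disagreement.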
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.