InterKey: Cross-modal Intersection Keypoints for Global Localization on OpenStreetMap
- URL: http://arxiv.org/abs/2509.13857v2
- Date: Mon, 29 Sep 2025 04:12:09 GMT
- Title: InterKey: Cross-modal Intersection Keypoints for Global Localization on OpenStreetMap
- Authors: Nguyen Hoang Khoi Tran, Julie Stephany Berrio, Mao Shan, Stewart Worrall,
- Abstract summary: OpenStreetMap (OSM) offers a free and globally available alternative, but its coarse abstraction poses challenges for matching with sensor data. We propose InterKey, a cross-modal framework that leverages road intersections as distinctive landmarks for global localization. Our method constructs compact binary descriptors by jointly encoding road and building imprints from point clouds and OSM.
- Score: 7.975038003192725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reliable global localization is critical for autonomous vehicles, especially in environments where GNSS is degraded or unavailable, such as urban canyons and tunnels. Although high-definition (HD) maps provide accurate priors, the cost of data collection, map construction, and maintenance limits scalability. OpenStreetMap (OSM) offers a free and globally available alternative, but its coarse abstraction poses challenges for matching with sensor data. We propose InterKey, a cross-modal framework that leverages road intersections as distinctive landmarks for global localization. Our method constructs compact binary descriptors by jointly encoding road and building imprints from point clouds and OSM. To bridge modality gaps, we introduce discrepancy mitigation, orientation determination, and area-equalized sampling strategies, enabling robust cross-modal matching. Experiments on the KITTI dataset demonstrate that InterKey achieves state-of-the-art accuracy, outperforming recent baselines by a large margin. The framework generalizes to sensors that can produce dense structural point clouds, offering a scalable and cost-effective solution for robust vehicle localization.
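The abstract does not specify the descriptor layout, so the following Python code is only a minimal sketch under stated assumptions, not InterKey's implementation. It assumes road and building points have already been projected to the ground plane in a frame centred on the intersection. Each channel is encoded as a polar occupancy grid whose ring radii are chosen so every ring covers the same area (one plausible reading of "area-equalized sampling"). A bit is set when any point falls in a cell. The paper's orientation determination is replaced here by a plainly different technique: a brute-force search over circular sector shifts. All function names (`equal_area_radii`, `polar_bits`, `intersection_descriptor`, `best_match`, `localize`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in metres; intersection centre at the origin (assumption)


def equal_area_radii(max_radius: float, num_rings: int) -> List[float]:
    # Outer radius of ring k (1-based) is R * sqrt(k / N), so every annulus
    # covers the same area pi * R**2 / N instead of growing with distance.
    return [max_radius * math.sqrt(k / num_rings) for k in range(1, num_rings + 1)]


def polar_bits(points: Sequence[Point], max_radius: float,
               num_rings: int, num_sectors: int) -> List[int]:
    # One bit per (ring, sector) cell, stored ring by ring; 1 if any point lands in it.
    radii = equal_area_radii(max_radius, num_rings)
    bits = [0] * (num_rings * num_sectors)
    for x, y in points:
        r = math.hypot(x, y)
        if r >= max_radius:
            continue  # outside the descriptor footprint
        ring = next(i for i, outer in enumerate(radii) if r < outer)
        theta = math.atan2(y, x) % (2 * math.pi)
        # min() guards against theta rounding up to exactly 2*pi
        sector = min(int(theta / (2 * math.pi) * num_sectors), num_sectors - 1)
        bits[ring * num_sectors + sector] = 1
    return bits


def intersection_descriptor(road_pts: Sequence[Point], building_pts: Sequence[Point],
                            max_radius: float, num_rings: int, num_sectors: int) -> List[int]:
    # Hypothetical joint encoding: road channel followed by building channel.
    return (polar_bits(road_pts, max_radius, num_rings, num_sectors)
            + polar_bits(building_pts, max_radius, num_rings, num_sectors))


def rotate(bits: List[int], num_sectors: int, shift: int) -> List[int]:
    # Circularly shift every ring row by `shift` sectors, i.e. a yaw rotation.
    out: List[int] = []
    for start in range(0, len(bits), num_sectors):
        row = bits[start:start + num_sectors]
        out.extend(row[shift:] + row[:shift])
    return out


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    # Number of differing bits between two equal-length descriptors.
    return sum(x != y for x, y in zip(a, b))


def best_match(query: List[int], candidate: List[int], num_sectors: int) -> Tuple[int, int]:
    # Brute-force yaw search (a stand-in, not InterKey's orientation determination):
    # returns (smallest Hamming distance, sector shift that achieves it).
    return min((hamming(rotate(candidate, num_sectors, s), query), s)
               for s in range(num_sectors))


def localize(query: List[int], database: Dict[str, List[int]], num_sectors: int) -> str:
    # Return the name of the map intersection whose descriptor best matches the query.
    scores = {name: best_match(query, desc, num_sectors)[0]
              for name, desc in database.items()}
    return min(scores, key=scores.get)
```

In this sketch, a descriptor built from a scan at an unknown intersection would be compared against descriptors precomputed from OSM road and building geometry. The brute-force yaw search costs `num_sectors` Hamming comparisons per candidate. Because the descriptors are short bit vectors, each comparison stays inexpensive.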
Related papers
- UAGLNet: Uncertainty-Aggregated Global-Local Fusion Network with Cooperative CNN-Transformer for Building Extraction
Building extraction from remote sensing images is challenging because of the complex structural variations of buildings.
Existing methods employ convolutional or self-attention blocks to capture multi-scale features in segmentation models.
We present an Uncertainty-Aggregated Global-Local Fusion Network (UAGLNet) to exploit high-quality global-local visual semantics.
arXiv (2025-12-15T02:59:16Z)
- Bridging the Gap Between Sparsity and Redundancy: A Dual-Decoding Framework with Global Context for Map Inference
We propose DGMap, a dual-decoding framework with global context awareness.
By integrating global semantic context with local geometric features, DGMap improves keypoint detection accuracy.
A Global Context-aware Relation Prediction module suppresses false connections in dense-trajectory regions.
arXiv (2025-09-15T09:31:38Z)
- InterLoc: LiDAR-based Intersection Localization using Road Segmentation with Automated Evaluation Method
We present a novel LiDAR-based method for online vehicle-centric intersection localization.
We detect intersection candidates in a bird's-eye-view (BEV) representation formed by concatenating semantic road scans.
Experiments on the SemanticKITTI dataset show that our method outperforms the latest learning-based baseline in accuracy and reliability.
arXiv (2025-05-01T13:30:28Z)
- SegLocNet: Multimodal Localization Network for Autonomous Driving via Bird's-Eye-View Segmentation
SegLocNet is a multimodal GNSS-free localization network that achieves precise localization using bird's-eye-view (BEV) semantic segmentation.
Our method can accurately estimate the ego pose in urban environments without relying on GNSS.
Our code and pre-trained model will be released publicly.
arXiv (2025-02-27T13:34:55Z)
- PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments
We present the multi-modal Pedestrian-Focused Scene dataset, rigorously annotated in semi-structured scenes following the nuScenes format.
We also propose a novel Hybrid Multi-Scale Fusion Network (HMFN) to detect pedestrians in densely populated and occluded scenarios.
arXiv (2025-02-21T09:57:53Z)
- TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes
We propose TrafficLoc, a novel image-to-point cloud registration (I2P) method that performs coarse-to-fine matching.
To overcome the lack of large-scale real-world intersection datasets, we first introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections in CARLA.
TrafficLoc greatly improves performance over state-of-the-art I2P methods (up to 86%) on Carla Intersection and generalizes well to real-world data.
arXiv (2024-12-13T17:42:53Z)
- SF-Loc: A Visual Mapping and Geo-Localization System based on Sparse Visual Structure Frames
SF-Loc is a lightweight visual mapping and map-aided localization system.
In the mapping phase, multi-sensor dense bundle adjustment (MS-DBA) is applied to construct geo-referenced visual structure frames.
In the localization phase, coarse-to-fine vision-based localization is performed, in which multi-frame information and the map distribution are fully integrated.
arXiv (2024-12-02T13:51:58Z)
- Neural Semantic Map-Learning for Autonomous Vehicles
We present a mapping system that fuses local submaps gathered from a fleet of vehicles at a central instance to produce a coherent map of the road environment.
Our method jointly aligns and merges the noisy and incomplete local submaps using a scene-specific Neural Signed Distance Field.
We leverage memory-efficient sparse feature-grids to scale to large areas and introduce a confidence score to model uncertainty in scene reconstruction.
arXiv (2024-10-10T10:10:03Z)
- Semi-supervised Domain Adaptive Structure Learning
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv (2021-12-12T06:11:16Z)
- Global Aggregation then Local Distribution for Scene Parsing
We show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks.
Our approach allows us to build new state of the art on major semantic segmentation benchmarks including Cityscapes, ADE20K, Pascal Context, Camvid and COCO-stuff.
arXiv (2021-07-28T03:46:57Z)
- Focus on Local: Detecting Lane Marker from Bottom Up via Key Point
We propose a novel lane marker detection solution, FOLOLane, that focuses on modeling local patterns and achieving prediction of global structures.
Specifically, the CNN models low-complexity local patterns with two separate heads: the first predicts the existence of key points, and the second refines the location of key points in the local range and correlates key points of the same lane line.
arXiv (2021-05-28T08:59:14Z)
- Zero-Shot Multi-View Indoor Localization via Graph Location Networks
Indoor localization is a fundamental problem in location-based applications.
We propose a novel neural network based architecture Graph Location Networks (GLN) to perform infrastructure-free, multi-view image based indoor localization.
GLN makes location predictions based on robust location representations extracted from images through message-passing networks.
We introduce a novel zero-shot indoor localization setting and tackle it by extending the proposed GLN to a dedicated zero-shot version.
arXiv (2020-08-06T07:36:55Z)