Semi-distributed Cross-modal Air-Ground Relative Localization
- URL: http://arxiv.org/abs/2511.06749v1
- Date: Mon, 10 Nov 2025 06:28:31 GMT
- Title: Semi-distributed Cross-modal Air-Ground Relative Localization
- Authors: Weining Lu, Deer Bin, Lian Ma, Ming Ma, Zhihao Ma, Xiangyang Chen, Longfei Wang, Yixiao Feng, Zhouxian Jiang, Yongliang Shi, Bin Liang
- Abstract summary: Current approaches for robot relative localization are primarily realized in the form of distributed multi-robot SLAM systems. We fully leverage the high capacity of the Unmanned Ground Vehicle (UGV) to integrate multiple sensors. In this work, both the UGV and the Unmanned Aerial Vehicle (UAV) independently perform SLAM while extracting deep learning-based keypoints and global descriptors.
- Score: 11.828259485114598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient, accurate, and flexible relative localization is crucial in air-ground collaborative tasks. However, current approaches for robot relative localization are primarily realized in the form of distributed multi-robot SLAM systems with the same sensor configuration, which are tightly coupled with the state estimation of all robots, limiting both flexibility and accuracy. To this end, we fully leverage the high capacity of the Unmanned Ground Vehicle (UGV) to integrate multiple sensors, enabling a semi-distributed cross-modal air-ground relative localization framework. In this work, both the UGV and the Unmanned Aerial Vehicle (UAV) independently perform SLAM while extracting deep learning-based keypoints and global descriptors, which decouples the relative localization from the state estimation of all agents. The UGV employs a local Bundle Adjustment (BA) with LiDAR, camera, and an IMU to rapidly obtain accurate relative pose estimates. The BA process adopts sparse keypoint optimization and is divided into two stages: first, optimizing camera poses interpolated from LiDAR-Inertial Odometry (LIO), followed by estimating the relative camera poses between the UGV and UAV. Additionally, we implement an incremental loop closure detection algorithm using deep learning-based descriptors to maintain and retrieve keyframes efficiently. Experimental results demonstrate that our method achieves outstanding performance in both accuracy and efficiency. Unlike traditional multi-robot SLAM approaches that transmit images or point clouds, our method transmits only keypoint pixels and their descriptors, effectively constraining the communication bandwidth to under 0.3 Mbps. Code and data will be publicly available on https://github.com/Ascbpiac/cross-model-relative-localization.git.
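The bandwidth claim above can be sanity-checked with a back-of-envelope calculation. The sketch below is illustrative only: the keypoint count, descriptor size, coordinate encoding, and keyframe rate are assumptions chosen for the example, not figures reported in the paper.

```python
# Back-of-envelope estimate of the link rate needed when transmitting only
# keypoint pixel coordinates and descriptors, versus sending raw images.
# All parameter values are illustrative assumptions, not the paper's figures.

def link_rate_mbps(n_keypoints, desc_bytes, coord_bytes, keyframe_hz):
    """Required link rate in megabits per second for sparse keypoint data."""
    bytes_per_keyframe = n_keypoints * (coord_bytes + desc_bytes)
    return bytes_per_keyframe * keyframe_hz * 8 / 1e6

# Assumed: 100 keypoints per keyframe, 256-byte (uint8-quantized) descriptors,
# 4 bytes for the (u, v) pixel coordinates, 1 keyframe per second.
sparse_mbps = link_rate_mbps(n_keypoints=100, desc_bytes=256,
                             coord_bytes=4, keyframe_hz=1.0)

# For contrast: one raw 640x480 grayscale image per second.
image_mbps = 640 * 480 * 1 * 1.0 * 8 / 1e6

print(f"keypoints + descriptors: {sparse_mbps:.3f} Mbps")  # 0.208 Mbps
print(f"raw grayscale image:     {image_mbps:.3f} Mbps")   # 2.458 Mbps
```

Under these assumed settings the sparse payload stays below the 0.3 Mbps figure quoted in the abstract, roughly an order of magnitude below sending even a single low-resolution grayscale image per second.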
Related papers
- Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer [40.778996326009185]
We present the first visual localization framework that performs multi-view spatial integration through an early-fusion mechanism. Our framework is built upon the VGGT backbone, which encodes multi-view 3D geometry. We propose a novel sparse mask attention strategy that reduces computational cost by avoiding the quadratic complexity of global attention.
arXiv Detail & Related papers (2025-12-26T06:12:17Z) - MobileGeo: Exploring Hierarchical Knowledge Distillation for Resource-Efficient Cross-view Drone Geo-Localization [47.16612614191333]
Cross-view geo-localization enables drone localization by matching aerial images to geo-tagged satellite databases. MobileGeo is a mobile-friendly framework designed for efficient on-device CVGL. MobileGeo runs at 251.5 FPS on an NVIDIA AGX Orin edge device, demonstrating its practical viability for real-time on-device drone geo-localization.
arXiv Detail & Related papers (2025-10-26T08:47:20Z) - SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection [7.889379973011702]
We present SpaRC, a novel Sparse fusion transformer for 3D perception that integrates multi-view image semantics with Radar and Camera point features. Empirical evaluations on the nuScenes and TruckScenes benchmarks demonstrate that SpaRC significantly outperforms existing dense BEV-based and sparse query-based detectors.
arXiv Detail & Related papers (2024-11-29T17:17:38Z) - RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning [20.688641105430467]
Global localization is crucial in autonomous driving and robotics applications when GPS signals are unreliable.
Most approaches achieve global localization by sequential place recognition (PR) and pose estimation (PE)
We introduce a new paradigm, PR-by-PE localization, which bypasses the need for separate place recognition by directly deriving it from pose estimation.
We propose RING#, an end-to-end PR-by-PE localization network that operates in the bird's-eye-view (BEV) space, compatible with both vision and LiDAR sensors.
arXiv Detail & Related papers (2024-08-30T18:42:53Z) - UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input [51.150605800173366]
UnLoc is a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions.
Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets.
arXiv Detail & Related papers (2023-07-03T04:10:55Z) - Data-Efficient Collaborative Decentralized Thermal-Inertial Odometry [37.23164397188061]
We propose a system to achieve data-efficient, decentralized state estimation for a team of flying robots.
Each robot can fly independently, and exchange data when possible to refine its state estimate.
Our results show that the proposed method improves trajectory estimation by up to 46% with respect to an individual-agent approach.
arXiv Detail & Related papers (2022-09-14T12:13:36Z) - Drone Referring Localization: An Efficient Heterogeneous Spatial Feature Interaction Method For UAV Self-Localization [22.94589565476653]
We propose an efficient heterogeneous spatial feature interaction method, termed Drone Referring Localization (DRL).
Unlike conventional methods that treat different data sources in isolation, DRL facilitates the learnable interaction of heterogeneous features.
Compared to traditional IR methods, DRL achieves superior localization accuracy (MA@20 +9.4%) while significantly reducing computational time (to 1/7) and storage overhead (to 2/3).
arXiv Detail & Related papers (2022-08-13T03:25:50Z) - Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z) - Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z) - Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z) - Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems [92.26462290867963]
Kimera-Multi is the first multi-robot system that is robust and capable of identifying and rejecting incorrect inter- and intra-robot loop closures.
We demonstrate Kimera-Multi in photo-realistic simulations, SLAM benchmarking datasets, and challenging outdoor datasets collected using ground robots.
arXiv Detail & Related papers (2021-06-28T03:56:40Z) - Multi-scale Interaction for Real-time LiDAR Data Segmentation on an Embedded Platform [62.91011959772665]
Real-time semantic segmentation of LiDAR data is crucial for autonomously driving vehicles.
Current approaches that operate directly on the point cloud use complex spatial aggregation operations.
We propose a projection-based method, called Multi-scale Interaction Network (MINet), which is very efficient and accurate.
arXiv Detail & Related papers (2020-08-20T19:06:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.