Related papers: AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales

AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales

URL: http://arxiv.org/abs/2404.03187v2
Date: Wed, 09 Oct 2024 03:55:07 GMT
Title: AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales
Authors: Tianrui Guan, Ruiqi Xian, Xijun Wang, Xiyang Wu, Mohamed Elnoor, Daeun Song, Dinesh Manocha,
Abstract summary: We present AGL-NET, a novel learning-based method for global localization using LiDAR point clouds and satellite maps. We tackle two critical challenges: bridging the representation gap between image and points modalities for robust feature matching, and handling inherent scale discrepancies between global view and local view.
Score: 45.315661330785275
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present AGL-NET, a novel learning-based method for global localization using LiDAR point clouds and satellite maps. AGL-NET tackles two critical challenges: bridging the representation gap between image and points modalities for robust feature matching, and handling inherent scale discrepancies between global view and local view. To address these challenges, AGL-NET leverages a unified network architecture with a novel two-stage matching design. The first stage extracts informative neural features directly from raw sensor data and performs initial feature matching. The second stage refines this matching process by extracting informative skeleton features and incorporating a novel scale alignment step to rectify scale variations between LiDAR and map data. Furthermore, a novel scale and skeleton loss function guides the network toward learning scale-invariant feature representations, eliminating the need for pre-processing satellite maps. This significantly improves real-world applicability in scenarios with unknown map scales. To facilitate rigorous performance evaluation, we introduce a meticulously designed dataset within the CARLA simulator specifically tailored for metric localization training and assessment. The code and data can be accessed at https://github.com/rayguan97/AGL-Net.

Related papers

TransLocNet: Cross-Modal Attention for Aerial-Ground Vehicle Localization with Contrastive Learning [14.74396995978237]
TransLocNet is a cross-modal attention framework that fuses LiDAR geometry with aerial semantic context.<n>Experiments on CARLA and KITTI show that TransLocNet outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2025-12-11T08:34:26Z)
Generative MIMO Beam Map Construction for Location Recovery and Beam Tracking [67.65578956523403]
This paper proposes a generative framework to recover location labels directly from sparse channel state information (CSI) measurements.<n>Instead of directly storing raw CSI, we learn a compact low-dimensional radio map embedding and leverage a generative model to reconstruct the high-dimensional CSI.<n> Numerical experiments demonstrate that the proposed model can improve localization accuracy by over 30% and achieve a 20% capacity gain in non-line-of-sight (NLOS) scenarios.
arXiv Detail & Related papers (2025-11-21T07:25:49Z)
Region-Point Joint Representation for Effective Trajectory Similarity Learning [25.664203648334563]
textbfRePo is a novel method that encodes textbfRegion-wise and textbfPoint-wise features to capture both spatial context and fine-grained moving patterns.<n>Experiment results show that RePo achieves an average accuracy improvement of 22.2% over SOTA baselines.
arXiv Detail & Related papers (2025-11-17T08:28:18Z)
R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization [66.87005863868181]
We introduce a covisibility graph-based global encoding learning and data augmentation strategy. We revisit the network architecture and local feature extraction module. Our method achieves state-of-the-art on challenging large-scale datasets without relying on network ensembles or 3D supervision.
arXiv Detail & Related papers (2025-01-02T18:59:08Z)
Multi-Level Embedding and Alignment Network with Consistency and Invariance Learning for Cross-View Geo-Localization [2.733505168507872]
Cross-View Geo-Localization (CVGL) involves determining the localization of drone images by retrieving the most similar GPS-tagged satellite images. Existing methods often overlook the problem of increased computational and storage requirements when improving model performance. We propose a lightweight enhanced alignment network, called the Multi-Level Embedding and Alignment Network (MEAN)
arXiv Detail & Related papers (2024-12-19T13:10:38Z)
ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions. Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks. We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z)
Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition. Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to the scale-space theory. We build a novel style named SCale AttentioN Conv Neural Network (textbfSCAN-CNN) As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
TC-Net: Triple Context Network for Automated Stroke Lesion Segmentation [0.5482532589225552]
We propose a new network, Triple Context Network (TC-Net), with the capture of spatial contextual information as the core. Our network is evaluated on the open dataset ATLAS, achieving the highest score of 0.594, Hausdorff distance of 27.005 mm, and average symmetry surface distance of 7.137 mm.
arXiv Detail & Related papers (2022-02-28T11:12:16Z)
Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation [93.80710126516405]
We propose a novel lightweight ORSI-SOD solution, named CorrNet, to address these issues. By reducing the parameters and computations of each component, CorrNet ends up having only 4.09M parameters and running with 21.09G FLOPs. Experimental results on two public datasets demonstrate that our lightweight CorrNet achieves competitive or even better performance compared with 26 state-of-the-art methods.
arXiv Detail & Related papers (2022-01-20T08:28:01Z)
DenseGAP: Graph-Structured Dense Correspondence Learning with Anchor Points [15.953570826460869]
Establishing dense correspondence between two images is a fundamental computer vision problem. We introduce DenseGAP, a new solution for efficient Dense correspondence learning with a Graph-structured neural network conditioned on Anchor Points. Our method advances the state-of-the-art of correspondence learning on most benchmarks.
arXiv Detail & Related papers (2021-12-13T18:59:30Z)
Focus on Local: Detecting Lane Marker from Bottom Up via Key Point [10.617793053931964]
We propose a novel lane marker detection solution, FOLOLane, that focuses on modeling local patterns and achieving prediction of global structures. Specifically, the CNN models lowcomplexity local patterns with two separate heads, the first one predicts the existence of key points, and the second refines the location of key points in the local range and correlates key points of the same lane line.
arXiv Detail & Related papers (2021-05-28T08:59:14Z)
Global Context Aware RCNN for Object Detection [1.1939762265857436]
We propose a novel end-to-end trainable framework, called Global Context Aware (GCA) RCNN. The core component of GCA framework is a context aware mechanism, in which both global feature pyramid and attention strategies are used for feature extraction and feature refinement. In the end, we also present a lightweight version of our method, which only slightly increases model complexity and computational burden.
arXiv Detail & Related papers (2020-12-04T14:56:46Z)
PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a PriorGuided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space. Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.