Related papers: LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment

LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment

URL: http://arxiv.org/abs/2507.00659v1
Date: Tue, 01 Jul 2025 10:56:51 GMT
Title: LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment
Authors: Juelin Zhu, Shuaibang Peng, Long Wang, Hanlin Tan, Yu Liu, Maojun Zhang, Shen Yan,
Abstract summary: We propose a novel method for aerial visual localization over low Level-of-Detail (LoD) city models.<n>LoD-Loc mainly relies on high-LoD models, but the majority of available models and those many countries plan to construct nationwide are low-LoD (LoD1)<n>We introduce LoD-Loc v2, which employs a coarse-to-fine strategy using explicit silhouette alignment to achieve accurate localization over low-LoD city models in the air.
Score: 16.133812789068806
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a novel method for aerial visual localization over low Level-of-Detail (LoD) city models. Previous wireframe-alignment-based method LoD-Loc has shown promising localization results leveraging LoD models. However, LoD-Loc mainly relies on high-LoD (LoD3 or LoD2) city models, but the majority of available models and those many countries plan to construct nationwide are low-LoD (LoD1). Consequently, enabling localization on low-LoD city models could unlock drones' potential for global urban localization. To address these issues, we introduce LoD-Loc v2, which employs a coarse-to-fine strategy using explicit silhouette alignment to achieve accurate localization over low-LoD city models in the air. Specifically, given a query image, LoD-Loc v2 first applies a building segmentation network to shape building silhouettes. Then, in the coarse pose selection stage, we construct a pose cost volume by uniformly sampling pose hypotheses around a prior pose to represent the pose probability distribution. Each cost of the volume measures the degree of alignment between the projected and predicted silhouettes. We select the pose with maximum value as the coarse pose. In the fine pose estimation stage, a particle filtering method incorporating a multi-beam tracking approach is used to efficiently explore the hypothesis space and obtain the final pose estimation. To further facilitate research in this field, we release two datasets with LoD1 city models covering 10.7 km , along with real RGB queries and ground-truth pose annotations. Experimental results show that LoD-Loc v2 improves estimation accuracy with high-LoD models and enables localization with low-LoD models for the first time. Moreover, it outperforms state-of-the-art baselines by large margins, even surpassing texture-model-based methods, and broadens the convergence basin to accommodate larger prior errors.

Related papers

Texture2LoD3: Enabling LoD3 Building Reconstruction With Panoramic Images [0.0]
In Texture2LoD3, we introduce a novel method leveraging the ubiquity of 3D building model priors and panoramic street-level images.<n>We experimentally demonstrate that our method leads to improved facade segmentation accuracy by 11%.<n>We believe that Texture2LoD3 can scale the adoption of LoD3 models, opening applications in estimating building solar potential or enhancing autonomous driving simulations.
arXiv Detail & Related papers (2025-04-07T16:40:16Z)
LoRA vs Full Fine-tuning: An Illusion of Equivalence [76.11938177294178]
We study how Low-Rank Adaptation (LoRA) and full-finetuning change pre-trained models.<n>We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure.<n>We extend the finding that LoRA forgets less than full fine-tuning and find its forgetting is vastly localized to the intruder dimension.
arXiv Detail & Related papers (2024-10-28T17:14:01Z)
LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment [16.942854458136633]
We propose a new method for visual localization in complex 3D representations. Unlike existing localization algorithms, we estimate the pose of an Unmanned Vehicle (UAV) using a LevelDetail (LoD) 3D map.
arXiv Detail & Related papers (2024-10-16T06:09:27Z)
Analyzing the impact of semantic LoD3 building models on image-based vehicle localization [0.1398098625978622]
This paper introduces a novel approach for car localization, leveraging image features that correspond with highly detailed semantic 3D building models. The work assesses outcomes using Level of Detail 2 (LoD2) and Level of Detail 3 (LoD3) models, analyzing whether facade-enriched models yield superior accuracy.
arXiv Detail & Related papers (2024-07-31T08:33:41Z)
Model Extrapolation Expedites Alignment [135.12769233630362]
We propose a method called ExPO to expedite alignment training with human preferences.<n>We demonstrate that ExPO boosts a DPO model trained with only 20% steps to outperform the fully-trained one.<n>We show that ExPO notably improves existing open-source LLMs on the leading AlpacaEval 2.0 and MT-Bench benchmarks.
arXiv Detail & Related papers (2024-04-25T17:39:50Z)
MLS2LoD3: Refining low LoDs building models with MLS point clouds to reconstruct semantic LoD3 building models [3.2732273647357446]
We introduce a novel refinement strategy enabling LoD3 reconstruction by leveraging the ubiquity of lower LoD building models and the accuracy of MLS point clouds. We present guidelines for reconstructing LoD3 facade elements and their embedding into the CityGML standard model.
arXiv Detail & Related papers (2024-02-09T09:56:23Z)
Recognize Any Regions [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model.<n>Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning [52.29522018586365]
We study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models. Our approach employs two key techniques: (1) targeted structured pruning, which prunes a larger model to a specified target shape by removing layers, heads, and intermediate and hidden dimensions in an end-to-end manner, and (2) dynamic batch loading, which dynamically updates the composition of sampled data in each training batch based on varying losses across different domains.
arXiv Detail & Related papers (2023-10-10T15:13:30Z)
Anti-Aliased Neural Implicit Surfaces with Encoding Level of Detail [54.03399077258403]
We present LoD-NeuS, an efficient neural representation for high-frequency geometry detail recovery and anti-aliased novel view rendering. Our representation aggregates space features from a multi-convolved featurization within a conical frustum along a ray.
arXiv Detail & Related papers (2023-09-19T05:44:00Z)
Semi-supervised Learning from Street-View Images and OpenStreetMap for Automatic Building Height Estimation [59.6553058160943]
We propose a semi-supervised learning (SSL) method of automatically estimating building height from Mapillary SVI and OpenStreetMap data. The proposed method leads to a clear performance boosting in estimating building heights with a Mean Absolute Error (MAE) around 2.1 meters. The preliminary result is promising and motivates our future work in scaling up the proposed method based on low-cost VGI data.
arXiv Detail & Related papers (2023-07-05T18:16:30Z)
Visual Localization using Imperfect 3D Models from the Internet [54.731309449883284]
This paper studies how imperfections in 3D models affect localization accuracy. We show that 3D models from the Internet show promise as an easy-to-obtain scene representation.
arXiv Detail & Related papers (2023-04-12T16:15:05Z)
LiDAR-aid Inertial Poser: Large-scale Human Motion Capture by Sparse Inertial and LiDAR Sensors [38.60837840737258]
We propose a multi-sensor fusion method for capturing 3D human motions with accurate consecutive local poses and global trajectories in large-scale scenarios. We design a two-stage pose estimator in a coarse-to-fine manner, where point clouds provide the coarse body shape and IMU measurements optimize the local actions. We collect a LiDAR-IMU multi-modal mocap dataset, LIPD, with diverse human actions in long-range scenarios.
arXiv Detail & Related papers (2022-05-30T20:15:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.