Fugu-MT Paper Translation (Abstract): Unifying UAV Cross-View Geo-Localization via 3D Geometric Perception

Paper abstract: Unifying UAV Cross-View Geo-Localization via 3D Geometric Perception

arxiv url: http://arxiv.org/abs/2604.01747v1
Date: Thu, 02 Apr 2026 08:08:41 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.605193
Title: Unifying UAV Cross-View Geo-Localization via 3D Geometric Perception
Authors: Haoyuan Li, Wen Yang, Fang Xu, Hong Tan, Haijian Zhang, Shengyang Li, Gui-Song Xia
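Abstract summary: Cross-view geo-localization for Unmanned Aerial Vehicles (UAVs) remains challenging due to the severe geometric discrepancy between oblique UAV imagery and satellite maps. This paper proposes a geometry-aware UAV geo-localization framework that explicitly models 3D scene geometry to unify coarse place recognition and fine-grained pose estimation. The proposed method significantly outperforms state-of-the-art baselines, achieving robust meter-level localization accuracy and improved generalization in complex urban environments.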
Reference score (independently computed attention score): 51.687842983240564
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cross-view geo-localization for Unmanned Aerial Vehicles (UAVs) operating in GNSS-denied environments remains challenging due to the severe geometric discrepancy between oblique UAV imagery and orthogonal satellite maps. Most existing methods address this problem through a decoupled pipeline of place retrieval and pose estimation, implicitly treating perspective distortion as appearance noise rather than an explicit geometric transformation. In this work, we propose a geometry-aware UAV geo-localization framework that explicitly models the 3D scene geometry to unify coarse place recognition and fine-grained pose estimation within a single inference pipeline. Our approach reconstructs a local 3D scene from multi-view UAV image sequences using a Visual Geometry Grounded Transformer (VGGT), and renders a virtual Bird's-Eye View (BEV) representation that orthorectifies the UAV perspective to align with satellite imagery. This BEV serves as a geometric intermediary that enables robust cross-view retrieval and provides spatial priors for accurate 3 Degrees of Freedom (3-DoF) pose regression. To efficiently handle multiple location hypotheses, we introduce a Satellite-wise Attention Block that isolates the interaction between each satellite candidate and the reconstructed UAV scene, preventing inter-candidate interference while maintaining linear computational complexity. In addition, we release a recalibrated version of the University-1652 dataset with precise coordinate annotations and spatial overlap analysis, enabling rigorous evaluation of end-to-end localization accuracy. Extensive experiments on the refined University-1652 benchmark and SUES-200 demonstrate that our method significantly outperforms state-of-the-art baselines, achieving robust meter-level localization accuracy and improved generalization in complex urban environments.
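The BEV step in the abstract, which turns an oblique reconstruction into a top-down view comparable with satellite imagery, can be illustrated with a minimal sketch. This is not the authors' renderer. It assumes the reconstructed scene is already available as 3D points in a ground-aligned frame, and it uses a plain orthographic rasterization with a height z-buffer. The function name `render_bev`, the point format, and the grid parameters are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical point format: (x_east, y_north, z_height, intensity).
Point = Tuple[float, float, float, int]


def render_bev(
    points: List[Point],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    cell_size: float,
) -> List[List[Optional[int]]]:
    """Orthographically project points onto a top-down grid.

    Each cell keeps the intensity of its highest point (a height z-buffer),
    approximating what a camera looking straight down would see.
    Cells that receive no point stay None.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    cols = max(1, round((x_max - x_min) / cell_size))
    rows = max(1, round((y_max - y_min) / cell_size))
    image: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    height = [[float("-inf")] * cols for _ in range(rows)]

    for x, y, z, value in points:
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)  # row 0 is the northern edge
        # Points that fall outside the grid are skipped.
        if 0 <= row < rows and 0 <= col < cols and z > height[row][col]:
            height[row][col] = z
            image[row][col] = value
    return image


# Toy scene: a ground point and a rooftop point share one cell.
points = [
    (0.5, 3.5, 0.0, 10),   # ground
    (0.5, 3.5, 20.0, 99),  # rooftop above the same cell, so it wins
    (2.5, 1.5, 0.0, 30),   # ground elsewhere
    (9.0, 9.0, 0.0, 77),   # outside the 4 m x 4 m grid, ignored
]
bev = render_bev(points, x_range=(0.0, 4.0), y_range=(0.0, 4.0), cell_size=1.0)
```

Keeping only the highest point per cell is one simple way to remove the facade pixels that dominate oblique views. A real pipeline would also need to estimate the ground plane and metric scale, which this sketch takes as given.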
