Fugu-MT Paper Translation (Abstract): VIRD: View-Invariant Representation through Dual-Axis Transformation for Cross-View Pose Estimation

Paper overview: VIRD: View-Invariant Representation through Dual-Axis Transformation for Cross-View Pose Estimation

arxiv url: http://arxiv.org/abs/2603.12918v1
Date: Fri, 13 Mar 2026 11:48:22 GMT
Status: Translation complete
Last updated in system: 2026-03-16 17:38:12.068811
Title: VIRD: View-Invariant Representation through Dual-Axis Transformation for Cross-View Pose Estimation
Title (reference translation): VIRD: Cross-View Pose Estimation Using View-Invariant Representations via Dual-Axis Transformation
Authors: Juhye Park, Wooju Lee, Dasol Hong, Changki Sung, Youngwoo Seo, Dongwan Kang, Hyun Myung
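Abstract summary: Cross-view pose estimation predicts the 3-DoF camera pose corresponding to a ground-view image with respect to a geo-referenced satellite image. The paper proposes a novel cross-view pose estimation method that constructs view-invariant representations through dual-axis transformation. Experiments on the KITTI and VIGOR datasets show that VIRD outperforms the state-of-the-art methods without orientation priors.
Reference score (independently computed attention score): 12.845645384371876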
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate global localization is crucial for autonomous driving and robotics, but GNSS-based approaches often degrade due to occlusion and multipath effects. As an emerging alternative, cross-view pose estimation predicts the 3-DoF camera pose corresponding to a ground-view image with respect to a geo-referenced satellite image. However, existing methods struggle to bridge the significant viewpoint gap between the ground and satellite views mainly due to limited spatial correspondences. We propose a novel cross-view pose estimation method that constructs view-invariant representations through dual-axis transformation (VIRD). VIRD first applies a polar transformation to the satellite view to establish horizontal correspondence, then uses context-enhanced positional attention on the ground and polar-transformed satellite features to resolve vertical misalignment, explicitly mitigating the viewpoint gap. A view-reconstruction loss is introduced to strengthen the view invariance further, encouraging the derived representations to reconstruct the original and cross-view images. Experiments on the KITTI and VIGOR datasets demonstrate that VIRD outperforms the state-of-the-art methods without orientation priors, reducing median position and orientation errors by 50.7% and 76.5% on KITTI, and 18.0% and 46.8% on VIGOR, respectively.
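Abstract (reference translation): Accurate global localization is important for autonomous driving and robotics, but GNSS-based approaches often degrade because of occlusion and multipath effects. As an emerging alternative, cross-view pose estimation predicts the 3-DoF camera pose corresponding to a ground-view image with respect to a geo-referenced satellite image. Existing methods, however, struggle to bridge the large viewpoint gap between ground and satellite views, mainly because spatial correspondences are limited. This paper proposes a new cross-view pose estimation method that builds view-invariant representations through dual-axis transformation (VIRD). VIRD first applies a polar transformation to the satellite view to establish horizontal correspondence, then applies context-enhanced positional attention to the ground features and the polar-transformed satellite features to resolve vertical misalignment, explicitly mitigating the viewpoint gap. To strengthen view invariance further, a view-reconstruction loss is introduced that encourages the derived representations to reconstruct both the original and the cross-view images. In experiments on the KITTI and VIGOR datasets, VIRD outperforms the state-of-the-art methods without orientation priors, reducing median position and orientation errors by 50.7% and 76.5% on KITTI and by 18.0% and 46.8% on VIGOR, respectively.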
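The polar transformation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `polar_transform`, the nearest-neighbour sampling, the assumption that the camera sits at the centre of a square satellite patch, and the angle convention (0 = up, increasing clockwise) are all illustrative choices made here. The sketch only shows the general idea of resampling a top-down image so that azimuth runs along the horizontal axis and distance from the camera along the vertical axis.

```python
import math


def polar_transform(image, out_h, out_w):
    """Resample a square top-down image into a (radius x azimuth) grid.

    Hypothetical helper for illustration only. `image` is a square list of
    rows; the camera is assumed to be at the image centre.
    """
    size = len(image)
    centre = (size - 1) / 2.0
    max_radius = centre
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        # Row 0 samples the outermost ring; the last row samples the centre.
        if out_h > 1:
            radius = max_radius * (out_h - 1 - i) / (out_h - 1)
        else:
            radius = 0.0
        for j in range(out_w):
            # Azimuth for this column: 0 points up, angles grow clockwise.
            theta = 2.0 * math.pi * j / out_w
            row = centre - radius * math.cos(theta)
            col = centre + radius * math.sin(theta)
            # Nearest-neighbour lookup, clamped to the image bounds.
            r = min(size - 1, max(0, int(round(row))))
            c = min(size - 1, max(0, int(round(col))))
            out[i][j] = image[r][c]
    return out


# Toy 5x5 "satellite patch" whose pixel values encode their own position.
toy = [[r * 5 + c for c in range(5)] for r in range(5)]
polar = polar_transform(toy, out_h=3, out_w=4)
```

In this layout each output column corresponds to one viewing direction, so a change in camera heading appears as a horizontal circular shift of the polar image. That property is one common reason polar resampling is used to align satellite imagery with ground-view panoramas; how VIRD uses it in detail is described in the paper itself, not in this sketch.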
