Fugu-MT Paper Translation (Abstract): SAGE: Spatial-visual Adaptive Graph Exploration for Visual Place Recognition

Paper overview: SAGE: Spatial-visual Adaptive Graph Exploration for Visual Place Recognition

arxiv url: http://arxiv.org/abs/2509.25723v1
Date: Tue, 30 Sep 2025 03:34:40 GMT
Status: Translation complete
System update date: 2025-10-01 14:45:00.002074
Title: SAGE: Spatial-visual Adaptive Graph Exploration for Visual Place Recognition
Title (reference translation): SAGE: Spatial-Visual Adaptive Graph Exploration for Visual Place Recognition
Authors: Shunpeng Chen, Changwei Wang, Rongtao Xu, Xingtian Pei, Yukun Song, Jinzhou Lin, Wenhao Xu, Jingyi Zhang, Li Guo, Shibiao Xu
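Abstract summary: Visual Place Recognition (VPR) requires robust retrieval of geotagged images despite changes in appearance, viewpoint, and environment. SAGE (Spatial-visual Adaptive Graph Exploration) is a unified training pipeline that enhances fine-grained spatial-visual discrimination.
Reference score (independently computed attention score): 37.553281487983064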
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual Place Recognition (VPR) requires robust retrieval of geotagged images despite large appearance, viewpoint, and environmental variation. Prior methods focus on descriptor fine-tuning or fixed sampling strategies yet neglect the dynamic interplay between spatial context and visual similarity during training. We present SAGE (Spatial-visual Adaptive Graph Exploration), a unified training pipeline that enhances granular spatial-visual discrimination by jointly improving local feature aggregation, sample organization during training, and hard sample mining. We introduce a lightweight Soft Probing module that learns residual weights from training data for patch descriptors before bilinear aggregation, boosting distinctive local cues. During training, we reconstruct an online geo-visual graph that fuses geographic proximity and current visual similarity so that candidate neighborhoods reflect the evolving embedding landscape. To concentrate learning on the most informative place neighborhoods, we seed clusters from high-affinity anchors and iteratively expand them with a greedy weighted clique expansion sampler. Implemented with a frozen DINOv2 backbone and parameter-efficient fine-tuning, SAGE achieves SOTA across eight benchmarks. It attains 98.9%, 95.8%, 94.5%, and 96.0% Recall@1 on SPED, Pitts30k-test, MSLS-val, and Nordland, respectively. Notably, our method obtains 100% Recall@10 on SPED using only 4096D global descriptors. Code and model will be available at: https://github.com/chenshunpeng/SAGE.
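Abstract (reference translation): Visual Place Recognition (VPR) requires robust retrieval of geotagged images despite changes in appearance, viewpoint, and environment. Previous methods have concentrated on descriptor fine-tuning or fixed sampling strategies, but have ignored the dynamic interaction between spatial context and visual similarity during training. SAGE (Spatial-visual Adaptive Graph Exploration) is a unified training pipeline that enhances fine-grained spatial-visual discrimination by jointly improving local feature aggregation, the organization of samples during training, and hard sample mining. We propose a lightweight Soft Probing module that learns residual weights for patch descriptors from the training data before bilinear aggregation, strengthening distinctive local cues. During training, we reconstruct an online geo-visual graph that fuses geographic proximity with current visual similarity, so that candidate neighborhoods reflect the evolving embedding landscape. To focus learning on the most informative place neighborhoods, clusters are seeded from high-affinity anchors and expanded iteratively with a greedy weighted clique expansion sampler. With a frozen DINOv2 backbone and parameter-efficient fine-tuning, SAGE achieves SOTA on eight benchmarks. It reaches 98.9%, 95.8%, 94.5%, and 96.0% Recall@1 on SPED, Pitts30k-test, MSLS-val, and Nordland, respectively. Notably, it obtains 100% Recall@10 on SPED using only 4096D global descriptors. Code and model will be made available at https://github.com/chenshunpeng/SAGE.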
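Illustrative sketch (not the authors' code): the abstract describes a graph that fuses geographic proximity with visual similarity, and a sampler that seeds clusters from high-affinity anchors and grows them greedily. The abstract gives no formulas, so everything concrete below is an assumption made for illustration: the exponential distance kernel, the mixing weight `alpha`, the `radius_m` scale, the `min_affinity` clique threshold, and the function names `build_affinity`, `pick_seed`, and `greedy_clique_expand`. Positions are taken to be planar coordinates in metres, and descriptors are short toy vectors rather than 4096D global descriptors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_affinity(positions, descriptors, radius_m=25.0, alpha=0.5):
    """Symmetric affinity matrix mixing geographic and visual closeness.

    Assumed fusion rule: alpha * exp(-distance / radius_m)
    + (1 - alpha) * max(0, cosine similarity).
    """
    n = len(positions)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            geo = math.exp(-math.dist(positions[i], positions[j]) / radius_m)
            vis = max(0.0, cosine(descriptors[i], descriptors[j]))
            w[i][j] = w[j][i] = alpha * geo + (1.0 - alpha) * vis
    return w


def pick_seed(w):
    """Assumed anchor choice: the node with the largest total affinity."""
    return max(range(len(w)), key=lambda i: sum(w[i]))


def greedy_clique_expand(w, seed, size, min_affinity=0.3):
    """Grow a cluster from `seed`, one node at a time.

    A candidate is eligible only if its affinity to every current member
    is at least `min_affinity` (the clique constraint). Among eligible
    candidates, the one with the highest summed affinity to the cluster
    is added. Stops at `size` members or when nothing is eligible.
    """
    members = [seed]
    candidates = set(range(len(w))) - {seed}
    while len(members) < size and candidates:
        best, best_score = None, -1.0
        for c in sorted(candidates):
            if all(w[c][m] >= min_affinity for m in members):
                score = sum(w[c][m] for m in members)
                if score > best_score:
                    best, best_score = c, score
        if best is None:
            break
        members.append(best)
        candidates.remove(best)
    return members


# Toy data: three nearby, similar-looking places and one distant outlier.
positions = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (1000.0, 1000.0)]
descriptors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]

affinity = build_affinity(positions, descriptors)
cluster = greedy_clique_expand(affinity, pick_seed(affinity), size=4)
```

In this toy setup the distant, visually dissimilar fourth place fails the clique threshold and is left out of the cluster. In an online setting of the kind the abstract describes, the affinity matrix would be rebuilt as the descriptors change during training; how often, and with what weighting, is not stated in the abstract.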


Related papers