Fugu-MT Paper Translation (Abstract): FlatVPR: Plug-and-play Geo-linear Residual Adapter for Geometric Rectification of Foundation Model Feature Manifolds

Paper overview: FlatVPR: Plug-and-play Geo-linear Residual Adapter for Geometric Rectification of Foundation Model Feature Manifolds

arxiv url: http://arxiv.org/abs/2606.01734v1
Date: Mon, 01 Jun 2026 05:56:59 GMT
Status: Translation complete
System update date: 2026-06-02 21:34:31.402961
Title: FlatVPR: Plug-and-play Geo-linear Residual Adapter for Geometric Rectification of Foundation Model Feature Manifolds
Title (reference translation): FlatVPR: Plug-and-play geo-linear residual adapter for geometric rectification of foundation model feature manifolds
Authors: Rai Hisada, Kanji Tanaka
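Abstract summary: FlatVPR bridges the trade-off between map lightweightness and localization accuracy in visual place recognition. The method explicitly suppresses manifold curvature using a mathematically grounded Pullback Flatness Loss. Experiments on the NCLT dataset show that applying the adapter significantly improves performance.
Reference score (independently computed attention score): 0.7734726150561086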
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper proposes ``FlatVPR,'' a novel geometric rectification paradigm that effectively bridges the trade-off between map lightweightness and localization accuracy in visual place recognition (VPR) by enforcing a feature manifold structure where any descriptor between two adjacent anchors $\mathbf{z}_A$ and $\mathbf{z}_B$ can be accurately reconstructed via linear interpolation $\hat{\mathbf{z}}_{pseudo} = (1-t)\mathbf{z}_A + t\mathbf{z}_B$, where $t \in [0,1]$ denotes the relative position. While state-of-the-art foundation models such as DINOv2-ViT-S/14 provide robust semantic features, their latent manifolds exhibit prominent curvature, projecting uniform linear motion in physical space onto highly non-linear trajectories in the feature space, which hinders reliable reconstruction under sparse anchor conditions. To enable the aforementioned interpolation-based reconstruction, we introduce a residual transformation $\hat{\mathbf{z}} = \mathbf{z} + \text{Res}(\mathbf{z})$ to the raw foundation features $\mathbf{z}$, where $\text{Res}(\cdot)$ represents a learnable adapter. Our method explicitly suppresses manifold curvature using a mathematically grounded Pullback Flatness Loss that minimizes the deviation of intermediate features from the linear segment connecting adjacent anchors, thereby minimizing the intrinsic curvature of the manifold. Through this spatial flattening, map construction is formulated within an Expectation-Maximization (EM) framework, decoupled into a continuous M-step for manifold adaptation and a conceptual E-step for optimal anchor selection guidelines. Experiments on the NCLT dataset demonstrate that the application of our adapter leads to significant performance improvements even under extremely sparse anchor conditions with 100m intervals and extreme seasonal changes.
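Abstract (reference translation): This paper proposes "FlatVPR," a new geometric rectification paradigm that bridges the trade-off between map lightweightness and localization accuracy in visual place recognition (VPR). It enforces a feature manifold structure in which any descriptor between two adjacent anchors $\mathbf{z}_A$ and $\mathbf{z}_B$ can be reconstructed by linear interpolation $\hat{\mathbf{z}}_{pseudo} = (1-t)\mathbf{z}_A + t\mathbf{z}_B$, where $t \in [0,1]$ denotes the relative position. State-of-the-art foundation models such as DINOv2-ViT-S/14 provide robust semantic features, but their latent manifolds exhibit pronounced curvature: uniform linear motion in physical space is projected onto highly non-linear trajectories in feature space, which hinders reliable reconstruction under sparse anchor conditions. To enable this interpolation-based reconstruction, a residual transformation $\hat{\mathbf{z}} = \mathbf{z} + \text{Res}(\mathbf{z})$ is applied to the raw foundation features $\mathbf{z}$, where $\text{Res}(\cdot)$ is a learnable adapter. The method explicitly suppresses manifold curvature using a mathematically grounded Pullback Flatness Loss that minimizes the deviation of intermediate features from the linear segment connecting adjacent anchors, thereby minimizing the intrinsic curvature of the manifold. Through this flattening, map construction is formulated within an Expectation-Maximization (EM) framework, decomposed into a continuous M-step for manifold adaptation and a conceptual E-step that provides guidelines for optimal anchor selection. Experiments on the NCLT dataset show that applying the adapter yields significant performance improvements even under extremely sparse anchor conditions with 100 m intervals and extreme seasonal changes.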
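The three formulas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two-layer form of `residual_adapter`, the weight names `w1` and `w2`, and the mean-squared form of `flatness_loss` are assumptions made for illustration, since the abstract only states that the loss minimizes the deviation of intermediate features from the segment joining adjacent anchors.

```python
import numpy as np


def interpolate_descriptor(z_a, z_b, t):
    """Pseudo-descriptor (1 - t) * z_A + t * z_B for relative position t in [0, 1]."""
    return (1.0 - t) * z_a + t * z_b


def residual_adapter(z, w1, w2):
    """z_hat = z + Res(z). Res is a hypothetical two-layer ReLU MLP (an assumption)."""
    hidden = np.maximum(z @ w1, 0.0)
    return z + hidden @ w2


def flatness_loss(z_a, z_b, z_mid, t):
    """Mean squared deviation of intermediate features from the anchor-to-anchor segment.

    z_mid has shape (n, dim), t has shape (n,). The squared-error form is an
    assumed stand-in for the paper's Pullback Flatness Loss.
    """
    target = interpolate_descriptor(z_a, z_b, t[:, None])
    return float(np.mean(np.sum((z_mid - target) ** 2, axis=1)))


# Toy data: two anchors and five intermediate positions between them.
rng = np.random.default_rng(0)
dim = 8
z_a = rng.normal(size=dim)
z_b = rng.normal(size=dim)
t = np.linspace(0.0, 1.0, 5)

# A perfectly flat trajectory lies on the segment, so its loss is zero.
on_segment = interpolate_descriptor(z_a, z_b, t[:, None])
flat_loss = flatness_loss(z_a, z_b, on_segment, t)

# A curved trajectory bulges away from the segment between the anchors.
curved = on_segment + np.sin(np.pi * t)[:, None] * np.ones(dim)
curved_loss = flatness_loss(z_a, z_b, curved, t)

# With zero output weights the adapter reduces to the identity map.
w1 = rng.normal(size=(dim, 16))
w2 = np.zeros((16, dim))
adapted = residual_adapter(curved, w1, w2)
```

In this toy setup the loss separates the flat trajectory from the curved one, which is the signal a learnable adapter would be trained to reduce; the zero-initialized output layer shows why a residual form leaves the foundation features unchanged at the start of training.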


Related papers