Fugu-MT Paper Translation (Abstract): Rethinking Electro-Optical Vision Foundation Models for Remote Sensing Retrieval: A Controlled Comparison with Generalist VFM

Paper summary: Rethinking Electro-Optical Vision Foundation Models for Remote Sensing Retrieval: A Controlled Comparison with Generalist VFM

arxiv url: http://arxiv.org/abs/2605.02283v1
Date: Mon, 04 May 2026 07:18:33 GMT
Status: Translation complete
Last updated in system: 2026-05-05 20:33:50.171526
Title: Rethinking Electro-Optical Vision Foundation Models for Remote Sensing Retrieval: A Controlled Comparison with Generalist VFM
Title (reference translation): Rethinking Electro-Optical Foundation Models for Remote Sensing Retrieval: A Comparison with Generalist VFM
Authors: Hyobin Park, Minseok Seo, Dong-Geol Choi
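Abstract summary: Vision foundation models have attracted significant attention for their ability to leverage large-scale unlabeled visual data. Recent electro-optical vision foundation models aim to learn domain-specific representations from remote sensing imagery. It remains unclear whether they are more effective than strong generalist vision foundation models under retrieval-based evaluation.
Reference score (independently computed attention score): 7.734759516415116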
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision foundation models have attracted significant attention for their ability to leverage large-scale unlabeled visual data. This advantage is particularly important in remote sensing, where data acquisition is costly and annotation often requires expert knowledge. Recent electro-optical vision foundation models aim to learn domain-specific representations from remote sensing imagery, but it remains unclear whether they are more effective than strong generalist vision foundation models under retrieval-based evaluation. In this study, we conduct a controlled comparison between representative EO-specific and generalist vision foundation models for remote sensing image retrieval. Using the same datasets, retrieval protocol, and evaluation metric, we evaluate both in-domain performance and cross-scene generalization. Our results show that strong generalist vision foundation models are competitive with, and in some cases outperform, existing EO-specific models. Moreover, EO-specific models often suffer from substantial degradation under cross-scene evaluation, while generalist models show more stable transfer. These findings suggest that EO pretraining alone does not guarantee stronger retrieval-oriented remote sensing representations. We discuss the limitations of current EO-specific pretraining strategies and highlight the need for future EO vision foundation models to better exploit the physical, spatial, spectral, and geographic characteristics of remote sensing imagery.
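Abstract (reference translation): Vision foundation models have drawn considerable attention for their ability to exploit large-scale unlabeled visual data. This advantage matters especially in remote sensing, where acquiring data is costly and annotation often requires expert knowledge. Recent electro-optical (EO) vision foundation models aim to learn domain-specific representations from remote sensing imagery, but whether they are more effective than strong generalist vision foundation models under retrieval-based evaluation remains unclear. This study conducts a controlled comparison between representative EO-specific and generalist vision foundation models for remote sensing image retrieval. Using the same datasets, retrieval protocol, and evaluation metric, it evaluates both in-domain performance and cross-scene generalization. The results show that strong generalist vision foundation models are competitive with existing EO-specific models and in some cases outperform them. In addition, EO-specific models often degrade substantially under cross-scene evaluation, whereas generalist models transfer more stably. These results suggest that EO pretraining alone does not guarantee stronger retrieval-oriented remote sensing representations. The authors discuss the limitations of current EO-specific pretraining strategies and stress that future EO vision foundation models need to make better use of the physical, spatial, spectral, and geographic characteristics of remote sensing imagery.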
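
The evaluation setup described in the abstract (one retrieval protocol and one metric applied identically to every model) can be illustrated with a small sketch. This is not the authors' code. The abstract does not name the metric, so mean average precision over cosine-similarity rankings is assumed here, and the function names, toy embeddings, and scene labels are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def average_precision(ranked_relevance):
    """AP for one query, given a ranked list of booleans (True = relevant)."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision(queries, query_labels, gallery, gallery_labels):
    """Rank the gallery by cosine similarity for each query, then average AP."""
    aps = []
    for q, q_label in zip(queries, query_labels):
        order = sorted(range(len(gallery)),
                       key=lambda i: cosine(q, gallery[i]),
                       reverse=True)
        aps.append(average_precision([gallery_labels[i] == q_label for i in order]))
    return sum(aps) / len(aps) if aps else 0.0

# Hypothetical 2-D embeddings standing in for one model's image features.
gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
gallery_labels = ["forest", "forest", "harbor", "harbor"]
queries = [[1.0, 0.1], [0.2, 1.0]]
query_labels = ["forest", "harbor"]

score = mean_average_precision(queries, query_labels, gallery, gallery_labels)
print(f"mAP: {score:.3f}")
```

Under this sketch, a controlled comparison would call `mean_average_precision` once per model with that model's embeddings of the same images. A cross-scene run would draw the queries and the gallery from different scenes or datasets while leaving everything else unchanged.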
