Fugu-MT Paper Translation (Abstract): Towards Realistic Open-Vocabulary Remote Sensing Segmentation: Benchmark and Baseline

Paper overview: Towards Realistic Open-Vocabulary Remote Sensing Segmentation: Benchmark and Baseline

arxiv url: http://arxiv.org/abs/2604.15652v1
Date: Fri, 17 Apr 2026 02:49:46 GMT
Status: Translation complete
System update date: 2026-04-20 22:00:19.718459
Title: Towards Realistic Open-Vocabulary Remote Sensing Segmentation: Benchmark and Baseline
Title (reference translation): Towards Realistic Open-Vocabulary Remote Sensing Segmentation: Benchmark and Baseline
Authors: Bingyu Li, Tao Huo, Haocheng Dong, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li
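Abstract summary: Open-vocabulary remote sensing image segmentation (OVRSIS) remains underexplored due to fragmented datasets, limited training diversity, and the lack of evaluation benchmarks. We propose OVRSISBenchV2, a large-scale and application-oriented benchmark for OVRSIS. Our results highlight the importance of realistic benchmark design and the effectiveness of perturbation-based transfer for OVRSIS.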
Reference score (independently computed attention score): 52.65099689153431
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open-vocabulary remote sensing image segmentation (OVRSIS) remains underexplored due to fragmented datasets, limited training diversity, and the lack of evaluation benchmarks that reflect realistic geospatial application demands. Our previous OVRSISBenchV1 established an initial cross-dataset evaluation protocol, but its limited scope is insufficient for assessing realistic open-world generalization. To address this issue, we propose OVRSISBenchV2, a large-scale and application-oriented benchmark for OVRSIS. We first construct OVRSIS95K, a balanced dataset of about 95K image-mask pairs covering 35 common semantic categories across diverse remote sensing scenes. Built upon OVRSIS95K and 10 downstream datasets, OVRSISBenchV2 contains 170K images and 128 categories, substantially expanding scene diversity, semantic coverage, and evaluation difficulty. Beyond standard open-vocabulary segmentation, it further includes downstream protocols for building extraction, road extraction, and flood detection, thereby better reflecting realistic geospatial application demands and complex deployment scenarios. We also propose Pi-Seg, a baseline for OVRSIS. Pi-Seg improves transferability through a positive-incentive noise mechanism, where learnable and semantically guided perturbations broaden the visual-text feature space during training. Extensive experiments on OVRSISBenchV1, OVRSISBenchV2, and downstream tasks show that Pi-Seg delivers strong and consistent results, particularly on the more challenging OVRSISBenchV2 benchmark. Our results highlight both the importance of realistic benchmark design and the effectiveness of perturbation-based transfer for OVRSIS. The code and datasets are available at https://github.com/LiBingyu01/RSKT-Seg/tree/Pi-Seg.
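Abstract (reference translation): Open-vocabulary remote sensing image segmentation (OVRSIS) remains underexplored because of fragmented datasets, limited training diversity, and a lack of evaluation benchmarks that reflect realistic geospatial application demands. Our earlier OVRSISBenchV1 established an initial cross-dataset evaluation protocol, but its limited scope is insufficient for assessing realistic open-world generalization. In this work we therefore propose OVRSISBenchV2, a large-scale, application-oriented benchmark for OVRSIS. We first construct OVRSIS95K, a balanced dataset of about 95K images covering 35 common semantic categories across diverse remote sensing scenes. Built on OVRSIS95K and 10 downstream datasets, OVRSISBenchV2 contains 170K images and 128 categories, substantially expanding scene diversity, semantic coverage, and evaluation difficulty. Beyond standard open-vocabulary segmentation, it includes downstream protocols for building extraction, road extraction, and flood detection, reflecting realistic geospatial application demands and complex deployment scenarios. We also propose Pi-Seg, a baseline for OVRSIS. Pi-Seg improves transferability through a positive-incentive noise mechanism, in which learnable, semantically guided perturbations broaden the visual-text feature space during training. Extensive experiments on OVRSISBenchV1, OVRSISBenchV2, and downstream tasks show that Pi-Seg delivers strong and consistent results, particularly on the more challenging OVRSISBenchV2 benchmark. The results highlight both the importance of realistic benchmark design and the effectiveness of perturbation-based transfer for OVRSIS. The code and datasets are available at https://github.com/LiBingyu01/RSKT-Seg/tree/Pi-Seg.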

Related papers