Fugu-MT Paper Translation (Summary): Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing

Paper Summary: Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing

arxiv url: http://arxiv.org/abs/2509.12040v1
Date: Mon, 15 Sep 2025 15:24:49 GMT
Status: Translation complete
Last updated (in system): 2025-09-16 17:26:23.365275
Title: Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing
Title (reference translation): Exploration of efficient open-vocabulary segmentation in remote sensing
Authors: Bingyu Li, Haocheng Dong, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li,
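Abstract summary: Open-Vocabulary Remote Sensing Image Segmentation (OVRSIS) is an emerging task that adapts Open-Vocabulary Segmentation (OVS) to the remote sensing (RS) domain. RSKT-Seg is a new open-vocabulary segmentation framework tailored to remote sensing. RSKT-Seg outperforms strong OVS baselines by +3.8 mIoU and +5.9 mACC and achieves 2x faster inference through efficient aggregation.
Reference score (independently computed attention score): 55.291219073365546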
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open-Vocabulary Remote Sensing Image Segmentation (OVRSIS), an emerging task that adapts Open-Vocabulary Segmentation (OVS) to the remote sensing (RS) domain, remains underexplored due to the absence of a unified evaluation benchmark and the domain gap between natural and RS images. To bridge these gaps, we first establish a standardized OVRSIS benchmark (OVRSISBench) based on widely-used RS segmentation datasets, enabling consistent evaluation across methods. Using this benchmark, we comprehensively evaluate several representative OVS/OVRSIS models and reveal their limitations when directly applied to remote sensing scenarios. Building on these insights, we propose RSKT-Seg, a novel open-vocabulary segmentation framework tailored for remote sensing. RSKT-Seg integrates three key components: (1) a Multi-Directional Cost Map Aggregation (RS-CMA) module that captures rotation-invariant visual cues by computing vision-language cosine similarities across multiple directions; (2) an Efficient Cost Map Fusion (RS-Fusion) transformer, which jointly models spatial and semantic dependencies with a lightweight dimensionality reduction strategy; and (3) a Remote Sensing Knowledge Transfer (RS-Transfer) module that injects pre-trained knowledge and facilitates domain adaptation via enhanced upsampling. Extensive experiments on the benchmark show that RSKT-Seg consistently outperforms strong OVS baselines by +3.8 mIoU and +5.9 mACC, while achieving 2x faster inference through efficient aggregation. Our code is available at https://github.com/LiBingyu01/RSKT-Seg.
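The abstract describes RS-CMA only at a high level ("computing vision-language cosine similarities across multiple directions"). One plausible reading, assumed here and not taken from the RSKT-Seg code, is a rotation-averaged cost volume: rotate the image by several quarter turns, encode each copy, rotate the features back, score every pixel against each class's text embedding with cosine similarity, and average the results. The helper names (`rot90`, `cost_map`, `multi_directional_cost`, `toy_encode`), the choice of four 90° turns, and mean aggregation are all illustrative assumptions; `encode` is a hypothetical stand-in for a vision backbone such as a CLIP image encoder.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical types for this sketch: an H x W grid of per-pixel vectors.
Vector = List[float]
Grid = List[List[Vector]]


def rot90(grid: list, k: int) -> list:
    """Rotate a 2-D grid counter-clockwise by k * 90 degrees (k may be negative)."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cost_map(features: Grid, text_embeddings: Sequence[Vector]) -> List[List[List[float]]]:
    """Cost volume indexed [class][row][col]: cosine(pixel feature, class text embedding)."""
    return [[[cosine(v, t) for v in row] for row in features] for t in text_embeddings]


def multi_directional_cost(
    image: list,
    encode: Callable[[list], Grid],
    text_embeddings: Sequence[Vector],
    turns: Sequence[int] = (0, 1, 2, 3),
) -> List[List[List[float]]]:
    """Average cost volumes computed on rotated copies of the input (assumed scheme).

    Each copy is rotated by k quarter turns, encoded, and rotated back so its
    pixels line up with the original image before being scored and averaged.
    """
    total = None
    for k in turns:
        feats = rot90(encode(rot90(image, k)), -k)  # realign with the input image
        costs = cost_map(feats, text_embeddings)
        if total is None:
            total = costs
        else:
            # Element-wise sum of the two cost volumes.
            total = [
                [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
                for ca, cb in zip(total, costs)
            ]
    return [[[x / len(turns) for x in row] for row in cls] for cls in total]


def toy_encode(img: list) -> Grid:
    """Hypothetical encoder: pairs each pixel value with its row index,
    so its output is NOT rotation-equivariant."""
    return [[[float(p), float(i)] for p in row] for i, row in enumerate(img)]


# Toy usage: a 2x2 "image" of scalars and two hypothetical class text embeddings.
costs = multi_directional_cost([[1.0, 0.0], [0.0, 1.0]], toy_encode, [[1.0, 0.0], [0.0, 1.0]])
# costs has shape (2 classes, 2 rows, 2 cols)
```

Averaging over rotations only changes the result when the encoder is not rotation-equivariant, which is generally the case for real CNN or ViT backbones; for a purely pixel-wise encoder every turn yields the same cost volume. A learned aggregation could replace the plain mean.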
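Abstract (reference translation): Open-Vocabulary Remote Sensing Image Segmentation (OVRSIS) is an emerging task that adapts Open-Vocabulary Segmentation (OVS) to the remote sensing (RS) domain, and it remains underexplored because there is no unified evaluation benchmark and because of the domain gap between natural and RS images. To bridge these gaps, we first build a standardized OVRSIS benchmark (OVRSISBench) based on widely used RS segmentation datasets, enabling consistent evaluation across methods. Using this benchmark, we comprehensively evaluate several OVS/OVRSIS models and reveal their limitations when applied directly to remote sensing scenarios. Based on these findings, we propose RSKT-Seg, an open-vocabulary segmentation framework tailored to remote sensing. RSKT-Seg integrates three key components: (1) a Multi-Directional Cost Map Aggregation (RS-CMA) module that captures rotation-invariant visual cues by computing vision-language cosine similarities in multiple directions; (2) an Efficient Cost Map Fusion (RS-Fusion) transformer that models spatial and semantic dependencies with a lightweight dimensionality reduction strategy; and (3) a Remote Sensing Knowledge Transfer (RS-Transfer) module that injects pre-trained knowledge and facilitates domain adaptation through enhanced upsampling. Experiments on the benchmark show that RSKT-Seg outperforms strong OVS baselines by +3.8 mIoU and +5.9 mACC while achieving 2x faster inference through efficient aggregation. Our code is available at https://github.com/LiBingyu01/RSKT-Seg.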
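The reported gains are in mIoU and mACC. Using the standard semantic-segmentation definitions (per-class IoU = TP / (TP + FP + FN), per-class accuracy = TP / (TP + FN), each averaged over classes), both metrics can be read off a pixel-level confusion matrix, as in the minimal sketch below. The abstract does not state OVRSISBench's exact evaluation protocol, so the `ignore_index=255` label, the rule of skipping classes whose ratio is undefined, and the function names `confusion_matrix` and `miou_macc` are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(
    gt: Sequence[int], pred: Sequence[int], num_classes: int, ignore_index: int = 255
) -> List[List[int]]:
    """Pixel-level confusion matrix: rows are ground-truth classes, columns predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:  # assumed "ignore" label, e.g. unlabeled pixels
            continue
        cm[g][p] += 1
    return cm


def miou_macc(cm: List[List[int]]) -> Tuple[float, float]:
    """Return (mIoU, mACC) in percent from a confusion matrix.

    IoU_c = TP / (TP + FP + FN) and ACC_c = TP / (TP + FN); a class whose
    denominator is zero is left out of that mean (assumed convention).
    """
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        if tp + fp + fn:
            ious.append(tp / (tp + fp + fn))
        if tp + fn:
            accs.append(tp / (tp + fn))
    return 100 * sum(ious) / len(ious), 100 * sum(accs) / len(accs)


# Worked example: 5 flattened pixels, 3 classes.
cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 0], num_classes=3)
miou, macc = miou_macc(cm)
# Per-class IoU = 1/3, 2/3, 0 and per-class ACC = 1/2, 1, 0,
# so mIoU is about 33.33 and mACC is 50.0.
```

Because IoU also counts false positives in its denominator while class accuracy does not, the same model change can move the two metrics by different amounts.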


Related Papers