Fugu-MT paper translation (abstract): Understanding What Is Not Said: Referring Remote Sensing Image Segmentation with Scarce Expressions

Paper overview: Understanding What Is Not Said: Referring Remote Sensing Image Segmentation with Scarce Expressions

arxiv url: http://arxiv.org/abs/2510.22760v1
Date: Sun, 26 Oct 2025 17:18:48 GMT
Status: translation complete
System update date: 2025-10-28 19:54:32.555934
Title: Understanding What Is Not Said: Referring Remote Sensing Image Segmentation with Scarce Expressions
Title (reference translation): Understanding What Is Not Said: Remote Sensing Image Segmentation with Scarce Expressions
Authors: Kai Ye, Bowen Liu, Jianghang Lin, Jiayi Ji, Pingyang Dai, Liujuan Cao
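Abstract summary: Referring Remote Sensing Image Segmentation (RRSIS) aims to segment instances in remote sensing images according to referring expressions. This paper proposes a new learning paradigm for RRSIS called Weakly Referring Expression Learning (WREL). It shows that mixed-referring training yields a provable upper bound on the performance gap relative to training with fully annotated referring expressions.
Reference score (independently computed attention measure): 45.04317112354794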
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Referring Remote Sensing Image Segmentation (RRSIS) aims to segment instances in remote sensing images according to referring expressions. Unlike Referring Image Segmentation on general images, acquiring high-quality referring expressions in the remote sensing domain is particularly challenging due to the prevalence of small, densely distributed objects and complex backgrounds. This paper introduces a new learning paradigm, Weakly Referring Expression Learning (WREL) for RRSIS, which leverages abundant class names as weakly referring expressions together with a small set of accurate ones to enable efficient training under limited annotation conditions. Furthermore, we provide a theoretical analysis showing that mixed-referring training yields a provable upper bound on the performance gap relative to training with fully annotated referring expressions, thereby establishing the validity of this new setting. We also propose LRB-WREL, which integrates a Learnable Reference Bank (LRB) to refine weakly referring expressions through sample-specific prompt embeddings that enrich coarse class-name inputs. Combined with a teacher-student optimization framework using dynamically scheduled EMA updates, LRB-WREL stabilizes training and enhances cross-modal generalization under noisy weakly referring supervision. Extensive experiments on our newly constructed benchmark with varying weakly referring data ratios validate both the theoretical insights and the practical effectiveness of WREL and LRB-WREL, demonstrating that they can approach or even surpass models trained with fully annotated referring expressions.
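Abstract (reference translation): Referring Remote Sensing Image Segmentation (RRSIS) aims to segment instances in remote sensing images according to referring expressions. Unlike Referring Image Segmentation on general images, acquiring high-quality referring expressions in the remote sensing domain is particularly difficult because of small, densely distributed objects and complex backgrounds. This paper proposes Weakly Referring Expression Learning (WREL), a new learning paradigm that uses abundant class names as weakly referring expressions together with a small set of accurate expressions to enable efficient training under limited annotation conditions. It further provides a theoretical analysis showing that mixed-referring training yields a provable upper bound on the performance gap relative to training with fully annotated referring expressions, establishing the validity of this new setting. It also proposes LRB-WREL, which integrates a Learnable Reference Bank (LRB). Combined with a teacher-student optimization framework using dynamically scheduled EMA updates, LRB-WREL stabilizes training and enhances cross-modal generalization under noisy supervision. The experiments validate both the theoretical insights and the practical effectiveness of WREL and LRB-WREL, showing that they can approach or even surpass models trained with fully annotated referring expressions.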
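
The abstract mentions a teacher-student framework with dynamically scheduled EMA updates but does not specify the schedule. The following is a minimal sketch of the general idea, not the authors' implementation. The function names `ema_momentum` and `ema_update`, the cosine ramp, and the momentum values 0.99 and 0.999 are all assumptions chosen for illustration, and parameters are modeled as a plain dictionary of floats.

```python
import math


def ema_momentum(step, total_steps, base=0.99, final=0.999):
    """Hypothetical schedule: cosine ramp of the EMA momentum from `base` to `final`.

    The abstract does not state which schedule the paper uses; this is an assumption.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final - (final - base) * (math.cos(math.pi * progress) + 1.0) / 2.0


def ema_update(teacher, student, momentum):
    """Standard EMA step: teacher <- momentum * teacher + (1 - momentum) * student."""
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


if __name__ == "__main__":
    # Toy parameters; in practice these would be model weights.
    teacher = {"w": 0.0}
    student = {"w": 1.0}
    for step in range(3):
        m = ema_momentum(step, total_steps=3)
        teacher = ema_update(teacher, student, m)
        print(step, m, teacher["w"])
```

With a low momentum early on, the teacher tracks the student closely; as the momentum rises, the teacher changes more slowly, which is one common way to stabilize training under noisy supervision.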

Related papers