Fugu-MT Paper Translation (Summary): Few-Shot Remote Sensing Image Scene Classification with CLIP and Prompt Learning

Paper Summary: Few-Shot Remote Sensing Image Scene Classification with CLIP and Prompt Learning

arxiv url: http://arxiv.org/abs/2510.24321v1
Date: Tue, 28 Oct 2025 11:39:22 GMT
Status: Translation complete
Last updated in system: 2025-10-29 15:35:37.095835
Title: Few-Shot Remote Sensing Image Scene Classification with CLIP and Prompt Learning
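Title (reference translation): Few-Shot Remote Sensing Image Scene Classification Using CLIP and Prompt Learning
Authors: Ivica Dimitrovski, Vlatko Spasev, Ivan Kitanovski
Abstract summary: We explore prompt learning as a lightweight and efficient adaptation strategy for few-shot remote sensing image scene classification. We benchmark several prompt-learning methods against two standard baselines: zero-shot CLIP with hand-crafted prompts and a linear probe trained on frozen CLIP features. Our findings support prompt learning as a scalable and efficient way to bridge the domain gap in satellite and aerial imagery.
Reference score (attention score computed in-house): 0.9558392439655014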
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Remote sensing applications increasingly rely on deep learning for scene classification. However, their performance is often constrained by the scarcity of labeled data and the high cost of annotation across diverse geographic and sensor domains. While recent vision-language models like CLIP have shown promise by learning transferable representations at scale by aligning visual and textual modalities, their direct application to remote sensing remains suboptimal due to significant domain gaps and the need for task-specific semantic adaptation. To address this critical challenge, we systematically explore prompt learning as a lightweight and efficient adaptation strategy for few-shot remote sensing image scene classification. We evaluate several representative methods, including Context Optimization, Conditional Context Optimization, Multi-modal Prompt Learning, and Prompting with Self-Regulating Constraints. These approaches reflect complementary design philosophies: from static context optimization to conditional prompts for enhanced generalization, multi-modal prompts for joint vision-language adaptation, and semantically regularized prompts for stable learning without forgetting. We benchmark these prompt-learning methods against two standard baselines: zero-shot CLIP with hand-crafted prompts and a linear probe trained on frozen CLIP features. Through extensive experiments on multiple benchmark remote sensing datasets, including cross-dataset generalization tests, we demonstrate that prompt learning consistently outperforms both baselines in few-shot scenarios. Notably, Prompting with Self-Regulating Constraints achieves the most robust cross-domain performance. Our findings underscore prompt learning as a scalable and efficient solution for bridging the domain gap in satellite and aerial imagery, providing a strong foundation for future research in this field.
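Abstract (reference translation): Remote sensing applications increasingly rely on deep learning for scene classification. However, their performance is often constrained by the scarcity of labeled data and by the high cost of annotation across diverse geographic and sensor domains. Recent vision-language models such as CLIP have shown promise by learning transferable representations at scale through aligning visual and textual modalities, but their direct application to remote sensing remains suboptimal because of large domain gaps and the need for task-specific semantic adaptation. To address this challenge, we systematically examine prompt learning as a lightweight and efficient adaptation strategy for few-shot remote sensing image scene classification. We evaluate several representative methods: Context Optimization, Conditional Context Optimization, Multi-modal Prompt Learning, and Prompting with Self-Regulating Constraints. These approaches reflect complementary design philosophies, ranging from static context optimization to conditional prompts for better generalization, multi-modal prompts for joint vision-language adaptation, and semantically regularized prompts for stable learning without forgetting. We benchmark these prompt-learning methods against two standard baselines: zero-shot CLIP with hand-crafted prompts and a linear probe trained on frozen CLIP features. Through extensive experiments on multiple benchmark remote sensing datasets, including cross-dataset generalization tests, we show that prompt learning consistently outperforms both baselines in few-shot scenarios. In particular, Prompting with Self-Regulating Constraints achieves the most robust cross-domain performance. Our findings establish prompt learning as a scalable and efficient way to bridge the domain gap in satellite and aerial imagery.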
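The abstract compares prompt learning against two baselines: zero-shot CLIP with hand-crafted prompts, and a linear probe trained on frozen CLIP features. The following is a minimal sketch of those two baselines only, not the authors' code. It assumes the CLIP image and text embeddings have already been computed, and toy vectors stand in for them. The prompt template "a satellite photo of a {}.", the helper names (`build_prompts`, `zero_shot_probs`, `train_linear_probe`, `predict`), the class names and the logit scale of 100 are illustrative assumptions. The linear probe is fitted with plain gradient descent for simplicity.

```python
import math


def build_prompts(class_names, template="a satellite photo of a {}."):
    """Fill a hand-crafted template with each class name (hypothetical template)."""
    return [template.format(name) for name in class_names]


def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_emb, class_text_embs, logit_scale=100.0):
    """Zero-shot baseline: cosine similarity between one image embedding and one
    prompt embedding per class, scaled and turned into probabilities. No training."""
    img = normalize(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, normalize(t)))
              for t in class_text_embs]
    return softmax(logits)


def train_linear_probe(features, labels, num_classes, lr=0.5, epochs=200):
    """Linear-probe baseline: multinomial logistic regression on frozen features,
    fitted with full-batch gradient descent on the cross-entropy loss."""
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    n = len(features)
    for _ in range(epochs):
        grad_W = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax([sum(w * xi for w, xi in zip(W[k], x)) + b[k]
                             for k in range(num_classes)])
            for k in range(num_classes):
                err = probs[k] - (1.0 if k == y else 0.0)  # d(loss)/d(logit_k)
                grad_b[k] += err
                for j in range(dim):
                    grad_W[k][j] += err * x[j]
        for k in range(num_classes):
            b[k] -= lr * grad_b[k] / n
            for j in range(dim):
                W[k][j] -= lr * grad_W[k][j] / n
    return W, b


def predict(W, b, x):
    """Return the index of the highest-scoring class for feature vector x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    classes = ["forest", "river", "airport"]  # hypothetical scene classes
    print(build_prompts(classes))
    # Toy vectors standing in for CLIP text embeddings of the prompts above.
    text_embs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
    probs = zero_shot_probs([0.9, 0.2, 0.05], text_embs)
    print(classes[max(range(len(probs)), key=probs.__getitem__)])

    # Two labeled "shots" per class of toy frozen features.
    feats = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0.1, 0.9, 0], [0, 0, 1], [0, 0.1, 0.9]]
    labels = [0, 0, 1, 1, 2, 2]
    W, b = train_linear_probe(feats, labels, num_classes=3)
    print(classes[predict(W, b, [0.8, 0.2, 0.0])])
```

Both baselines keep the CLIP encoders frozen. Prompt-learning methods such as Context Optimization also keep the encoders frozen, but replace the fixed template words with learnable context vectors optimized on the few labeled shots. That step requires the actual CLIP text encoder and is not shown here.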

