Fugu-MT Paper Translation (Summary): Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis

Paper summary: Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis

arxiv url: http://arxiv.org/abs/2604.00820v1
Date: Wed, 01 Apr 2026 12:27:31 GMT
Status: Translation complete
Last updated in system: 2026-04-02 16:44:31.984488
Title: Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis
Title (reference translation): Continual Vision-Language Learning for Remote Sensing: Benchmark and Analysis
Authors: Xingxing Weng, Ruifeng Ni, Chao Pang, XiangYu Hao, Yishan Wang, Xiaokang Zhang, Wei Xu, Gui-Song Xia
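Abstract summary: We propose CLeaRS, a benchmark for continual vision-language learning in remote sensing. CLeaRS comprises 10 curated subsets with over 207k image-text pairs, spanning diverse interpretation tasks, sensing modalities, and application scenarios. Extensive benchmarking of diverse vision-language models reveals catastrophic forgetting across all settings.
Reference score (independently computed attention score): 39.81956241706565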
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Current remote sensing vision-language models (RS VLMs) demonstrate impressive performance in image interpretation but rely on static training data, limiting their ability to accommodate continuously emerging sensing modalities and downstream tasks. This exposes a fundamental challenge: enabling RS VLMs to continually adapt without catastrophic forgetting. Despite its practical importance, the continual learning capability of RS VLMs remains underexplored, and no dedicated benchmark currently exists. In this work, we present CLeaRS, a comprehensive benchmark for continual vision-language learning in remote sensing. CLeaRS comprises 10 curated subsets with over 207k image-text pairs, spanning diverse interpretation tasks, sensing modalities, and application scenarios. We further define three evaluation protocols: long-horizon, modality-incremental, and task-incremental settings, to systematically assess continual adaptation. Extensive benchmarking of diverse vision-language models reveals catastrophic forgetting across all settings. Moreover, representative continual learning methods, when adapted to RS VLMs, exhibit limited effectiveness in handling task, instruction, and modality transitions. Our findings underscore the need for developing continual learning methods tailored to RS VLMs.
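Abstract (reference translation): Current remote sensing vision-language models (RS VLMs) show impressive performance in image interpretation, but they rely on static training data, which limits their ability to accommodate continuously emerging sensing modalities and downstream tasks. This exposes a fundamental challenge: enabling RS VLMs to adapt continually without catastrophic forgetting. Despite its practical importance, the continual learning capability of RS VLMs remains underexplored, and no dedicated benchmark exists yet. In this work, we propose CLeaRS, a comprehensive benchmark for continual vision-language learning in remote sensing. CLeaRS comprises 10 curated subsets with over 207k image-text pairs, spanning diverse interpretation tasks, sensing modalities, and application scenarios. We further define three evaluation protocols for systematically assessing continual adaptation: long-horizon, modality-incremental, and task-incremental settings. Extensive benchmarking of diverse vision-language models reveals catastrophic forgetting across all settings. Moreover, representative continual learning methods, when adapted to RS VLMs, show limited effectiveness in handling task, instruction, and modality transitions. These results point to the need for continual learning methods tailored to RS VLMs.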
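The abstract does not state which metrics CLeaRS uses to quantify catastrophic forgetting. As a minimal sketch, assuming the common continual-learning convention of an accuracy matrix `acc[i][j]` (accuracy on subset `j` after sequentially training through subset `i`), final average accuracy, average forgetting, and backward transfer can be computed as below. The function names, the matrix convention, and the toy numbers are hypothetical illustrations, not the paper's protocol or results.

```python
from typing import Sequence

# Hypothetical convention (not specified by the paper):
# acc[i][j] = accuracy on task j, measured after sequentially training on tasks 0..i.
Matrix = Sequence[Sequence[float]]


def average_accuracy(acc: Matrix) -> float:
    """Mean accuracy over all tasks after training on the last task."""
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def average_forgetting(acc: Matrix) -> float:
    """Mean drop from each earlier task's best accuracy to its final accuracy.

    For every task j except the last, forgetting is
    max over step in [j, T-2] of acc[step][j], minus acc[T-1][j].
    Higher values mean more forgetting.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    drops = []
    for j in range(num_tasks - 1):
        best_before_end = max(acc[step][j] for step in range(j, num_tasks - 1))
        drops.append(best_before_end - acc[-1][j])
    return sum(drops) / len(drops)


def backward_transfer(acc: Matrix) -> float:
    """Mean change on earlier tasks relative to just after learning them.

    Negative values indicate forgetting.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    total = sum(acc[-1][j] - acc[j][j] for j in range(num_tasks - 1))
    return total / (num_tasks - 1)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are NOT results from CLeaRS.
    toy_acc = [
        [0.80, 0.10, 0.05],
        [0.60, 0.85, 0.10],
        [0.50, 0.70, 0.90],
    ]
    print(f"ACC = {average_accuracy(toy_acc):.3f}")
    print(f"Forgetting = {average_forgetting(toy_acc):.3f}")
    print(f"BWT = {backward_transfer(toy_acc):.3f}")
```

For the toy matrix this prints ACC = 0.700, Forgetting = 0.225, and BWT = -0.225. Reporting forgetting alongside final accuracy separates a task that was never learned from one that was learned and then lost, which is the distinction the term catastrophic forgetting refers to.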

Related papers