Fugu-MT Paper Translation (Abstract): SIGMAE: A Spectral-Index-Guided Foundation Model for Multispectral Remote Sensing

Paper abstract: SIGMAE: A Spectral-Index-Guided Foundation Model for Multispectral Remote Sensing

arxiv url: http://arxiv.org/abs/2603.07463v1
Date: Sun, 08 Mar 2026 04:55:41 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:14.610121
Title: SIGMAE: A Spectral-Index-Guided Foundation Model for Multispectral Remote Sensing
Title (reference translation): SIGMAE: A Spectral-Index-Guided Foundation Model for Multispectral Remote Sensing
Authors: Xiaokang Zhang, Bo Li, Chufeng Zhou, Weikang Yu, Lefei Zhang
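Abstract summary: Masked Autoencoder (MAE)-based pretraining has a strong capability to learn general feature representations by reconstructing masked image regions. Applying MAE to multispectral remote sensing images remains challenging because of complex backgrounds, indistinct targets, and the lack of semantic guidance during masking. SIGMAE incorporates domain-specific spectral indices as prior knowledge to guide dynamic token masking toward informative regions.
Reference score (independently computed attention level): 43.39478017496301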
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Pretraining and fine-tuning have emerged as a new paradigm in remote sensing image interpretation. Among them, Masked Autoencoder (MAE)-based pretraining stands out for its strong capability to learn general feature representations via reconstructing masked image regions. However, applying MAE to multispectral remote sensing images remains challenging due to complex backgrounds, indistinct targets, and the lack of semantic guidance during masking, which hinders the learning of underlying structures and meaningful spatial-spectral features. To address this, we propose a simple yet effective approach, Spectral Index-Guided MAE (SIGMAE), for multispectral image pretraining. The core idea is to incorporate domain-specific spectral indices as prior knowledge to guide dynamic token masking toward informative regions. SIGMAE introduces Semantic Saliency-Guided Dynamic Token Masking (SSDTM), a curriculum-style strategy that quantifies each patch's semantic richness and internal heterogeneity to adaptively select the most informative tokens during training. By prioritizing semantically salient regions and progressively increasing sample difficulty, SSDTM enhances spectrally rich and structurally aware representation learning, mitigates overfitting, and reduces redundant computation compared with random masking. Extensive experiments on five widely used datasets covering various downstream tasks, including scene classification, semantic segmentation, object extraction and change detection, demonstrate that SIGMAE outperforms other pretrained geospatial foundation models. Moreover, it exhibits strong spatial-spectral reconstruction capability, even with a 90% mask ratio, and improves complex target recognition under limited labeled data. The source codes and model weights will be released at https://github.com/zxk688/SIGMAE.
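Abstract (reference translation): Pretraining and fine-tuning have emerged as a new paradigm in remote sensing image interpretation. Among these approaches, pretraining based on the Masked Autoencoder (MAE) excels at learning general feature representations by reconstructing masked image regions. However, applying MAE to multispectral remote sensing images remains difficult: complex backgrounds, indistinct targets, and the lack of semantic guidance during masking hinder the learning of underlying structures and meaningful spatial-spectral features. This work therefore proposes Spectral Index-Guided MAE (SIGMAE), a simple yet effective approach to multispectral image pretraining. The core idea is to incorporate domain-specific spectral indices as prior knowledge to guide dynamic token masking toward informative regions. SIGMAE introduces Semantic Saliency-Guided Dynamic Token Masking (SSDTM), a curriculum-style strategy that quantifies each patch's semantic richness and internal heterogeneity and adaptively selects the most informative tokens during training. By prioritizing semantically salient regions and progressively increasing sample difficulty, SSDTM strengthens spectrally rich and structure-aware representation learning, mitigates overfitting, and reduces redundant computation compared with random masking. Extensive experiments on five widely used datasets covering downstream tasks such as scene classification, semantic segmentation, object extraction, and change detection show that SIGMAE outperforms other pretrained geospatial foundation models. It also shows strong spatial-spectral reconstruction capability even at a 90% mask ratio and improves complex target recognition when labeled data are limited. The source code and model weights will be released at https://github.com/zxk688/SIGMAE.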
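The abstract does not give the SSDTM formulation, so the following is only a rough sketch of what index-guided masking might look like, not the authors' method. It assumes NDVI as the spectral index, a patch score of |mean index| plus the index standard deviation as a stand-in for "semantic richness" and "internal heterogeneity", and a curriculum in which the share of score-guided masks grows with a `progress` value. All function names, the score formula, and the toy data are hypothetical.

```python
import random
from statistics import mean, pstdev

def ndvi(nir, red, eps=1e-6):
    """NDVI for one pixel (illustrative choice of spectral index)."""
    return (nir - red) / (nir + red + eps)

def patch_scores(nir, red, patch):
    """Score each non-overlapping patch, in row-major order.

    Assumed score: |mean index| (richness) + population std (heterogeneity).
    """
    h, w = len(nir), len(nir[0])
    scores = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            vals = [ndvi(nir[i][j], red[i][j])
                    for i in range(r, r + patch)
                    for j in range(c, c + patch)]
            scores.append(abs(mean(vals)) + pstdev(vals))
    return scores

def select_masked(scores, mask_ratio, progress, rng):
    """Return sorted patch indices to mask.

    progress in [0, 1]: 0 means purely random masking, 1 means only the
    highest-scoring patches are masked (assumed curriculum schedule).
    """
    n = len(scores)
    n_mask = round(mask_ratio * n)
    n_guided = round(progress * n_mask)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    guided = ranked[:n_guided]
    # Fill the remaining mask slots at random from the unselected patches.
    random_part = rng.sample(ranked[n_guided:], n_mask - n_guided)
    return sorted(guided + random_part)

# Toy 4x4 scene split into four 2x2 patches: patch 0 is uniform vegetation,
# patches 1 and 2 are uniform background, patch 3 has one vegetation pixel.
demo_nir = [[0.8, 0.8, 0.3, 0.3],
            [0.8, 0.8, 0.3, 0.3],
            [0.3, 0.3, 0.8, 0.3],
            [0.3, 0.3, 0.3, 0.3]]
demo_red = [[0.1, 0.1, 0.3, 0.3],
            [0.1, 0.1, 0.3, 0.3],
            [0.3, 0.3, 0.1, 0.3],
            [0.3, 0.3, 0.3, 0.3]]
demo_scores = patch_scores(demo_nir, demo_red, patch=2)
early = select_masked(demo_scores, 0.5, 0.0, random.Random(0))  # random phase
late = select_masked(demo_scores, 0.5, 1.0, random.Random(0))   # guided phase
print(early, late)
```

In this toy setup the fully guided phase masks the two patches that contain vegetation signal, while the early phase behaves like ordinary random masking. A real implementation would operate on token tensors inside the MAE encoder and would likely combine several indices.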
