Fugu-MT Paper Translation (Abstract): LESSViT: Robust Hyperspectral Representation Learning under Spectral Configuration Shift

Paper overview: LESSViT: Robust Hyperspectral Representation Learning under Spectral Configuration Shift

arxiv url: http://arxiv.org/abs/2605.18541v1
Date: Mon, 18 May 2026 15:22:26 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:49.908396
Title: LESSViT: Robust Hyperspectral Representation Learning under Spectral Configuration Shift
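Title (reference translation): LESSViT: Robust Hyperspectral Representation Learning under Spectral Configuration Shift
Authors: Haozhe Si, Yuxuan Wan, Yuqing Wang, Minh Do, Han Zhao
Abstract summary: Low-rank Efficient Spatial-Spectral ViT (LESSViT) is a sensor-flexible architecture for cross-spectral generalization. The paper also introduces a hyperspectral masked autoencoder (HyperMAE) with decoupled spatial-spectral masking and hierarchical channel sampling.
Reference score (independently computed attention metric): 29.791943499456426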
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modeling hyperspectral imagery (HSI) across different sensors presents a fundamental challenge due to variations in wavelength coverage, band sampling, and channel dimensionality. As a result, models trained under a fixed spectral configuration often fail to generalize to other sensors. Existing Vision Transformer (ViT) approaches either rely on implicit spectral modeling with fixed channel assumptions or adopt explicit spatial-spectral attention with prohibitive computational cost, leading to a fundamental trade-off between efficiency and expressiveness. In this work, we introduce Low-rank Efficient Spatial-Spectral ViT (LESSViT), a sensor-flexible architecture for cross-spectral generalization. LESSViT is built on LESS Attention, a structured low-rank factorization that models joint spatial-spectral interactions through separable spatial and spectral components, reducing the complexity of full spatial-spectral attention from $O(N^2 C^2)$ to $O(rNC)$, where $N$ is the number of spatial tokens, $C$ is the number of spectral channels, and $r$ is the rank of the low-rank approximation. We further incorporate channel-agnostic patch embedding and wavelength-aware positional encoding to support flexible spectral inputs. To enable efficient and robust pretraining, we introduce a hyperspectral masked autoencoder (HyperMAE) with decoupled spatial-spectral masking and hierarchical channel sampling. We evaluate LESSViT under a cross-spectral generalization setting that simulates cross-sensor variability. Experiments on the SpectralEarth benchmark demonstrate that LESSViT improves robustness under spectral shifts while remaining competitive in-distribution, and explicit and efficient spatial-spectral modeling is essential for scalable and generalizable hyperspectral representation learning.
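To make the complexity claim in the abstract concrete, the following minimal sketch compares the two asymptotic cost terms for one illustrative setting. The values of `N`, `C`, and `r` are assumptions chosen for the example and do not come from the paper, and the functions count only the leading terms $N^2 C^2$ and $rNC$, ignoring constants and the embedding dimension.

```python
def full_attention_cost(n_tokens: int, n_channels: int) -> int:
    """Leading term of full spatial-spectral attention: (N * C)^2."""
    return (n_tokens * n_channels) ** 2


def low_rank_cost(n_tokens: int, n_channels: int, rank: int) -> int:
    """Leading term quoted for the low-rank factorization: r * N * C."""
    return rank * n_tokens * n_channels


# Hypothetical setting: 14 x 14 spatial patches, 200 spectral channels, rank 8.
N, C, r = 196, 200, 8

full = full_attention_cost(N, C)
low = low_rank_cost(N, C, r)
ratio = full / low  # equals N * C / r

print(f"full: {full:,}  low-rank: {low:,}  ratio: {ratio:,.0f}x")
```

Because the ratio simplifies to $NC / r$, the gap widens as either the spatial resolution or the number of bands grows, which is the regime where hyperspectral sensors operate.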
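Abstract (reference translation): Modeling hyperspectral imagery (HSI) across different sensors is a fundamental challenge because of variations in wavelength coverage, band sampling, and channel dimensionality. As a result, models trained under a fixed spectral configuration often fail to generalize to other sensors. Existing Vision Transformer (ViT) approaches either rely on implicit spectral modeling with fixed channel assumptions or adopt explicit spatial-spectral attention at prohibitive computational cost, which creates a fundamental trade-off between efficiency and expressiveness. This work introduces Low-rank Efficient Spatial-Spectral ViT (LESSViT), a sensor-flexible architecture for cross-spectral generalization. LESSViT is built on LESS Attention, a structured low-rank factorization that models joint spatial-spectral interactions through separable spatial and spectral components, reducing the complexity of full spatial-spectral attention from $O(N^2 C^2)$ to $O(rNC)$, where $N$ is the number of spatial tokens, $C$ is the number of spectral channels, and $r$ is the rank of the low-rank approximation. Channel-agnostic patch embedding and wavelength-aware positional encoding are further incorporated to support flexible spectral inputs. For efficient and robust pretraining, the work introduces a hyperspectral masked autoencoder (HyperMAE) with decoupled spatial-spectral masking and hierarchical channel sampling. LESSViT is evaluated under a cross-spectral generalization setting that simulates cross-sensor variability. Experiments on the SpectralEarth benchmark show that LESSViT improves robustness under spectral shifts while remaining competitive in-distribution, and that explicit and efficient spatial-spectral modeling is essential for scalable and generalizable hyperspectral representation learning.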
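The abstract does not specify how the wavelength-aware positional encoding is constructed. As one plausible illustration only, the sketch below uses a standard sinusoidal code indexed by physical wavelength in nanometers instead of by channel index. The function name `wavelength_encoding`, the parameters `dim` and `max_period`, and both sensor band lists are hypothetical and are not taken from the paper.

```python
import math


def wavelength_encoding(wavelengths_nm, dim=8, max_period=10000.0):
    """Sinusoidal code keyed on physical wavelength (nm), not channel index.

    Assumption: this mirrors the classic Transformer positional encoding,
    with the wavelength value standing in for the token position.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    codes = []
    for wl in wavelengths_nm:
        vec = []
        for i in range(dim // 2):
            freq = 1.0 / (max_period ** (2 * i / dim))
            vec.append(math.sin(wl * freq))
            vec.append(math.cos(wl * freq))
        codes.append(vec)
    return codes


# Two hypothetical sensors with different band counts and band centers.
sensor_a = [450.0, 550.0, 650.0]
sensor_b = [420.0, 480.0, 550.0, 700.0, 860.0]

codes_a = wavelength_encoding(sensor_a)
codes_b = wavelength_encoding(sensor_b)

# The 550 nm band receives the same code in both sensors,
# even though it sits at a different channel index in each.
print(codes_a[1] == codes_b[2])
```

Keying the code on wavelength means a model can accept any number of bands in any order, which is the kind of flexibility the abstract attributes to the channel-agnostic design.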
