Fugu-MT Paper Translation (Abstract): ConvVitMamba: Efficient Multiscale Convolution, Transformer, and Mamba-Based Sequence modelling for Hyperspectral Image Classification

Paper overview: ConvVitMamba: Efficient Multiscale Convolution, Transformer, and Mamba-Based Sequence modelling for Hyperspectral Image Classification

arxiv url: http://arxiv.org/abs/2604.18856v1
Date: Mon, 20 Apr 2026 21:26:51 GMT
Status: Translation complete
Last updated in system: 2026-04-22 22:41:49.505311
Title: ConvVitMamba: Efficient Multiscale Convolution, Transformer, and Mamba-Based Sequence modelling for Hyperspectral Image Classification
Title (reference translation): ConvVitMamba: Efficient Multiscale Convolution, Transformer, and Mamba-Based Sequence Modeling for Hyperspectral Image Classification
Authors: Mohammed Q. Alkhatib
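Abstract summary: Hyperspectral image (HSI) classification remains challenging due to high spectral dimensionality, redundancy, and limited labeled data. A unified hybrid framework, termed ConvVitMamba, is proposed for efficient HSI classification. The architecture integrates three components: a multiscale convolutional feature extractor that captures local spectral, spatial, and joint patterns; a Vision Transformer-based tokenization and encoding stage that models global contextual relationships; and a lightweight Mamba-inspired gated sequence mixing module for efficient content-aware refinement.
Reference score (independently computed attention level): 2.538209532048867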
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Hyperspectral image (HSI) classification remains challenging due to high spectral dimensionality, redundancy, and limited labeled data. Although convolutional neural networks (CNNs) and Vision Transformers (ViTs) achieve strong performance by exploiting spectral-spatial information and long-range dependencies, they often incur high computational cost and large model size, limiting practical use. To address these limitations, a unified hybrid framework, termed ConvVitMamba, is proposed for efficient HSI classification. The architecture integrates three components: a multiscale convolutional feature extractor to capture local spectral, spatial, and joint patterns; a Vision Transformer-based tokenization and encoding stage to model global contextual relationships; and a lightweight Mamba-inspired gated sequence mixing module for efficient content-aware refinement without quadratic self-attention. Principal Component Analysis (PCA) is used as preprocessing to reduce redundancy and improve efficiency. Experiments on four benchmark datasets, including Houston and three UAV-borne QUH datasets (Pingan, Qingyun, and Tangdaowan), demonstrate that ConvVitMamba consistently outperforms CNN-, Transformer-, and Mamba-based methods while maintaining a favorable balance between accuracy, model size, and inference efficiency. Ablation studies confirm the complementary contributions of all components. The results indicate that the proposed framework provides an effective and efficient solution for HSI classification in diverse scenarios. The source code is publicly available at https://github.com/mqalkhatib/ConvVitMamba
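Abstract (reference translation): Hyperspectral image (HSI) classification remains difficult because of high spectral dimensionality, redundancy, and limited labeled data. Convolutional neural networks (CNNs) and Vision Transformers (ViTs) reach high performance by exploiting spectral-spatial information and long-range dependencies, but they often bring high computational cost and large model size, which restricts practical use. To address these constraints, a unified hybrid framework called ConvVitMamba is proposed for efficient HSI classification. The architecture combines three components: a multiscale convolutional feature extractor that captures local spectral, spatial, and joint patterns; a Vision Transformer-based tokenization and encoding stage that models global contextual relationships; and a lightweight Mamba-inspired gated sequence mixing module that provides efficient content-aware refinement without quadratic self-attention. Principal Component Analysis (PCA) is applied as preprocessing to reduce redundancy and improve efficiency. In experiments on four benchmark datasets, namely Houston and three UAV-borne QUH datasets (Pingan, Qingyun, and Tangdaowan), ConvVitMamba consistently outperforms CNN-, Transformer-, and Mamba-based methods while keeping a good balance among accuracy, model size, and inference efficiency. Ablation studies support the complementary contributions of all components. The results indicate that the proposed framework offers an effective and efficient solution for HSI classification in diverse scenarios. The source code is available at https://github.com/mqalkhatib/ConvVitMamba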
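
Note: The abstract mentions a Mamba-inspired gated sequence mixing module that refines tokens in a content-aware way without quadratic self-attention, but it does not give the formulation. The Python sketch below is only an illustration of one generic gated recurrence of that kind, not the authors' module. The function name `gated_sequence_mix`, the per-channel parameters `gate_weight` and `gate_bias`, and the specific update rule are all assumptions made for this example. Each token is visited once, so the cost grows linearly with sequence length.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_sequence_mix(
    tokens: Sequence[Sequence[float]],
    gate_weight: Sequence[float],
    gate_bias: Sequence[float],
) -> List[List[float]]:
    """Hypothetical gated sequence mixer (illustrative, not the paper's module).

    For every token x_t and channel c the gate depends on the token itself:
        keep    = sigmoid(gate_weight[c] * x_t[c] + gate_bias[c])
        h_t[c]  = keep * h_{t-1}[c] + (1 - keep) * x_t[c]
    The running state h_t is emitted as the refined token at position t.
    """
    channels = len(gate_weight)
    if len(gate_bias) != channels:
        raise ValueError("gate_weight and gate_bias must have the same length")

    state = [0.0] * channels  # h_0: the sequence starts from an empty memory
    outputs: List[List[float]] = []
    for token in tokens:  # single pass: linear in the number of tokens
        if len(token) != channels:
            raise ValueError("every token must have one value per channel")
        new_state = []
        for c in range(channels):
            keep = sigmoid(gate_weight[c] * token[c] + gate_bias[c])
            new_state.append(keep * state[c] + (1.0 - keep) * token[c])
        state = new_state
        outputs.append(list(state))
    return outputs


if __name__ == "__main__":
    # With zero weight and bias the gate is always 0.5 (plain running blend).
    demo = gated_sequence_mix([[1.0], [1.0], [0.0]], [0.0], [0.0])
    print(demo)  # [[0.5], [0.75], [0.375]]
```

Because the gate is computed from the current token, different inputs are retained or overwritten to different degrees, which is the sense in which such a mixer is content-aware. A real model would learn the gate parameters and typically use richer projections than the per-channel scalars assumed here.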

List of related papers