Fugu-MT paper translation (abstract): A Synergistic CNN-Transformer Network with Pooling Attention Fusion for Hyperspectral Image Classification

Paper overview: A Synergistic CNN-Transformer Network with Pooling Attention Fusion for Hyperspectral Image Classification

arxiv url: http://arxiv.org/abs/2604.23622v1
Date: Sun, 26 Apr 2026 09:30:27 GMT
Status: translation complete
Last updated in system: 2026-04-28 17:12:07.466287
Title: A Synergistic CNN-Transformer Network with Pooling Attention Fusion for Hyperspectral Image Classification
Title (reference translation): A Synergistic CNN-Transformer Network Using Pooling Attention Fusion for Hyperspectral Image Classification
Authors: Peng Chen, Wenxuan He, Feng Qian, Guangyao Shi, Jingwen Yan
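Abstract summary: In hyperspectral image (HSI) classification, each pixel is assigned to a specific land-cover category or material. Recent works have applied a multi-scale vision transformer (ViT) to improve spectral feature capture, with promising results. This paper proposes a synergistic CNN-Transformer network with pooling attention fusion for HSI classification.
Reference score (attention score computed independently by this site): 10.687430702802608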
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the hyperspectral image (HSI) classification task, each pixel is categorized into a specific land-cover category or material. Convolutional neural networks (CNNs) and transformers have been widely used to extract local and non-local features in HSI classification. Recent works have utilized a multi-scale vision transformer (ViT) to enhance spectral feature capture and yield promising results. However, most existing methods still face challenges in the effective joint use of spatial-spectral information and in preserving information across layers during the propagation process. To address these issues, we propose a synergistic CNN-Transformer network with pooling attention fusion for HSI classification, which collaboratively utilizes CNNs and ViT to process spatial and spectral features separately. Specifically, we propose a Twin-Branch Feature Extraction (TBFE) module, which employs 3D and 2D convolution in parallel to comprehensively extract spectral and spatial features from HSI. A hybrid pooling attention (HPA) module is designed to aggregate spatial attention. Moreover, a cascade transformer encoder is employed for global spectral feature extraction, and a simple yet efficient cross-layer feature fusion (CFF) module is designed to reduce the loss of crucial information in the previous network layers. Extensive experiments are conducted on several representative datasets to demonstrate the superior performance of our proposed method compared to the state-of-the-art works. Code is available at https://github.com/chenpeng052/SCT-Net.git.
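Abstract (reference translation): In the hyperspectral image (HSI) classification task, each pixel is assigned to a specific land-cover category or material. Convolutional neural networks (CNNs) and transformers are widely used to extract local and non-local features in HSI classification. Recent works have applied a multi-scale vision transformer (ViT) to improve spectral feature capture, with promising results. However, most existing methods still struggle to use spatial-spectral information jointly and to preserve information across layers during propagation. To address these problems, the paper proposes a synergistic CNN-Transformer network with pooling attention fusion for HSI classification, which uses CNNs and ViT cooperatively to process spatial and spectral features separately. Specifically, it proposes a Twin-Branch Feature Extraction (TBFE) module that applies 3D and 2D convolution in parallel to extract spectral and spatial features from HSI comprehensively. A hybrid pooling attention (HPA) module is designed to aggregate spatial attention. In addition, a cascade transformer encoder is used for global spectral feature extraction, and a simple yet efficient cross-layer feature fusion (CFF) module is designed to reduce the loss of important information from earlier network layers. Extensive experiments on several representative datasets are reported to demonstrate the performance of the proposed method against state-of-the-art works. Code is available at https://github.com/chenpeng052/SCT-Net.git.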
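
The abstract states only that the hybrid pooling attention (HPA) module aggregates spatial attention; it does not describe the module's internals. As a rough illustration of the general idea of pooling-based spatial attention, and not the authors' HPA design, the NumPy sketch below pools a feature map over its channel axis with both average and max pooling, fuses the two maps, and reweights every spatial position. The function name `hybrid_pool_spatial_attention`, the fixed 0.5/0.5 fusion, and the input shape are assumptions made for this example; a trained network would normally learn the fusion, for instance with a small convolution.

```python
import numpy as np


def hybrid_pool_spatial_attention(x):
    """Reweight a (C, H, W) feature map with a pooled spatial attention map.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    avg_map = x.mean(axis=0)  # (H, W): average over channels
    max_map = x.max(axis=0)   # (H, W): maximum over channels
    # Assumption: fuse the two pooled maps with a fixed average.
    logits = 0.5 * (avg_map + max_map)
    weights = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, values in (0, 1)
    # Broadcast the (H, W) weights across all C channels.
    return x * weights[None, :, :], weights


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 5, 5))  # 8 channels, 5x5 spatial patch
out, weights = hybrid_pool_spatial_attention(features)
print(out.shape, weights.shape)
```

The output keeps the input shape, so a block like this can sit between feature-extraction stages without changing the tensor layout.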

