Dual Selective Fusion Transformer Network for Hyperspectral Image Classification
- URL: http://arxiv.org/abs/2410.03171v3
- Date: Wed, 26 Feb 2025 07:43:03 GMT
- Title: Dual Selective Fusion Transformer Network for Hyperspectral Image Classification
- Authors: Yichu Xu, Di Wang, Lefei Zhang, Liangpei Zhang
- Abstract summary: Transformers have achieved satisfactory results in the field of hyperspectral image (HSI) classification. However, existing Transformer models face two key challenges when dealing with HSI scenes characterized by diverse land cover types and rich spectral information. We propose a novel Dual Selective Fusion Transformer Network (DSFormer) for HSI classification.
- Score: 34.7051033596479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have achieved satisfactory results in the field of hyperspectral image (HSI) classification. However, existing Transformer models face two key challenges when dealing with HSI scenes characterized by diverse land cover types and rich spectral information: (1) a fixed receptive field overlooks the effective contextual scales required by various HSI objects; (2) invalid self-attention features in context fusion affect model performance. To address these limitations, we propose a novel Dual Selective Fusion Transformer Network (DSFormer) for HSI classification. DSFormer achieves joint spatial and spectral contextual modeling by flexibly selecting and fusing features across different receptive fields, effectively reducing unnecessary information interference by focusing on the most relevant spatial-spectral tokens. Specifically, we design a Kernel Selective Fusion Transformer Block (KSFTB) to learn an optimal receptive field by adaptively fusing spatial and spectral features across different scales, enhancing the model's ability to accurately identify diverse HSI objects. Additionally, we introduce a Token Selective Fusion Transformer Block (TSFTB), which strategically selects and combines essential tokens during the spatial-spectral self-attention fusion process to capture the most crucial contexts. Extensive experiments conducted on four benchmark HSI datasets demonstrate that the proposed DSFormer significantly improves land cover classification accuracy, outperforming existing state-of-the-art methods. Specifically, DSFormer achieves overall accuracies of 96.59%, 97.66%, 95.17%, and 94.59% on the Pavia University, Houston, Indian Pines, and Whu-HongHu datasets, respectively, reflecting improvements of 3.19%, 1.14%, 0.91%, and 2.80% over the previous best model. The code will be available online at https://github.com/YichuXu/DSFormer.
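The authors' implementation lives in the linked repository; as a rough, hypothetical illustration of the kernel-selection idea behind KSFTB (input-dependent weighting of branches with different receptive fields), here is a minimal PyTorch sketch. The class name, branch kernel sizes, and reduction factor are assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class KernelSelectiveFusion(nn.Module):
    """Illustrative sketch: fuse two convolutional branches with different
    receptive fields via learned, input-dependent weights, in the spirit of
    DSFormer's kernel-selective fusion (the real block also models spectral
    context and uses attention)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.branch_small = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch_large = nn.Conv2d(channels, channels, 7, padding=3)
        hidden = max(channels // reduction, 8)
        self.select = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),             # global context
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2 * channels, 1),  # one logit per branch/channel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s, l = self.branch_small(x), self.branch_large(x)
        logits = self.select(s + l)                        # (B, 2C, 1, 1)
        w = logits.view(x.size(0), 2, -1, 1, 1).softmax(dim=1)
        return w[:, 0] * s + w[:, 1] * l                   # weighted fusion

# Toy usage: an 11x11 HSI patch embedded to 64 channels.
feat = torch.randn(2, 64, 11, 11)
print(KernelSelectiveFusion(64)(feat).shape)  # torch.Size([2, 64, 11, 11])
```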
Related papers
- DiffFormer: a Differential Spatial-Spectral Transformer for Hyperspectral Image Classification [3.271106943956333]
Hyperspectral image classification (HSIC) has gained significant attention because of its potential in analyzing high-dimensional data with rich spectral and spatial information.
We propose the Differential Spatial-Spectral Transformer (DiffFormer) to address the inherent challenges of HSIC, such as spectral redundancy and spatial discontinuity.
Experiments on benchmark hyperspectral datasets demonstrate the superiority of DiffFormer in terms of classification accuracy, computational efficiency, and generalizability.
arXiv Detail & Related papers (2024-12-23T07:21:41Z)
- Boosting Hyperspectral Image Classification with Gate-Shift-Fuse Mechanisms in a Novel CNN-Transformer Approach [8.982950112225264]
This paper introduces an HSI classification model that includes two convolutional blocks, a Gate-Shift-Fuse (GSF) block and a transformer block.
The GSF block is designed to strengthen the extraction of local and global spatial-spectral features.
An effective attention mechanism module is also proposed to enhance the extraction of information from HSI cubes.
arXiv Detail & Related papers (2024-06-20T09:05:50Z)
- Superpixel Graph Contrastive Clustering with Semantic-Invariant Augmentations for Hyperspectral Images [64.72242126879503]
Hyperspectral image (HSI) clustering is an important but challenging task.
We first use 3-D and 2-D hybrid convolutional neural networks to extract the high-order spatial and spectral features of HSI.
We then design a superpixel graph contrastive clustering model to learn discriminative superpixel representations.
arXiv Detail & Related papers (2024-03-04T07:40:55Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and insufficient annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Multiview Transformer: Rethinking Spatial Information in Hyperspectral Image Classification [43.17196501332728]
Identifying the land cover category for each pixel in a hyperspectral image relies on spectral and spatial information.
In this article, we observe that scene-specific but non-essential correlations may be recorded in an HSI cuboid.
We propose a multiview transformer for HSI classification, which consists of multiview principal component analysis (MPCA), a spectral encoder-decoder (SED), and a spatial-pooling tokenization transformer (SPTT).
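As a loose illustration of the multiview-PCA idea (splitting the spectral bands into groups and reducing each group separately), here is a hypothetical PyTorch sketch; the band-grouping strategy and component counts are assumptions.

```python
import torch

def multiview_pca(cube, n_views=3, k=8):
    """Sketch: split spectral bands into contiguous groups ('views') and
    reduce each group to k principal components, yielding complementary
    low-dimensional views. MPCA's actual grouping is more involved.
    cube: (H, W, B) -> list of (H, W, k) views."""
    H, W, B = cube.shape
    flat = cube.reshape(-1, B)
    views = []
    for chunk in torch.chunk(flat, n_views, dim=1):
        centered = chunk - chunk.mean(dim=0)
        _, _, V = torch.pca_lowrank(centered, q=k)   # (bands_in_view, k)
        views.append((centered @ V).reshape(H, W, k))
    return views

cube = torch.randn(32, 32, 103)                # toy 103-band cube
print([v.shape for v in multiview_pca(cube)])  # three (32, 32, 8) views
```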
arXiv Detail & Related papers (2023-10-11T04:25:24Z)
- Unsupervised Hyperspectral and Multispectral Images Fusion Based on the Cycle Consistency [21.233354336608205]
We propose an unsupervised HSI and MSI fusion model based on the cycle consistency, called CycFusion.
CycFusion learns the domain transformation between low spatial resolution HSI (LrHSI) and high spatial resolution MSI (HrMSI).
Experiments conducted on several datasets show that our proposed model outperforms all compared unsupervised fusion methods.
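The cycle-consistency idea, generically: mapping one domain to the other and back should reconstruct the input. A minimal sketch with hypothetical stand-in networks (CycFusion's actual generators and spatial resampling are not shown):

```python
import torch
import torch.nn as nn

def cycle_consistency_loss(f_ab, f_ba, a, b):
    """L1 cycle loss: a -> f_ab -> f_ba should reconstruct a,
    and b -> f_ba -> f_ab should reconstruct b."""
    return (torch.abs(f_ba(f_ab(a)) - a).mean()
            + torch.abs(f_ab(f_ba(b)) - b).mean())

# Toy stand-ins: 1x1 convs mapping 103 HSI bands <-> 4 MSI bands
# (resampling between LrHSI and HrMSI resolutions is omitted for brevity).
f_ab = nn.Conv2d(103, 4, kernel_size=1)
f_ba = nn.Conv2d(4, 103, kernel_size=1)
lr_hsi, hr_msi = torch.randn(2, 103, 16, 16), torch.randn(2, 4, 16, 16)
print(cycle_consistency_loss(f_ab, f_ba, lr_hsi, hr_msi))  # scalar tensor
```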
arXiv Detail & Related papers (2023-07-07T06:47:15Z)
- DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification [109.09061514799413]
Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.
We propose a tri-spectral image generation pipeline that transforms HSI into high-quality tri-spectral images.
Our proposed method outperforms state-of-the-art methods for HSI classification.
arXiv Detail & Related papers (2023-04-19T18:32:52Z)
- ViT-Calibrator: Decision Stream Calibration for Vision Transformer [49.60474757318486]
We propose a new paradigm dubbed Decision Stream that boosts the performance of general Vision Transformers.
We shed light on the information propagation mechanism in the learning procedure by exploring the correlation between different tokens and the relevance coefficient of multiple dimensions.
arXiv Detail & Related papers (2023-04-10T02:40:24Z)
- Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising [64.11157141177208]
We propose a spectral enhanced rectangle Transformer to model the spatial and spectral correlation in hyperspectral images.
For the former, we apply rectangle self-attention horizontally and vertically to capture non-local similarity in the spatial domain.
For the latter, we design a spectral enhancement module capable of extracting the global low-rank property of spatial-spectral cubes to suppress noise.
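As a rough sketch of rectangle self-attention (attention restricted to horizontal or vertical strips), assuming single-row and single-column strips and a shared attention module for brevity:

```python
import torch
import torch.nn as nn

def rectangle_attention(x, attn, vertical=False):
    """Self-attention within horizontal (or vertical) strips of a feature map.
    x: (B, C, H, W). Each row (or column) is one token sequence here, i.e. a
    1-pixel-tall 'rectangle'; the real method uses taller windows."""
    if vertical:
        x = x.transpose(2, 3)           # attend along columns instead of rows
    B, C, H, W = x.shape
    seq = x.permute(0, 2, 3, 1).reshape(B * H, W, C)  # one sequence per strip
    out, _ = attn(seq, seq, seq, need_weights=False)
    out = out.reshape(B, H, W, C).permute(0, 3, 1, 2)
    return out.transpose(2, 3) if vertical else out

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
x = torch.randn(2, 32, 8, 8)
y = rectangle_attention(x, attn) + rectangle_attention(x, attn, vertical=True)
print(y.shape)  # torch.Size([2, 32, 8, 8])
```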
arXiv Detail & Related papers (2023-04-03T09:42:13Z)
- SAT: Size-Aware Transformer for 3D Point Cloud Semantic Segmentation [6.308766374923878]
We propose the Size-Aware Transformer (SAT) that can tailor effective receptive fields for objects of different sizes.
Our SAT achieves size-aware learning via two steps: introducing multi-scale features to each attention layer and allowing each point to choose its attentive field adaptively.
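A hypothetical sketch of the adaptive-selection step, where each point softly weights features computed at several scales; the gating design (mean over scales, then a linear layer and softmax) is an assumption:

```python
import torch
import torch.nn as nn

class AdaptiveScaleSelect(nn.Module):
    """Sketch: each point softly selects among features computed at
    several scales, approximating 'choose its attentive field adaptively'."""
    def __init__(self, channels, num_scales):
        super().__init__()
        self.gate = nn.Linear(channels, num_scales)

    def forward(self, feats):
        # feats: list of (B, N, C) per-point features, one tensor per scale
        stacked = torch.stack(feats, dim=2)              # (B, N, S, C)
        w = self.gate(stacked.mean(dim=2)).softmax(-1)   # (B, N, S)
        return (w.unsqueeze(-1) * stacked).sum(dim=2)    # (B, N, C)

sel = AdaptiveScaleSelect(channels=32, num_scales=3)
feats = [torch.randn(2, 1024, 32) for _ in range(3)]
print(sel(feats).shape)  # torch.Size([2, 1024, 32])
```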
arXiv Detail & Related papers (2023-01-17T13:25:11Z)
- ViTs for SITS: Vision Transformers for Satellite Image Time Series [52.012084080257544]
We introduce a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT).
TSViT splits a SITS record into non-overlapping patches in space and time which are tokenized and subsequently processed by a factorized temporo-spatial encoder.
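A minimal sketch of such a factorized temporo-spatial encoder, omitting TSViT's class tokens and position encodings:

```python
import torch
import torch.nn as nn

def factorized_encode(tokens, temporal_enc, spatial_enc):
    """tokens: (B, T, N, C) embeddings for T acquisition dates and N spatial
    patches. Attend over time per patch location, then over space per date."""
    B, T, N, C = tokens.shape
    t = tokens.permute(0, 2, 1, 3).reshape(B * N, T, C)
    t = temporal_enc(t)                                   # temporal attention
    s = t.reshape(B, N, T, C).permute(0, 2, 1, 3).reshape(B * T, N, C)
    s = spatial_enc(s)                                    # spatial attention
    return s.reshape(B, T, N, C)

def enc(d):
    layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=1)

x = torch.randn(2, 6, 16, 32)  # 2 samples, 6 dates, 16 patches, 32-dim tokens
print(factorized_encode(x, enc(32), enc(32)).shape)  # torch.Size([2, 6, 16, 32])
```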
arXiv Detail & Related papers (2023-01-12T11:33:07Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- Dynamic Focus-aware Positional Queries for Semantic Segmentation [94.6834904076914]
We propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries.
Our framework achieves SOTA performance and outperforms Mask2former by clear margins of 1.1%, 1.9%, and 1.1% single-scale mIoU with ResNet-50, Swin-T, and Swin-B backbones.
arXiv Detail & Related papers (2022-04-04T05:16:41Z)
- Multimodal Fusion Transformer for Remote Sensing Image Classification [35.57881383390397]
Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance compared to convolutional neural networks (CNNs).
To achieve satisfactory performance close to that of CNNs, transformers need fewer parameters.
We introduce a new multimodal fusion transformer (MFT) network which comprises a multihead cross patch attention (mCrossPA) for HSI land-cover classification.
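Generic cross-attention between HSI tokens and a second modality conveys the flavor of cross patch attention; mCrossPA's actual design differs, and all shapes below are illustrative:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 65 HSI tokens (CLS + 8x8 patches) query 16 tokens
# from a complementary modality (e.g. LiDAR).
cross_attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
hsi_tokens = torch.randn(2, 65, 64)
aux_tokens = torch.randn(2, 16, 64)
fused, _ = cross_attn(hsi_tokens, aux_tokens, aux_tokens)  # query, key, value
print(fused.shape)  # torch.Size([2, 65, 64])
```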
arXiv Detail & Related papers (2022-03-31T11:18:41Z)
- Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction [138.04956118993934]
We propose a novel Transformer-based method, the coarse-to-fine sparse Transformer (CST), which embeds HSI sparsity into deep learning for HSI reconstruction.
In particular, CST uses our proposed spectra-aware screening mechanism (SASM) for coarse patch selection; the selected patches are then fed into our customized spectra-aggregation hashing multi-head self-attention (SAH-MSA) for fine pixel clustering and self-similarity capturing.
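A hypothetical sketch of coarse screening, using feature norm as a stand-in score (SASM's real scores are spectra-aware and learned):

```python
import torch

def screen_patches(patches, keep_ratio=0.5):
    """Rank patch embeddings by a proxy score and keep the top fraction
    for the expensive self-attention stage.
    patches: (B, N, C) -> (selected patches (B, K, C), their indices)."""
    scores = patches.norm(dim=-1)                       # (B, N)
    k = max(1, int(patches.size(1) * keep_ratio))
    idx = scores.topk(k, dim=1).indices                 # (B, K)
    gathered = torch.gather(
        patches, 1, idx.unsqueeze(-1).expand(-1, -1, patches.size(-1)))
    return gathered, idx

patches = torch.randn(2, 64, 32)
kept, idx = screen_patches(patches)
print(kept.shape, idx.shape)  # torch.Size([2, 32, 32]) torch.Size([2, 32])
```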
arXiv Detail & Related papers (2022-03-09T16:17:47Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring the intrinsic inductive bias (IB) of convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain state-of-the-art classification performance, i.e., 88.5% Top-1 accuracy on the ImageNet validation set and the best 91.2% Top-1 accuracy on the ImageNet Real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- Grafting Transformer on Automatically Designed Convolutional Neural Network for Hyperspectral Image Classification [7.606096775949237]
Hyperspectral image (HSI) classification has been a hot topic for decades.
Deep learning based HSI classification methods have achieved promising performance.
Several neural architecture search (NAS) algorithms have been proposed for HSI classification.
arXiv Detail & Related papers (2021-10-21T11:51:51Z)
- Feature Fusion Vision Transformer for Fine-Grained Visual Categorization [22.91753200323264]
We propose a novel pure transformer-based framework, the Feature Fusion Vision Transformer (FFVT).
We aggregate the important tokens from each transformer layer to compensate for local, low-level, and middle-level information.
We design a novel token selection module called mutual attention weight selection (MAWS) to guide the network effectively and efficiently towards selecting discriminative tokens.
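A sketch of MAWS-style mutual scoring, combining the CLS-to-token and token-to-CLS attention weights; details such as head averaging and normalization are assumptions here:

```python
import torch

def mutual_attention_select(attn, k):
    """attn: (B, heads, L, L) softmaxed attention, token 0 = CLS.
    Score each non-CLS token by 'CLS attends to it' times 'it attends
    to CLS', then keep the top-k. Returns indices of shape (B, k)."""
    a = attn.mean(dim=1)                   # average heads -> (B, L, L)
    cls_to_tok = a[:, 0, 1:]               # how much CLS looks at each token
    tok_to_cls = a[:, 1:, 0]               # how much each token looks at CLS
    tok_to_cls = tok_to_cls / tok_to_cls.sum(dim=-1, keepdim=True)
    score = cls_to_tok * tok_to_cls        # mutual attention weight
    return score.topk(k, dim=-1).indices + 1   # +1: skip the CLS position

attn = torch.rand(2, 4, 17, 17).softmax(dim=-1)
print(mutual_attention_select(attn, k=5).shape)  # torch.Size([2, 5])
```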
arXiv Detail & Related papers (2021-07-06T01:48:43Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
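A minimal sketch of one-step adversarial augmentation in feature space (an FGSM-style step; the paper's full method also includes a normalization component not shown here):

```python
import torch
import torch.nn.functional as F

def adversarial_feature_aug(features, head, labels, eps=0.05):
    """Nudge intermediate embeddings along the loss-increasing gradient
    direction, then train on clean + perturbed copies (simplified)."""
    feats = features.detach().requires_grad_(True)
    loss = F.cross_entropy(head(feats), labels)
    grad, = torch.autograd.grad(loss, feats)
    adv = (feats + eps * grad.sign()).detach()   # FGSM-style feature step
    return (F.cross_entropy(head(features), labels)
            + F.cross_entropy(head(adv), labels))

head = torch.nn.Linear(64, 10)
feats, labels = torch.randn(8, 64), torch.randint(0, 10, (8,))
print(adversarial_feature_aug(feats, head, labels))  # scalar loss
```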
arXiv Detail & Related papers (2021-03-22T20:36:34Z)