Deep Hyperspectral Unmixing using Transformer Network
- URL: http://arxiv.org/abs/2203.17076v1
- Date: Thu, 31 Mar 2022 14:47:36 GMT
- Title: Deep Hyperspectral Unmixing using Transformer Network
- Authors: Preetam Ghosh, Swalpa Kumar Roy, Bikram Koirala, Behnood Rasti, and
Paul Scheunders
- Abstract summary: We propose a novel deep unmixing model with transformers.
The proposed model is a combination of a convolutional autoencoder and a transformer.
The data are reconstructed using a convolutional decoder.
- Score: 7.3050653207383025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Currently, this paper is under review at IEEE. Transformers have intrigued
the vision research community with their state-of-the-art performance in
natural language processing. With their superior performance, transformers have
found their way into the field of hyperspectral image classification and achieved
promising results. In this article, we harness the power of transformers to
conquer the task of hyperspectral unmixing and propose a novel deep unmixing
model with transformers. We aim to utilize the ability of transformers to
better capture the global feature dependencies in order to enhance the quality
of the endmember spectra and the abundance maps. The proposed model is a
combination of a convolutional autoencoder and a transformer. The hyperspectral
data are encoded by the convolutional encoder. The transformer captures
long-range dependencies between the representations derived from the encoder.
The data are reconstructed using a convolutional decoder. We applied the
proposed unmixing model to three widely used unmixing datasets, i.e., Samson,
Apex, and Washington DC Mall, and compared it with the state of the art in terms
of root mean squared error (RMSE) and spectral angle distance (SAD). The source code for the
proposed model will be made publicly available at
\url{https://github.com/preetam22n/DeepTrans-HSU}.
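The abstract describes the architecture only at a high level: a convolutional encoder, a transformer over the encoded representations, and a convolutional decoder. The PyTorch sketch below is a minimal reading of that description, not the authors' implementation (which is at the repository linked above); the layer widths, the softmax abundance head, and the 1x1 convolutional decoder are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TransformerUnmixer(nn.Module):
    # Hypothetical encoder-transformer-decoder unmixing model;
    # layer sizes and the softmax abundance head are assumptions.
    def __init__(self, num_bands=156, num_endmembers=3, dim=64, heads=4, depth=2):
        super().__init__()
        # Convolutional encoder: maps each pixel neighborhood to a
        # low-dimensional feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(num_bands, 128, kernel_size=3, padding=1),
            nn.LeakyReLU(),
            nn.Conv2d(128, dim, kernel_size=3, padding=1),
        )
        # Transformer: models long-range dependencies between the
        # spatial tokens produced by the encoder.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        # Abundance head: softmax enforces the usual non-negativity
        # and sum-to-one constraints on the per-pixel fractions.
        self.abundance = nn.Sequential(
            nn.Conv2d(dim, num_endmembers, kernel_size=1),
            nn.Softmax(dim=1),
        )
        # Convolutional decoder: mixes the abundances back into
        # spectra; its 1x1 weights act as the endmember signatures.
        self.decoder = nn.Conv2d(num_endmembers, num_bands, kernel_size=1, bias=False)

    def forward(self, x):                           # x: (B, bands, H, W)
        feats = self.encoder(x)                     # (B, dim, H, W)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)   # (B, H*W, dim)
        tokens = self.transformer(tokens)
        feats = tokens.transpose(1, 2).reshape(b, c, h, w)
        abund = self.abundance(feats)               # abundance maps
        return self.decoder(abund), abund           # reconstruction, abundances

# Example: one 64x64 patch of a 156-band image (Samson has 156 bands).
recon, abund = TransformerUnmixer()(torch.rand(1, 156, 64, 64))
```

Reading the endmember spectra out of the final decoder weights, and the bottleneck activations out as abundance maps, is the usual convention in autoencoder-based unmixing; the sketch follows that convention.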
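The two reported metrics are standard in the unmixing literature: RMSE is typically computed on the abundance maps and SAD on the recovered endmember spectra. A small NumPy sketch of both, assuming the estimated endmembers have already been matched to their ground-truth counterparts:

```python
import numpy as np

def rmse(abund_true, abund_est):
    # Root mean squared error between true and estimated
    # abundance maps of identical shape.
    return np.sqrt(np.mean((abund_true - abund_est) ** 2))

def sad(endm_true, endm_est):
    # Mean spectral angle distance (in radians) between matched
    # endmember spectra, shaped (num_endmembers, num_bands).
    dots = np.sum(endm_true * endm_est, axis=1)
    norms = np.linalg.norm(endm_true, axis=1) * np.linalg.norm(endm_est, axis=1)
    cosines = np.clip(dots / norms, -1.0, 1.0)  # guard against rounding
    return np.mean(np.arccos(cosines))
```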
Related papers
- Dynamic Grained Encoder for Vision Transformers [150.02797954201424]
This paper introduces sparse queries for vision transformers to exploit the intrinsic spatial redundancy of natural images.
We propose a Dynamic Grained Encoder for vision transformers, which can adaptively assign a suitable number of queries to each spatial region.
Our encoder allows state-of-the-art vision transformers to reduce computational complexity by 40%-60% while maintaining comparable performance on image classification.
arXiv Detail & Related papers (2023-01-10T07:55:29Z)
- Cats: Complementary CNN and Transformer Encoders for Segmentation [13.288195115791758]
We propose a model with double encoders for 3D biomedical image segmentation.
We fuse the information from the convolutional encoder and the transformer, and pass it to the decoder to obtain the results.
Compared to the state-of-the-art models with and without transformers on each task, our proposed method obtains higher Dice scores across the board.
arXiv Detail & Related papers (2022-08-24T14:25:11Z)
- Deep Laparoscopic Stereo Matching with Transformers [46.18206008056612]
The self-attention mechanism, successfully employed in the transformer structure, has shown promise in many computer vision tasks.
We propose a new hybrid deep stereo matching framework (HybridStereoNet) that combines the best of the CNN and the transformer in a unified design.
arXiv Detail & Related papers (2022-07-25T12:54:32Z)
- HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation [3.478921293603811]
HiFormer is a novel method that efficiently bridges a CNN and a transformer for medical image segmentation.
To secure a fine fusion of global and local features, we propose a Double-Level Fusion (DLF) module in the skip connection of the encoder-decoder structure.
arXiv Detail & Related papers (2022-07-18T11:30:06Z)
- Multimodal Fusion Transformer for Remote Sensing Image Classification [35.57881383390397]
Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance when compared to convolutional neural networks (CNNs).
To achieve satisfactory performance close to that of CNNs, transformers need fewer parameters.
We introduce a new multimodal fusion transformer (MFT) network which comprises a multihead cross patch attention (mCrossPA) for HSI land-cover classification.
arXiv Detail & Related papers (2022-03-31T11:18:41Z)
- SepTr: Separable Transformer for Audio Spectrogram Processing [74.41172054754928]
We propose a new vision transformer architecture called Separable Transformer (SepTr).
SepTr employs two transformer blocks in a sequential manner, the first attending to tokens within the same frequency bin and the second attending to tokens within the same time interval (an illustrative sketch of this axis-wise attention appears after this list).
We conduct experiments on three benchmark data sets, showing that our architecture outperforms conventional vision transformers and other state-of-the-art methods.
arXiv Detail & Related papers (2022-03-17T19:48:43Z)
- nnFormer: Interleaved Transformer for Volumetric Segmentation [50.10441845967601]
We introduce nnFormer, a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution.
nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets, Synapse and ACDC.
arXiv Detail & Related papers (2021-09-07T17:08:24Z)
- Vision Transformer with Progressive Sampling [73.60630716500154]
We propose an iterative and progressive sampling strategy to locate discriminative regions.
When trained from scratch on ImageNet, PS-ViT achieves 3.8% higher top-1 accuracy than the vanilla ViT.
arXiv Detail & Related papers (2021-08-03T18:04:31Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
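Of the related papers above, SepTr's attention factorization is described concretely enough to sketch: attend along one spectrogram axis at a time instead of over all tokens jointly. The block below is an illustrative PyTorch rendering of that idea, not the published SepTr code; the class name, dimensions, and block count are assumptions.

```python
import torch.nn as nn

class SeparableAttention(nn.Module):
    # Axis-wise attention in the spirit of SepTr: the first block
    # attends across time within each frequency bin, the second
    # across frequency within each time interval.
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.within_freq = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)
        self.within_time = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)

    def forward(self, x):  # x: (batch, freq, time, dim)
        b, f, t, d = x.shape
        # Tokens sharing a frequency bin form one sequence over time.
        x = self.within_freq(x.reshape(b * f, t, d)).reshape(b, f, t, d)
        # Tokens sharing a time interval form one sequence over frequency.
        x = x.permute(0, 2, 1, 3).reshape(b * t, f, d)
        x = self.within_time(x).reshape(b, t, f, d).permute(0, 2, 1, 3)
        return x
```

Factorizing attention this way replaces one quadratic pass over all freq x time tokens with shorter per-axis sequences, which is the efficiency argument such separable designs typically make.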
This list is automatically generated from the titles and abstracts of the papers on this site.