Multiscale Convolutional Transformer with Center Mask Pretraining for
Hyperspectral Image Classificationtion
- URL: http://arxiv.org/abs/2203.04771v1
- Date: Wed, 9 Mar 2022 14:42:26 GMT
- Title: Multiscale Convolutional Transformer with Center Mask Pretraining for
Hyperspectral Image Classificationtion
- Authors: Yifan Wang, Sen Jia, Zhongfan Zhang
- Abstract summary: We propose a noval multi-scale convolutional embedding module for hyperspectral images (HSI) to realize effective extraction of spatial-spectral information.
Similar to Mask autoencoder, but our pre-training method only masks the corresponding token of the central pixel in the encoder, and inputs the remaining token into the decoder to reconstruct the spectral information of the central pixel.
- Score: 14.33259265286265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperspectral images (HSI) not only have a broad macroscopic field of view
but also contain rich spectral information, and the types of surface objects
can be identified through spectral information, which is one of the main
applications in hyperspectral image related research.In recent years, more and
more deep learning methods have been proposed, among which convolutional neural
networks (CNN) are the most influential. However, CNN-based methods are
difficult to capture long-range dependencies, and also require a large amount
of labeled data for model training.Besides, most of the self-supervised
training methods in the field of HSI classification are based on the
reconstruction of input samples, and it is difficult to achieve effective use
of unlabeled samples. To address the shortcomings of CNN networks, we propose a
noval multi-scale convolutional embedding module for HSI to realize effective
extraction of spatial-spectral information, which can be better combined with
Transformer network.In order to make more efficient use of unlabeled data, we
propose a new self-supervised pretask. Similar to Mask autoencoder, but our
pre-training method only masks the corresponding token of the central pixel in
the encoder, and inputs the remaining token into the decoder to reconstruct the
spectral information of the central pixel.Such a pretask can better model the
relationship between the central feature and the domain feature, and obtain
more stable training results.
Related papers
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z) - FocDepthFormer: Transformer with latent LSTM for Depth Estimation from Focal Stack [11.433602615992516]
We present a novel Transformer-based network, FocDepthFormer, which integrates a Transformer with an LSTM module and a CNN decoder.
By incorporating the LSTM, FocDepthFormer can be pre-trained on large-scale monocular RGB depth estimation datasets.
Our model outperforms state-of-the-art approaches across multiple evaluation metrics.
arXiv Detail & Related papers (2023-10-17T11:53:32Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement
Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - Multi-spectral Class Center Network for Face Manipulation Detection and Localization [52.569170436393165]
We propose a novel Multi-Spectral Class Center Network (MSCCNet) for face manipulation detection and localization.
Based on the features of different frequency bands, the MSCC module collects multi-spectral class centers and computes pixel-to-class relations.
Applying multi-spectral class-level representations suppresses the semantic information of the visual concepts which is insensitive to manipulated regions of forgery images.
arXiv Detail & Related papers (2023-05-18T08:09:20Z) - Visual Recognition with Deep Nearest Centroids [57.35144702563746]
We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition.
Compared with parametric counterparts, DNC performs better on image classification (CIFAR-10, ImageNet) and greatly boots pixel recognition (ADE20K, Cityscapes)
arXiv Detail & Related papers (2022-09-15T15:47:31Z) - Agricultural Plantation Classification using Transfer Learning Approach
based on CNN [0.0]
The efficiency of hyper-spectral image recognition has increased significantly with deep learning.
CNN and Multi-Layer Perceptron(MLP) has demonstrated to be an excellent process of classifying images.
We propose using the method of transfer learning to decrease the training time and reduce the dependence on large labeled data-set.
arXiv Detail & Related papers (2022-06-19T14:43:31Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image
Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - Learning A 3D-CNN and Transformer Prior for Hyperspectral Image
Super-Resolution [80.93870349019332]
We propose a novel HSISR method that uses Transformer instead of CNN to learn the prior of HSIs.
Specifically, we first use the gradient algorithm to solve the HSISR model, and then use an unfolding network to simulate the iterative solution processes.
arXiv Detail & Related papers (2021-11-27T15:38:57Z) - New SAR target recognition based on YOLO and very deep multi-canonical
correlation analysis [0.1503974529275767]
This paper proposes a robust feature extraction method for SAR image target classification by adaptively fusing effective features from different CNN layers.
Experiments on the MSTAR dataset demonstrate that the proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2021-10-28T18:10:26Z) - Compressive spectral image classification using 3D coded convolutional
neural network [12.67293744927537]
This paper develops a novel deep learning HIC approach based on measurements of coded-aperture snapshot spectral imagers (CASSI)
A new kind of deep learning strategy, namely 3D coded convolutional neural network (3D-CCNN), is proposed to efficiently solve for the classification problem.
The accuracy of classification is effectively improved by exploiting the synergy between the deep learning network and coded apertures.
arXiv Detail & Related papers (2020-09-23T15:05:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.