Multi-Scale U-Shape MLP for Hyperspectral Image Classification
- URL: http://arxiv.org/abs/2307.10186v1
- Date: Wed, 5 Jul 2023 08:52:27 GMT
- Title: Multi-Scale U-Shape MLP for Hyperspectral Image Classification
- Authors: Moule Lin, Weipeng Jing, Donglin Di, Guangsheng Chen, Houbing Song
- Abstract summary: Two challenges in identifying pixels of a hyperspectral image are representing the correlated local and global information, and handling the abundant parameters of the model.
We propose a Multi-Scale U-shape Multi-Layer Perceptron (MUMLP) model consisting of the designed MSC (Multi-Scale Channel) block and the U-shape Multi-Layer Perceptron (UMLP) structure.
Our model outperforms state-of-the-art methods across the board on three widely adopted public datasets.
- Score: 13.85573689689951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperspectral images have significant applications in various domains, since they capture rich semantic and spatial information across spectral bands, together with the spatial variability of spectral signatures. Two critical challenges in identifying pixels of a hyperspectral image are representing the correlated local and global information, and handling the abundant parameters of the model. To tackle these challenges, we propose the Multi-Scale U-shape Multi-Layer Perceptron (MUMLP), a model consisting of the designed MSC (Multi-Scale Channel) block and the UMLP (U-shape Multi-Layer Perceptron) structure. MSC transforms the channel dimension and mixes spectral-band features to embed the deep-level representation adequately. UMLP adopts an encoder-decoder structure built from multi-layer perceptron layers, which is capable of compressing the large-scale parameters. Extensive experiments demonstrate that our model outperforms state-of-the-art methods across the board on three widely adopted public datasets, namely Pavia University, Houston 2013 and Houston 2018.
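The abstract's two components lend themselves to a short sketch. The following minimal PyTorch code illustrates one plausible reading of the design: an MSC block that remaps the spectral channel dimension and mixes band features at several scales, feeding a U-shape (encoder-decoder) MLP whose bottleneck compresses the feature width. All kernel sizes, layer widths, and the pointwise mixing convolution are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class MSCBlock(nn.Module):
    """Multi-Scale Channel block (sketch): transforms the channel
    dimension and mixes spectral-band features at several scales."""
    def __init__(self, in_bands: int, embed_dim: int, scales=(1, 3, 5)):
        super().__init__()
        # One conv branch per scale; the kernel sizes are assumptions.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_bands, embed_dim, kernel_size=k, padding=k // 2)
            for k in scales
        ])
        # Pointwise conv fuses the concatenated multi-scale features.
        self.mix = nn.Conv2d(embed_dim * len(scales), embed_dim, kernel_size=1)

    def forward(self, x):                         # x: (B, bands, H, W)
        feats = [branch(x) for branch in self.branches]
        return self.mix(torch.cat(feats, dim=1))  # (B, embed_dim, H, W)

class UMLP(nn.Module):
    """U-shape MLP (sketch): an encoder-decoder of per-pixel MLP layers
    whose narrow bottleneck compresses the feature/parameter width."""
    def __init__(self, embed_dim: int, num_classes: int, bottleneck: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(embed_dim, embed_dim // 2), nn.GELU(),
            nn.Linear(embed_dim // 2, bottleneck), nn.GELU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, embed_dim // 2), nn.GELU(),
            nn.Linear(embed_dim // 2, embed_dim), nn.GELU(),
        )
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                         # x: (B, embed_dim, H, W)
        x = x.permute(0, 2, 3, 1)                 # MLPs act on the channel axis
        x = self.decoder(self.encoder(x))
        return self.head(x).permute(0, 3, 1, 2)   # (B, num_classes, H, W)

# Usage: per-pixel logits for a 103-band Pavia University patch (9 classes).
model = nn.Sequential(MSCBlock(103, 64), UMLP(64, num_classes=9))
logits = model(torch.randn(2, 103, 15, 15))       # -> (2, 9, 15, 15)
```

Squeezing the channel width through a narrow bottleneck is what lets an encoder-decoder MLP keep the parameter count low relative to a plain wide MLP, which matches the abstract's claim about compressing large-scale parameters.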
Related papers
- HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model [88.13261547704444]
HyperSIGMA is a vision transformer-based foundation model for HSI interpretation.
It integrates spatial and spectral features using a specially designed spectral enhancement module.
It shows significant advantages in scalability, robustness, cross-modal transferring capability, and real-world applicability.
arXiv Detail & Related papers (2024-06-17T13:22:58Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious stacks of encoder-decoder streams and stages to gradually complete global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- MultiScale Spectral-Spatial Convolutional Transformer for Hyperspectral Image Classification [9.051982753583232]
The Transformer has become an alternative to CNN architectures for hyperspectral image classification.
We propose a multiscale spectral-spatial convolutional Transformer (MultiscaleFormer) for hyperspectral image classification.
arXiv Detail & Related papers (2023-10-28T00:41:35Z)
- Multiview Transformer: Rethinking Spatial Information in Hyperspectral Image Classification [43.17196501332728]
Identifying the land cover category for each pixel in a hyperspectral image relies on spectral and spatial information.
In this article, we show that scene-specific but non-essential correlations may be recorded in an HSI cuboid.
We propose a multiview transformer for HSI classification, which consists of multiview principal component analysis (MPCA), a spectral encoder-decoder (SED), and a spatial-pooling tokenization transformer (SPTT).
arXiv Detail & Related papers (2023-10-11T04:25:24Z)
- Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a multi-spectral image stitching method based on spatial graph reasoning.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
- AerialFormer: Multi-resolution Transformer for Aerial Image Segmentation [7.415370401064414]
We propose AerialFormer, which unifies Transformers at the contracting path with lightweight Multi-Dilated Convolutional Neural Networks (MD-CNNs) at the expanding path.
Our AerialFormer is designed as a hierarchical structure, in which the Transformer outputs multi-scale features and the MD-CNN decoder aggregates information across those scales.
We have benchmarked AerialFormer on three common datasets including iSAID, LoveDA, and Potsdam.
arXiv Detail & Related papers (2023-06-12T03:28:18Z)
- Multi-spectral Class Center Network for Face Manipulation Detection and Localization [52.569170436393165]
We propose a novel Multi-Spectral Class Center Network (MSCCNet) for face manipulation detection and localization.
Based on the features of different frequency bands, the MSCC module collects multi-spectral class centers and computes pixel-to-class relations.
Applying multi-spectral class-level representations suppresses the semantic information of visual concepts that is insensitive to the manipulated regions of forged images.
arXiv Detail & Related papers (2023-05-18T08:09:20Z)
- Multi-level Second-order Few-shot Learning [111.0648869396828]
We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
arXiv Detail & Related papers (2022-01-15T19:49:00Z)
- Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers [124.01928050651466]
We propose a new type of polyp segmentation method, named Polyp-PVT.
The proposed model effectively suppresses noise in the features and significantly improves their expressive capabilities.
arXiv Detail & Related papers (2021-08-16T07:09:06Z)
- Learning deep multiresolution representations for pansharpening [4.469255274378329]
This paper proposes a pyramid-based deep fusion framework that preserves spectral and spatial characteristics at different scales.
Experiments suggest that the proposed architecture outperforms state-of-the-art pansharpening models.
arXiv Detail & Related papers (2021-02-16T19:41:57Z)