Locality-Aware Hyperspectral Classification
- URL: http://arxiv.org/abs/2309.01561v1
- Date: Mon, 4 Sep 2023 12:29:32 GMT
- Title: Locality-Aware Hyperspectral Classification
- Authors: Fangqin Zhou, Mert Kilickaya, Joaquin Vanschoren
- Abstract summary: We introduce the Hyperspectral Locality-aware Image TransformEr (HyLITE), a vision transformer that models both local and spectral information.
Our proposed approach outperforms competing baselines by a significant margin, achieving up to 10% gains in accuracy.
- Score: 8.737375836744933
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hyperspectral image classification is gaining popularity for high-precision
vision tasks in remote sensing, thanks to the ability of hyperspectral images to
capture visual information across a wide continuum of spectra. Researchers have been
working on automating Hyperspectral image classification, with recent efforts
leveraging Vision Transformers. However, most research models only spectral
information and neglects locality (i.e., neighboring pixels), which may not be
sufficiently discriminative, resulting in performance
limitations. To address this, we present three contributions: i) We introduce
the Hyperspectral Locality-aware Image TransformEr (HyLITE), a vision
transformer that models both local and spectral information, ii) A novel
regularization function that promotes the integration of local-to-global
information, and iii) Our proposed approach outperforms competing baselines by
a significant margin, achieving up to 10% gains in accuracy. The trained models
and the code are available at HyLITE.
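For intuition, here is a minimal sketch of the pattern the abstract describes: spectral-band tokens from a small spatial neighborhood pass through a transformer, and an auxiliary regularizer ties the local tokens to the global class token. The token layout, dimensions, and the cosine-based regularizer are assumptions for illustration, not HyLITE's released design.

```python
# Hedged sketch of a locality-aware spectral transformer in the spirit of
# HyLITE; the token layout and the regularizer below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalitySpectralTransformer(nn.Module):
    def __init__(self, bands=200, patch=3, dim=64, depth=4, heads=4, classes=16):
        super().__init__()
        # Each spectral band of a patch x patch neighborhood becomes one token,
        # so the sequence mixes local-spatial and spectral information.
        self.embed = nn.Linear(patch * patch, dim)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, bands + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, classes)

    def forward(self, x):                    # x: (B, bands, patch, patch)
        tokens = self.embed(x.flatten(2))    # (B, bands, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        z = self.encoder(torch.cat([cls, tokens], 1) + self.pos)
        return self.head(z[:, 0]), z

def local_to_global_reg(z):
    # Assumed regularizer: encourage band tokens to agree with the global
    # CLS token, promoting local-to-global information flow.
    g, locals_ = z[:, :1], z[:, 1:]
    return (1 - F.cosine_similarity(locals_, g, dim=-1)).mean()
```

A training objective would then combine cross-entropy on the logits with a weighted copy of the regularizer, e.g. `loss = ce + lam * local_to_global_reg(z)`.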
Related papers
- Vision Eagle Attention: A New Lens for Advancing Image Classification [0.8158530638728501]
I introduce Vision Eagle Attention, a novel attention mechanism that enhances visual feature extraction using convolutional spatial attention.
The model applies convolution to capture local spatial features and generates an attention map that selectively emphasizes the most informative regions of the image.
I have integrated Vision Eagle Attention into a lightweight ResNet-18 architecture, demonstrating that this combination results in an efficient and powerful model.
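As a rough illustration of the mechanism described above, the sketch below applies a small convolutional block that produces a single-channel attention map and re-weights the feature map with it; the kernel sizes, reduction ratio, and insertion point in ResNet-18 are assumptions, not the paper's exact configuration.

```python
# Minimal convolutional spatial-attention sketch: a conv block squeezes the
# features to a 1-channel map that re-weights the most informative regions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels // 4, 3, padding=1),  # local context
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, 3, padding=1),         # 1-channel map
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.attn(x)   # emphasize informative regions

backbone = resnet18(num_classes=10)
# Assumed integration point: re-weight features after the second stage.
backbone.layer2 = nn.Sequential(backbone.layer2, SpatialAttention(128))
```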
arXiv Detail & Related papers (2024-11-15T20:21:59Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
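A hedged sketch of the two ingredients named above, point-voxel fusion via attention and a multi-pooling readout; the feature dimensions and the residual fusion rule are illustrative assumptions rather than PVAFN's actual design.

```python
# Points query the voxel representation via cross-attention, then a
# multi-pooling readout combines max- and mean-pooled summaries.
import torch
import torch.nn as nn

class PointVoxelAttentionFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, point_feats, voxel_feats):
        # Keep a residual path so point features are enriched, not replaced.
        fused, _ = self.cross(point_feats, voxel_feats, voxel_feats)
        return point_feats + fused

def multi_pool(feats):
    # Multi-pooling readout: concatenate max- and mean-pooled features.
    return torch.cat([feats.max(dim=1).values, feats.mean(dim=1)], dim=-1)

points = torch.randn(2, 1024, 128)   # (batch, points, dim)
voxels = torch.randn(2, 400, 128)    # (batch, voxels, dim)
out = multi_pool(PointVoxelAttentionFusion()(points, voxels))  # (2, 256)
```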
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection [58.228940066769596]
We introduce a Dual-Image Enhanced CLIP approach, leveraging a joint vision-language scoring system.
Our methods process pairs of images, utilizing each as a visual reference for the other, thereby enriching the inference process with visual context.
Our approach significantly exploits the potential of vision-language joint anomaly detection and demonstrates comparable performance with current SOTA methods across various datasets.
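The scoring idea can be sketched with precomputed CLIP-style embeddings: each image in a pair is scored both against text prompts and against its partner as a visual reference. The score definitions and the fusion weight alpha are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative joint scoring over precomputed, L2-normalized embeddings.
import torch
import torch.nn.functional as F

def dual_image_score(img_a, img_b, txt_normal, txt_anomalous, alpha=0.5):
    # Language score: relative similarity to "anomalous" vs "normal" prompts.
    lang = (img_a @ txt_anomalous) - (img_a @ txt_normal)
    # Visual-reference score: distance from the paired reference image.
    visual = 1 - F.cosine_similarity(img_a, img_b, dim=0)
    return alpha * lang + (1 - alpha) * visual

d = 512
e = lambda: F.normalize(torch.randn(d), dim=0)   # stand-in embeddings
score = dual_image_score(e(), e(), e(), e())
```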
arXiv Detail & Related papers (2024-05-08T03:13:20Z) - Transformers Fusion across Disjoint Samples for Hyperspectral Image Classification [2.1223532600703385]
3D Swin Transformer (3D-ST) excels in capturing intricate spatial relationships within images.
SST specializes in modeling long-range dependencies through self-attention mechanisms.
This paper introduces an attentional fusion of these two transformers to significantly enhance the classification performance of Hyperspectral Images (HSIs).
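One common way to realize such an attentional fusion is cross-attention from one branch's tokens to the other's, followed by a learned gate; the sketch below stubs out both branch encoders, and the gating rule is an assumption rather than the paper's design.

```python
# Spatial tokens attend to spectral tokens; a sigmoid gate blends the
# attended features back into the spatial stream.
import torch
import torch.nn as nn

class AttentionalFusion(nn.Module):
    def __init__(self, dim=96, heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, spatial_tokens, spectral_tokens):
        attended, _ = self.cross(spatial_tokens, spectral_tokens, spectral_tokens)
        g = torch.sigmoid(self.gate(torch.cat([spatial_tokens, attended], -1)))
        return g * spatial_tokens + (1 - g) * attended

# Stand-ins for the two branch outputs (e.g., 3D-ST and SST features).
fused = AttentionalFusion()(torch.randn(2, 49, 96), torch.randn(2, 30, 96))
```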
arXiv Detail & Related papers (2024-05-02T08:49:01Z) - 3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification [12.729885732069926]
Vision Transformers (ViTs) have shown promising classification performance over Convolutional Neural Networks (CNNs).
ViTs excel at modeling sequential data, but they cannot extract spectral-spatial information the way CNNs do.
We propose a 3D-Convolution guided Spectral-Spatial Transformer (3D-ConvSST) for HSI classification.
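The general pattern, a 3D-convolution stem whose spectral-spatial features are flattened into per-pixel tokens for a transformer encoder, can be sketched as follows; the layer sizes and mean-pooled readout are illustrative assumptions, not 3D-ConvSST's exact configuration.

```python
# 3D conv mixes neighboring bands and pixels; the transformer then models
# relations between per-pixel tokens built from those features.
import torch
import torch.nn as nn

class Conv3DGuidedTransformer(nn.Module):
    def __init__(self, dim=64, classes=16):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(3, 1, 1)),
            nn.ReLU(inplace=True),
        )
        self.proj = nn.LazyLinear(dim)   # infer flattened channel-band size
        layer = nn.TransformerEncoderLayer(dim, 4, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, 2)
        self.head = nn.Linear(dim, classes)

    def forward(self, x):                      # x: (B, 1, bands, H, W)
        f = self.stem(x)                       # (B, 8, bands, H, W)
        B, C, D, H, W = f.shape
        tokens = f.reshape(B, C * D, H * W).transpose(1, 2)  # one token/pixel
        z = self.encoder(self.proj(tokens))
        return self.head(z.mean(dim=1))        # pool tokens, then classify

logits = Conv3DGuidedTransformer()(torch.randn(2, 1, 30, 9, 9))
```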
arXiv Detail & Related papers (2024-04-20T03:39:54Z) - SpectralGPT: Spectral Remote Sensing Foundation Model [60.023956954916414]
A universal RS foundation model, named SpectralGPT, is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT).
Compared to existing foundation models, SpectralGPT accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data.
Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS big data applications within the field of geoscience.
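Generative pretraining on spectral data is often realized as masked-token reconstruction; the sketch below shows that generic pattern on embedded 3D spectral patches. The masking ratio, model size, and loss are assumptions, not SpectralGPT's actual recipe.

```python
# Generic masked-modeling loop: replace 75% of tokens with a learned mask
# token and reconstruct the originals at the masked positions only.
import torch
import torch.nn as nn

dim, tokens = 64, 96
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, 4, dim * 4, batch_first=True), 2)
to_pixels = nn.Linear(dim, dim)          # reconstruction head
mask_token = nn.Parameter(torch.zeros(1, 1, dim))

x = torch.randn(8, tokens, dim)          # embedded 3D spectral patches
mask = torch.rand(8, tokens) < 0.75      # mask 75% of tokens
inp = torch.where(mask.unsqueeze(-1), mask_token.expand_as(x), x)
rec = to_pixels(encoder(inp))
loss = ((rec - x) ** 2)[mask].mean()     # loss over masked positions only
loss.backward()
```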
arXiv Detail & Related papers (2023-11-13T07:09:30Z) - DiffSpectralNet : Unveiling the Potential of Diffusion Models for
Hyperspectral Image Classification [6.521187080027966]
We propose a new network called DiffSpectralNet, which combines diffusion and transformer techniques.
First, we use an unsupervised learning framework based on the diffusion model to extract both high-level and low-level spectral-spatial features.
The diffusion method is capable of extracting diverse and meaningful spectral-spatial features, leading to improvement in HSI classification.
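The two-stage idea, train a denoiser with a diffusion-style objective and then reuse its intermediate features for classification, can be sketched as follows; the tiny MLP denoiser, linear noise schedule, and feature dimension are assumptions for illustration, not DiffSpectralNet itself.

```python
# Stage 1: denoiser learns to predict the noise added to spectra.
# Stage 2: its intermediate features feed a downstream classifier.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    def __init__(self, bands=200, hidden=256):
        super().__init__()
        self.feat = nn.Sequential(nn.Linear(bands + 1, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, bands)

    def forward(self, x_noisy, t):
        # Condition on the normalized timestep; return features and noise.
        h = self.feat(torch.cat([x_noisy, t[:, None]], dim=-1))
        return h, self.out(h)

model = Denoiser()
x = torch.randn(32, 200)                       # spectra of 32 pixels
t = torch.rand(32)                             # diffusion times in [0, 1]
noise = torch.randn_like(x)
x_noisy = (1 - t[:, None]) * x + t[:, None] * noise  # simple linear schedule
feats, pred = model(x_noisy, t)
diffusion_loss = ((pred - noise) ** 2).mean()  # stage 1: predict the noise
classifier = nn.Linear(256, 16)                # stage 2: classify from feats
```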
arXiv Detail & Related papers (2023-10-29T15:26:37Z) - DCN-T: Dual Context Network with Transformer for Hyperspectral Image
Classification [109.09061514799413]
Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.
We propose a tri-spectral image generation pipeline that transforms HSI into high-quality tri-spectral images.
Our proposed method outperforms state-of-the-art methods for HSI classification.
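A minimal sketch of the core transformation, projecting groups of adjacent bands into 3-channel images that a standard RGB backbone can consume; DCN-T's actual generation pipeline is more elaborate, so treat this band grouping as an assumption.

```python
# Turn an HSI cube into several tri-spectral (3-channel) images by
# averaging groups of adjacent bands into three channels each.
import torch

def tri_spectral(hsi, groups=3):
    # hsi: (bands, H, W) -> (groups, 3, H, W)
    bands, H, W = hsi.shape
    per = bands // (groups * 3)        # bands contributing to each channel
    usable = hsi[: per * groups * 3]   # drop remainder bands
    return usable.reshape(groups, 3, per, H, W).mean(dim=2)

images = tri_spectral(torch.randn(103, 64, 64))   # e.g. Pavia-sized cube
print(images.shape)                               # torch.Size([3, 3, 64, 64])
```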
arXiv Detail & Related papers (2023-04-19T18:32:52Z) - Exploring Vision Transformers for Fine-grained Classification [0.0]
We propose a multi-stage ViT framework for fine-grained image classification tasks, which localizes the informative image regions without requiring architectural changes.
We demonstrate the value of our approach by experimenting with four popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC7 Plant Pathology.
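One standard way a ViT can localize informative regions without architectural changes is to rank patches by their attention to the CLS token and zoom into the top-k; the selection rule below is an assumption, not necessarily the paper's exact criterion.

```python
# Rank patch tokens by CLS attention (averaged over heads) and keep the
# top-k as candidate informative regions for a later, zoomed-in stage.
import torch

def topk_patches(cls_attn, k=8):
    # cls_attn: (B, heads, patches) attention from CLS to patch tokens.
    scores = cls_attn.mean(dim=1)            # average over heads
    return scores.topk(k, dim=-1).indices    # indices of informative patches

idx = topk_patches(torch.rand(2, 12, 196))   # 14x14 patch grid assumed
# A second stage would crop the image around these patches and re-classify.
```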
arXiv Detail & Related papers (2021-06-19T23:57:31Z) - Adversarial Feature Augmentation and Normalization for Visual
Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to improve the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
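The core move, perturbing intermediate feature embeddings adversarially instead of input pixels, can be sketched in a few lines; the single-step sign update, epsilon, and stand-in network are assumptions for illustration.

```python
# Compute the loss gradient w.r.t. intermediate features, take a small
# sign step in feature space, and train on both clean and perturbed views.
import torch
import torch.nn as nn
import torch.nn.functional as F

stem = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # stand-in backbone stem
head = nn.Linear(64, 10)

x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
feats = stem(x)
loss = F.cross_entropy(head(feats), y)
grad, = torch.autograd.grad(loss, feats, retain_graph=True)
adv_feats = feats + 0.1 * grad.sign()                 # perturb the embedding
total = loss + F.cross_entropy(head(adv_feats), y)    # train on both views
total.backward()
```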
arXiv Detail & Related papers (2021-03-22T20:36:34Z) - Spatial-Spectral Residual Network for Hyperspectral Image
Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and spectral separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
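Separable 3D convolution factorizes a full 3x3x3 kernel into a spatial 1x3x3 pass per band and a spectral 3x1x1 pass across bands, which is where the memory and compute savings come from; the channel counts below are illustrative.

```python
# Spatial pass mixes pixels within each band; spectral pass mixes across
# bands. Together they approximate a full 3D kernel with fewer parameters.
import torch
import torch.nn as nn

separable_unit = nn.Sequential(
    nn.Conv3d(16, 16, kernel_size=(1, 3, 3), padding=(0, 1, 1)),  # spatial
    nn.ReLU(inplace=True),
    nn.Conv3d(16, 16, kernel_size=(3, 1, 1), padding=(1, 0, 0)),  # spectral
)
out = separable_unit(torch.randn(2, 16, 31, 32, 32))  # (B, C, bands, H, W)
```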
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.