HyTAS: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis
- URL: http://arxiv.org/abs/2407.16269v1
- Date: Tue, 23 Jul 2024 08:18:43 GMT
- Title: HyTAS: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis
- Authors: Fangqin Zhou, Mert Kilickaya, Joaquin Vanschoren, Ran Piao
- Abstract summary: We propose HyTAS, the first benchmark on transformer architecture search for Hyperspectral imaging.
We evaluate 12 different methods to identify the optimal transformer over 5 different datasets.
We perform an extensive factor analysis on the Hyperspectral transformer search performance.
- Score: 7.116403133334646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hyperspectral Imaging (HSI) plays an increasingly critical role in precise vision tasks within remote sensing, capturing a wide spectrum of visual data. Transformer architectures have significantly enhanced HSI task performance, while advancements in Transformer Architecture Search (TAS) have improved model discovery. To harness these advancements for HSI classification, we make the following contributions: i) We propose HyTAS, the first benchmark on transformer architecture search for Hyperspectral imaging, ii) We comprehensively evaluate 12 different methods to identify the optimal transformer over 5 different datasets, iii) We perform an extensive factor analysis on the Hyperspectral transformer search performance, greatly motivating future research in this direction. All benchmark materials are available at HyTAS.
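To make the benchmark's core loop concrete, the sketch below illustrates what evaluating candidate transformer architectures for HSI patch classification could look like. It is a minimal illustration, not the HyTAS implementation: the names (SpectralTransformer, SEARCH_SPACE, evaluate_candidate), the grid-style search, and the synthetic stand-in data are assumptions made for the example; the actual benchmark covers 12 search methods and 5 real hyperspectral datasets.

```python
# Minimal, illustrative sketch of a transformer-architecture-search loop for
# HSI patch classification. All names and the synthetic data are hypothetical
# and NOT the HyTAS implementation; a real setup would plug in the benchmark
# datasets and the search strategies evaluated in the paper.
import itertools
import torch
import torch.nn as nn


class SpectralTransformer(nn.Module):
    """Tiny transformer that tokenizes an HSI patch along the spectral axis."""

    def __init__(self, bands, patch_size, num_classes, d_model, heads, depth, mlp_ratio):
        super().__init__()
        self.embed = nn.Linear(patch_size * patch_size, d_model)  # one token per band
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, heads, dim_feedforward=int(d_model * mlp_ratio), batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):                  # x: (B, bands, patch, patch)
        tokens = self.embed(x.flatten(2))  # (B, bands, d_model)
        cls = self.cls.expand(x.size(0), -1, -1)
        out = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(out[:, 0])        # classify from the class token


def evaluate_candidate(cfg, train, val, bands=103, patch=7, classes=9, epochs=3):
    """Short proxy training run; returns validation accuracy for one config."""
    model = SpectralTransformer(bands, patch, classes, **cfg)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    xb, yb = train
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
    with torch.no_grad():
        xv, yv = val
        return (model(xv).argmax(1) == yv).float().mean().item()


# Hypothetical search space over a few transformer hyperparameters.
SEARCH_SPACE = {
    "d_model": [32, 64],
    "heads": [2, 4],
    "depth": [1, 2],
    "mlp_ratio": [2.0, 4.0],
}

if __name__ == "__main__":
    # Synthetic stand-in for an HSI dataset (e.g. 103 bands, 7x7 patches, 9 classes).
    train = (torch.randn(64, 103, 7, 7), torch.randint(0, 9, (64,)))
    val = (torch.randn(32, 103, 7, 7), torch.randint(0, 9, (32,)))

    best = None
    for values in itertools.product(*SEARCH_SPACE.values()):
        cfg = dict(zip(SEARCH_SPACE.keys(), values))
        acc = evaluate_candidate(cfg, train, val)
        if best is None or acc > best[1]:
            best = (cfg, acc)
    print("best config:", best[0], "val acc:", round(best[1], 3))
```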
Related papers
- Image-Based Multi-Survey Classification of Light Curves with a Pre-Trained Vision Transformer [31.76431580841178]
We explore the use of Swin Transformer V2, a pre-trained vision Transformer, for photometric classification in a multi-survey setting.
We evaluate different strategies for integrating data from the Zwicky Transient Facility (ZTF) and the Asteroid Terrestrial-impact Last Alert System (ATLAS).
arXiv Detail & Related papers (2025-07-15T20:30:21Z) - Transformers Meet Hyperspectral Imaging: A Comprehensive Study of Models, Challenges and Open Problems [0.0]
We review more than 300 papers published up to 2025 and present the first end-to-end survey dedicated to Transformer-based HSI classification.
The study categorizes every stage of a typical pipeline: pre-processing, patch or pixel tokenization, positional encoding, spatial-spectral feature extraction, multi-head self-attention variants, skip connections, and loss design.
We outline a research agenda prioritizing valuable public data sets, lightweight on-edge models, illumination and sensor shifts, and intrinsically interpretable attention mechanisms.
arXiv Detail & Related papers (2025-06-10T09:04:30Z) - SARFormer -- An Acquisition Parameter Aware Vision Transformer for Synthetic Aperture Radar Data [1.2926587870771544]
We introduce a modified Vision Transformer (ViT) architecture designed for processing one or multiple synthetic aperture radar (SAR) images.
We propose an acquisition parameter encoding module that significantly guides the learning process.
Our approach achieves up to 17% improvement in terms of RMSE over baseline models.
arXiv Detail & Related papers (2025-04-11T11:06:12Z) - Evidential Transformers for Improved Image Retrieval [7.397099215417549]
We introduce the Evidential Transformer, an uncertainty-driven transformer model for improved and robust image retrieval.
We incorporate probabilistic methods into image retrieval, achieving robust and reliable results.
arXiv Detail & Related papers (2024-09-02T09:10:47Z) - Transformers Fusion across Disjoint Samples for Hyperspectral Image Classification [2.1223532600703385]
3D Swin Transformer (3D-ST) excels in capturing intricate spatial relationships within images.
SST specializes in modeling long-range dependencies through self-attention mechanisms.
This paper introduces an attentional fusion of these two transformers to significantly enhance the classification performance of Hyperspectral Images (HSIs).
arXiv Detail & Related papers (2024-05-02T08:49:01Z) - FactoFormer: Factorized Hyperspectral Transformers with Self-Supervised Pretraining [36.44039681893334]
Hyperspectral images (HSIs) contain rich spectral and spatial information.
Current state-of-the-art hyperspectral transformers only tokenize the input HSI sample along the spectral dimension.
We propose a novel factorized spectral-spatial transformer that incorporates factorized self-supervised pretraining procedures.
arXiv Detail & Related papers (2023-09-18T02:05:52Z) - Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z) - Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z) - ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architectures for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z) - Vision Transformer Architecture Search [64.73920718915282]
Current vision transformers (ViTs) are simply inherited from natural language processing (NLP) tasks.
We propose an architecture search method, dubbed ViTAS, to search for the optimal architecture with similar hardware budgets.
Our searched architecture achieves 74.7% top-1 accuracy on ImageNet, which is 2.5% higher than the current baseline ViT architecture.
arXiv Detail & Related papers (2021-06-25T15:39:08Z) - Exploring Vision Transformers for Fine-grained Classification [0.0]
We propose a multi-stage ViT framework for fine-grained image classification tasks, which localizes the informative image regions without requiring architectural changes.
We demonstrate the value of our approach by experimenting with four popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC7 Plant Pathology.
arXiv Detail & Related papers (2021-06-19T23:57:31Z) - Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z) - AutoTrans: Automating Transformer Design via Reinforced Architecture Search [52.48985245743108]
This paper empirically explores how to set layer normalization, whether to scale, the number of layers, the number of heads, the activation function, etc., so that one can obtain a transformer architecture that better suits the task at hand.
Experiments on CoNLL03, Multi-30k, IWSLT14, and WMT-14 show that the searched transformer model can outperform standard transformers.
arXiv Detail & Related papers (2020-09-04T08:46:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.