Multi-Scale Spectral Attention Module-based Hyperspectral Segmentation in Autonomous Driving Scenarios
- URL: http://arxiv.org/abs/2506.18682v1
- Date: Mon, 23 Jun 2025 14:24:20 GMT
- Title: Multi-Scale Spectral Attention Module-based Hyperspectral Segmentation in Autonomous Driving Scenarios
- Authors: Imad Ali Shah, Jiarong Li, Tim Brophy, Martin Glavin, Edward Jones, Enda Ward, Brian Deegan,
- Abstract summary: This paper introduces a Multi-scale Spectral Attention Module (MSAM) that enhances spectral feature extraction.<n>By integrating MSAM into UNet's skip connections (UNet-SC), our proposed UNet-MSAM achieves significant improvements in semantic segmentation performance.
- Score: 3.437245452211197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in autonomous driving (AD) have highlighted the potential of Hyperspectral Imaging (HSI) for enhanced environmental perception, particularly in challenging weather and lighting conditions. However, efficiently processing its high-dimensional spectral data remains a significant challenge. This paper introduces a Multi-scale Spectral Attention Module (MSAM) that enhances spectral feature extraction through three parallel 1D convolutions with varying kernel sizes between 1 to 11, coupled with an adaptive feature aggregation mechanism. By integrating MSAM into UNet's skip connections (UNet-SC), our proposed UNet-MSAM achieves significant improvements in semantic segmentation performance across multiple HSI datasets: HyKo-VIS v2, HSI-Drive v2, and Hyperspectral City v2. Our comprehensive experiments demonstrate that with minimal computational overhead (on average 0.02% in parameters and 0.82% GFLOPS), UNet-MSAM consistently outperforms UNet-SC, achieving average improvements of 3.61% in mean IoU and 3.80% in mF1 across the three datasets. Through extensive ablation studies, we have established that multi-scale kernel combinations perform better than single-scale configurations. These findings demonstrate the potential of HSI processing for AD and provide valuable insights into designing robust, multi-scale spectral feature extractors for real-world applications.
Related papers
- SAMSA 2.0: Prompting Segment Anything with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation [5.9804425273111095]
We present SAMSA 2.0, an interactive segmentation framework for hyperspectral medical imaging.<n>It introduces spectral angle prompting to guide the Segment Anything Model (SAM) using spectral similarity alongside spatial cues.<n>Without retraining, SAMSA 2.0 achieves up to +3.8% higher Dice scores compared to RGB-only models and up to +3.1% over prior spectral fusion methods.
arXiv Detail & Related papers (2025-08-01T10:16:26Z) - MS-Occ: Multi-Stage LiDAR-Camera Fusion for 3D Semantic Occupancy Prediction [15.656771219382076]
MS-Occ is a novel multi-stage LiDAR-camera fusion framework.<n>It integrates LiDAR's geometric fidelity with camera-based semantic richness.<n>Experiments show MS-Occ achieves an Intersection over Union (IoU) of 32.1% and a mean IoU (mIoU) of 25.3%.
arXiv Detail & Related papers (2025-04-22T13:33:26Z) - PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model [76.95536611263356]
PolSAR data presents unique challenges due to its rich and complex characteristics.<n>Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used.<n>Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively.<n>We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy.
arXiv Detail & Related papers (2024-12-17T09:59:53Z) - S3TU-Net: Structured Convolution and Superpixel Transformer for Lung Nodule Segmentation [5.2752693301728355]
We propose a segmentation model, S3TU-Net, which integrates multi-dimensional spatial connectors and a superpixel-based visual transformer.
S3TU-Net is built on a multi-view CNN-Transformer hybrid architecture, incorporating superpixel algorithms, structured weighting, and spatial shifting techniques.
Experimental results on the LIDC-IDRI dataset demonstrate that S3TU-Net achieves a DSC, precision, and IoU of 89.04%, 90.73%, and 90.70%, respectively.
arXiv Detail & Related papers (2024-11-19T15:00:18Z) - HRSAM: Efficient Interactive Segmentation in High-Resolution Images [59.537068118473066]
Segment Anything Model (SAM) has advanced interactive segmentation but is limited by the high computational cost on high-resolution images.
We focus on visual length extrapolation and propose a lightweight model named HRSAM.
The extrapolation enables HRSAM trained on low resolutions to generalize to high resolutions.
arXiv Detail & Related papers (2024-07-02T09:51:56Z) - HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model [88.13261547704444]
Hyper SIGMA is a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes.<n>In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images.
arXiv Detail & Related papers (2024-06-17T13:22:58Z) - Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We develop the first attempt to integrate the Vision State Space Model (Mamba) for remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM)
arXiv Detail & Related papers (2024-05-08T11:09:24Z) - HSIDMamba: Exploring Bidirectional State-Space Models for Hyperspectral Denoising [11.022546457796949]
We propose HSIDMamba(HSDM), tailored to exploit the linear complexity for effectively capturing spatial-spectral dependencies in HSI denoising.
HSDM comprises multiple Hyperspectral Continuous Scan Blocks, incorporating BCSM(Bidirectional Continuous Scanning Mechanism), scale residual, and spectral attention mechanisms.
BCSM strengthens spatial-spectral interactions by linking forward and backward scans and enhancing information from eight directions through SSM.
arXiv Detail & Related papers (2024-04-15T11:59:19Z) - Superpixel Graph Contrastive Clustering with Semantic-Invariant
Augmentations for Hyperspectral Images [64.72242126879503]
Hyperspectral images (HSI) clustering is an important but challenging task.
We first use 3-D and 2-D hybrid convolutional neural networks to extract the high-order spatial and spectral features of HSI.
We then design a superpixel graph contrastive clustering model to learn discriminative superpixel representations.
arXiv Detail & Related papers (2024-03-04T07:40:55Z) - SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion
Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - ADASR: An Adversarial Auto-Augmentation Framework for Hyperspectral and
Multispectral Data Fusion [54.668445421149364]
Deep learning-based hyperspectral image (HSI) super-resolution aims to generate high spatial resolution HSI (HR-HSI) by fusing hyperspectral image (HSI) and multispectral image (MSI) with deep neural networks (DNNs)
In this letter, we propose a novel adversarial automatic data augmentation framework ADASR that automatically optimize and augments HSI-MSI sample pairs to enrich data diversity for HSI-MSI fusion.
arXiv Detail & Related papers (2023-10-11T07:30:37Z) - ESSAformer: Efficient Transformer for Hyperspectral Image
Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z) - Hybrid Spectral Denoising Transformer with Guided Attention [34.34075175179669]
We present a Hybrid Spectral Denoising Transformer (HSDT) for hyperspectral image denoising.
Our HSDT significantly outperforms the existing state-of-the-art methods while maintaining low computational overhead.
arXiv Detail & Related papers (2023-03-16T02:24:31Z) - Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral
Super-Resolution [79.97180849505294]
We propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet, to enhance the spatial resolution of HSI.
Experiments are conducted on three widely-used HS-MS datasets in comparison with state-of-the-art HSI-SR models.
arXiv Detail & Related papers (2020-07-10T08:08:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.