Hyperspectral Adapter for Semantic Segmentation with Vision Foundation Models
- URL: http://arxiv.org/abs/2509.20107v2
- Date: Thu, 25 Sep 2025 11:37:47 GMT
- Title: Hyperspectral Adapter for Semantic Segmentation with Vision Foundation Models
- Authors: Juana Valeria Hurtado, Rohit Mohan, Abhinav Valada,
- Abstract summary: Hyperspectral imaging (HSI) captures spatial information along with dense spectral measurements across numerous narrow wavelength bands.<n>Our architecture incorporates a spectral transformer and a spectrum-aware spatial prior module to extract rich spatial-spectral features.<n>Our architecture achieves state-of-the-art semantic segmentation performance while directly using HSI inputs, outperforming both vision-based and hyperspectral segmentation methods.
- Score: 18.24287471339871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperspectral imaging (HSI) captures spatial information along with dense spectral measurements across numerous narrow wavelength bands. This rich spectral content has the potential to facilitate robust robotic perception, particularly in environments with complex material compositions, varying illumination, or other visually challenging conditions. However, current HSI semantic segmentation methods underperform due to their reliance on architectures and learning frameworks optimized for RGB inputs. In this work, we propose a novel hyperspectral adapter that leverages pretrained vision foundation models to effectively learn from hyperspectral data. Our architecture incorporates a spectral transformer and a spectrum-aware spatial prior module to extract rich spatial-spectral features. Additionally, we introduce a modality-aware interaction block that facilitates effective integration of hyperspectral representations and frozen vision Transformer features through dedicated extraction and injection mechanisms. Extensive evaluations on three benchmark autonomous driving datasets demonstrate that our architecture achieves state-of-the-art semantic segmentation performance while directly using HSI inputs, outperforming both vision-based and hyperspectral segmentation methods. We make the code available at https://hsi-adapter.cs.uni-freiburg.de.
Related papers
- Few-Shot Video Object Segmentation in X-Ray Angiography Using Local Matching and Spatio-Temporal Consistency Loss [13.850743997507488]
We introduce a novel FSVOS model that employs a local matching strategy to restrict the search space to the most neighboring pixels.<n>Specifically, we implement non-parametric sampling mechanism that enables dynamically varying sampling regions.<n>This work offers enhanced potential for a wide range of clinical applications.
arXiv Detail & Related papers (2026-01-02T21:26:28Z) - SpecAware: A Spectral-Content Aware Foundation Model for Unifying Multi-Sensor Learning in Hyperspectral Remote Sensing Mapping [9.392313789126714]
SpecAware is a novel hyperspectral spectral-content aware foundation model for unifying multi-sensor learning for HSI mapping.<n>The core of SpecAware is a two-step hypernetwork-driven encoding process for HSI data.<n>Experiments on six datasets demonstrate that SpecAware can learn superior feature representations.
arXiv Detail & Related papers (2025-10-31T06:28:14Z) - Hyperspectral Mamba for Hyperspectral Object Tracking [56.365517163296936]
A new hyperspectral object tracking network equipped with Mamba (HyMamba) is proposed.<n>It unifies spectral, cross-depth, and temporal modeling through state space modules (SSMs)<n>HyMamba achieves state-of-the-art performance on seven benchmark datasets.
arXiv Detail & Related papers (2025-09-10T03:47:43Z) - GCRPNet: Graph-Enhanced Contextual and Regional Perception Network for Salient Object Detection in Optical Remote Sensing Images [68.33481681452675]
We propose a graph-enhanced contextual and regional perception network (GCRPNet)<n>It builds upon the Mamba architecture to simultaneously capture long-range dependencies and enhance regional feature representation.<n>It performs adaptive patch scanning on feature maps processed via multi-scale convolutions, thereby capturing rich local region information.
arXiv Detail & Related papers (2025-08-14T11:31:43Z) - AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection [49.81255045696323]
We present the Auxiliary Metadata Driven Infrared Small Target Detector (AuxDet)<n>AuxDet integrates metadata semantics with visual features, guiding adaptive representation learning for each sample.<n>Experiments on the challenging WideIRSTD-Full benchmark demonstrate that AuxDet consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-05-21T07:02:05Z) - Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data [14.104497777255137]
We introduce Low-rank Efficient Spatial-Spectral Vision Transformer with three key innovations.<n>We pretrain LESS ViT using a Hyperspectral Masked Autoencoder framework with integrated positional and channel masking strategies.<n> Experimental results demonstrate that our proposed method achieves competitive performance against state-of-the-art multi-modal geospatial foundation models.
arXiv Detail & Related papers (2025-03-17T05:42:19Z) - DiffFormer: a Differential Spatial-Spectral Transformer for Hyperspectral Image Classification [3.271106943956333]
Hyperspectral image classification (HSIC) has gained significant attention because of its potential in analyzing high-dimensional data with rich spectral and spatial information.<n>We propose the Differential Spatial-Spectral Transformer (DiffFormer) to address the inherent challenges of HSIC, such as spectral redundancy and spatial discontinuity.<n>Experiments on benchmark hyperspectral datasets demonstrate the superiority of DiffFormer in terms of classification accuracy, computational efficiency, and generalizability.
arXiv Detail & Related papers (2024-12-23T07:21:41Z) - AMBER -- Advanced SegFormer for Multi-Band Image Segmentation: an application to Hyperspectral Imaging [0.0]
This paper introduces AMBER, an advanced SegFormer specifically designed for multi-band image segmentation.<n>AMBER enhances the original SegFormer by incorporating three-dimensional convolutions, custom kernel sizes, and a Funnelizer layer.<n>Our experiments, conducted on three benchmark datasets and on a dataset from the PRISMA satellite, show that AMBER outperforms traditional CNN-based methods in terms of Overall Accuracy, Kappa coefficient, and Average Accuracy.
arXiv Detail & Related papers (2024-09-14T09:34:05Z) - GraphMamba: An Efficient Graph Structure Learning Vision Mamba for Hyperspectral Image Classification [19.740333867168108]
GraphMamba is an efficient graph structure learning vision Mamba classification framework to achieve deep spatial-spectral information mining.
The core components of GraphMamba include the HyperMamba module for improving computational efficiency and the SpectralGCN module for adaptive spatial context awareness.
arXiv Detail & Related papers (2024-07-11T07:56:08Z) - HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model [88.13261547704444]
Hyper SIGMA is a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes.<n>In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images.
arXiv Detail & Related papers (2024-06-17T13:22:58Z) - SSF-Net: Spatial-Spectral Fusion Network with Spectral Angle Awareness for Hyperspectral Object Tracking [21.664141982246598]
Hyperspectral video (HSV) offers valuable spatial, spectral, and temporal information simultaneously.<n>Existing methods primarily focus on band regrouping and rely on RGB trackers for feature extraction.<n>In this paper, a spatial-spectral fusion network with spectral angle awareness (SST-Net) is proposed for hyperspectral (HS) object tracking.
arXiv Detail & Related papers (2024-03-09T09:37:13Z) - Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image
Reconstruction [127.20208645280438]
Hyperspectral image (HSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement.
Modeling the inter-spectra interactions is beneficial for HSI reconstruction.
Mask-guided Spectral-wise Transformer (MST) proposes a novel framework for HSI reconstruction.
arXiv Detail & Related papers (2021-11-15T16:59:48Z) - Spatial-Spectral Residual Network for Hyperspectral Image
Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet)
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.