Curriculum Multi-Task Self-Supervision Improves Lightweight Architectures for Onboard Satellite Hyperspectral Image Segmentation
- URL: http://arxiv.org/abs/2509.13229v1
- Date: Tue, 16 Sep 2025 16:37:59 GMT
- Title: Curriculum Multi-Task Self-Supervision Improves Lightweight Architectures for Onboard Satellite Hyperspectral Image Segmentation
- Authors: Hugo Carlesso, Josiane Mothe, Radu Tudor Ionescu
- Abstract summary: Hyperspectral imaging (HSI) captures detailed spectral signatures across hundreds of contiguous bands per pixel. We introduce a novel curriculum multi-task self-supervised learning framework designed for lightweight architectures for HSI analysis. CMTSSL integrates masked image modeling with decoupled spatial and spectral jigsaw puzzle solving.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperspectral imaging (HSI) captures detailed spectral signatures across hundreds of contiguous bands per pixel, making it indispensable for remote sensing applications such as land-cover classification, change detection, and environmental monitoring. Due to the high dimensionality of HSI data and the slow rate of data transfer in satellite-based systems, compact and efficient models are required to support onboard processing and minimize the transmission of redundant or low-value data, e.g., cloud-covered areas. To this end, we introduce a novel curriculum multi-task self-supervised learning (CMTSSL) framework designed for lightweight HSI analysis architectures. CMTSSL integrates masked image modeling with decoupled spatial and spectral jigsaw puzzle solving, guided by a curriculum learning strategy that progressively increases data complexity during self-supervision. This enables the encoder to jointly capture fine-grained spectral continuity, spatial structure, and global semantic features. Unlike prior dual-task SSL methods, CMTSSL simultaneously addresses spatial and spectral reasoning within a unified and computationally efficient design, making it particularly suitable for training lightweight models for onboard satellite deployment. We validate our approach on four public benchmark datasets, demonstrating consistent gains in downstream segmentation tasks, using architectures that are over 16,000x lighter than some state-of-the-art models. These results highlight the potential of CMTSSL for generalizable representation learning with lightweight architectures in real-world HSI applications. Our code is publicly available at https://github.com/hugocarlesso/CMTSSL.
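The abstract describes three pretext tasks (masked image modeling, spatial jigsaw, spectral jigsaw) whose difficulty grows over training. The paper does not publish its exact scheduling in this listing, so the following is a minimal sketch, with assumed names and a simple linear ramp, of how such curriculum targets could be generated for one sample:

```python
import random

def curriculum_difficulty(epoch, total_epochs, start=0.2, end=0.8):
    """Linearly ramp pretext-task difficulty from easy to hard.

    The start/end fractions are illustrative assumptions, not the
    paper's actual schedule.
    """
    t = epoch / max(total_epochs - 1, 1)
    return start + t * (end - start)

def _partial_permutation(n, k, rng):
    """Identity permutation of n items with k randomly chosen
    positions shuffled among themselves (a 'jigsaw' target)."""
    perm = list(range(n))
    idx = rng.sample(range(n), min(k, n))
    shuffled = idx[:]
    rng.shuffle(shuffled)
    for i, j in zip(idx, shuffled):
        perm[i] = j
    return perm

def make_ssl_targets(n_patches, n_band_groups, epoch, total_epochs, seed=0):
    """Build the three self-supervised targets for one HSI sample:
    - masked image modeling: indices of spatial patches to mask,
    - spatial jigsaw: a permutation of patch positions to undo,
    - spectral jigsaw: a permutation of band groups to undo.
    Difficulty (mask ratio, number of swapped pieces) follows the
    curriculum schedule above.
    """
    rng = random.Random(seed)
    d = curriculum_difficulty(epoch, total_epochs)
    n_masked = max(1, int(d * n_patches))
    return {
        "masked": rng.sample(range(n_patches), n_masked),
        "spatial_perm": _partial_permutation(
            n_patches, max(2, int(d * n_patches)), rng),
        "spectral_perm": _partial_permutation(
            n_band_groups, max(2, int(d * n_band_groups)), rng),
    }
```

A shared lightweight encoder would then be trained with one head per target (reconstruction for the mask, permutation prediction for each jigsaw); early epochs mask few patches and swap few pieces, later epochs make all three tasks harder.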
Related papers
- SpectralTrain: A Universal Framework for Hyperspectral Image Classification [6.263680699548957]
This study introduces SpectralTrain, a universal, architecture-agnostic training framework. It enhances learning efficiency by integrating curriculum learning with principal component analysis (PCA)-based spectral downsampling. It is compatible with both classical and state-of-the-art (SOTA) models.
arXiv Detail & Related papers (2025-11-20T06:19:26Z) - SpecAware: A Spectral-Content Aware Foundation Model for Unifying Multi-Sensor Learning in Hyperspectral Remote Sensing Mapping [9.392313789126714]
SpecAware is a novel spectral-content aware foundation model for unifying multi-sensor learning in HSI mapping. The core of SpecAware is a two-step hypernetwork-driven encoding process for HSI data. Experiments on six datasets demonstrate that SpecAware learns superior feature representations.
arXiv Detail & Related papers (2025-10-31T06:28:14Z) - Hyperspectral Adapter for Semantic Segmentation with Vision Foundation Models [18.24287471339871]
Hyperspectral imaging (HSI) captures spatial information along with dense spectral measurements across numerous narrow wavelength bands. Our architecture incorporates a spectral transformer and a spectrum-aware spatial prior module to extract rich spatial-spectral features. It achieves state-of-the-art semantic segmentation performance while directly using HSI inputs, outperforming both vision-based and hyperspectral segmentation methods.
arXiv Detail & Related papers (2025-09-24T13:32:07Z) - Annotation-Free Open-Vocabulary Segmentation for Remote-Sensing Images [51.74614065919118]
This paper introduces SegEarth-OV, the first framework for annotation-free open-vocabulary segmentation of RS images. We propose SimFeatUp, a universal upsampler that robustly restores high-resolution spatial details from coarse features. We also present a simple yet effective Global Bias Alleviation operation to subtract the inherent global context from patch features.
arXiv Detail & Related papers (2025-08-25T14:22:57Z) - TimeSenCLIP: A Vision-Language Model for Remote Sensing Using Single-Pixel Time Series [9.263651699452996]
TimeSenCLIP is a lightweight framework that reevaluates the role of spatial context by evaluating the effectiveness of a single pixel. By leveraging spectral and temporal information from Sentinel-2 imagery, it minimises the need for caption-based training. The approach is grounded in the LUCAS and Sen4Map datasets, and evaluated on classification tasks including LULC, crop type, and ecosystem type.
arXiv Detail & Related papers (2025-08-16T05:44:33Z) - Structural-Spectral Graph Convolution with Evidential Edge Learning for Hyperspectral Image Clustering [59.24638672786966]
Hyperspectral image (HSI) clustering assigns similar pixels to the same class without any annotations. Existing graph neural networks (GNNs) cannot fully exploit the spectral information of the input HSI. We propose a structural-spectral graph convolutional operator (SSGCO) tailored for graph-structured HSI superpixels.
arXiv Detail & Related papers (2025-06-11T16:41:34Z) - Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data [14.104497777255137]
We introduce the Low-rank Efficient Spatial-Spectral Vision Transformer (LESS ViT) with three key innovations. We pretrain LESS ViT using a Hyperspectral Masked Autoencoder framework with integrated positional and channel masking strategies. Experimental results demonstrate that our proposed method achieves competitive performance against state-of-the-art multi-modal geospatial foundation models.
arXiv Detail & Related papers (2025-03-17T05:42:19Z) - Cross-Scan Mamba with Masked Training for Robust Spectral Imaging [51.557804095896174]
We propose the Cross-Scanning Mamba, named CS-Mamba, which employs a Spatial-Spectral SSM for global-local balanced context encoding. Experiment results show that our CS-Mamba achieves state-of-the-art performance and that the masked training method can better reconstruct smooth features to improve visual quality.
arXiv Detail & Related papers (2024-08-01T15:14:10Z) - Paving the way toward foundation models for irregular and unaligned Satellite Image Time Series [0.0]
We propose ALIgned SITS (ALISE), which takes into account the spatial, spectral, and temporal dimensions of satellite imagery.
Unlike SSL models currently available for SITS, ALISE incorporates a flexible query mechanism to project the SITS into a common and learned temporal projection space.
The quality of the produced representation is assessed through three downstream tasks: crop segmentation (PASTIS), land cover segmentation (MultiSenGE) and a novel crop change detection dataset.
arXiv Detail & Related papers (2024-07-11T12:42:10Z) - HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model [88.13261547704444]
HyperSIGMA is a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images.
arXiv Detail & Related papers (2024-06-17T13:22:58Z) - A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery [4.397725469518669]
Self-supervised learning (SSL) enables the models to learn a representation from orders of magnitude more unlabelled data.
This work designs a novel SSL framework that is capable of learning representations from both the spectral and spatial information of unlabelled data.
arXiv Detail & Related papers (2023-06-27T23:50:43Z) - CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts model performance, with 10-34% relative improvement across various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z) - Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution [79.97180849505294]
We propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet, to enhance the spatial resolution of HSI.
Experiments are conducted on three widely-used HS-MS datasets in comparison with state-of-the-art HSI-SR models.
arXiv Detail & Related papers (2020-07-10T08:08:20Z) - X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet.
X-ModalNet generalizes well, owing to label propagation on an updatable graph constructed from high-level features at the top of the network.
We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.