Dual-Domain Masked Image Modeling: A Self-Supervised Pretraining Strategy Using Spatial and Frequency Domain Masking for Hyperspectral Data
- URL: http://arxiv.org/abs/2505.03220v1
- Date: Tue, 06 May 2025 06:24:21 GMT
- Title: Dual-Domain Masked Image Modeling: A Self-Supervised Pretraining Strategy Using Spatial and Frequency Domain Masking for Hyperspectral Data
- Authors: Shaheer Mohamed, Tharindu Fernando, Sridha Sridharan, Peyman Moghadam, Clinton Fookes,
- Abstract summary: We propose a self-supervised pretraining strategy for hyperspectral data that utilizes the large portion of unlabeled data.<n>Our method introduces a novel dual-domain masking mechanism that operates in both spatial and frequency domains.<n>We evaluate our approach on three publicly available HSI classification benchmarks and demonstrate that it achieves state-of-the-art performance.
- Score: 35.34526230299484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperspectral images (HSIs) capture rich spectral signatures that reveal vital material properties, offering broad applicability across various domains. However, the scarcity of labeled HSI data limits the full potential of deep learning, especially for transformer-based architectures that require large-scale training. To address this constraint, we propose Spatial-Frequency Masked Image Modeling (SFMIM), a self-supervised pretraining strategy for hyperspectral data that utilizes the large portion of unlabeled data. Our method introduces a novel dual-domain masking mechanism that operates in both spatial and frequency domains. The input HSI cube is initially divided into non-overlapping patches along the spatial dimension, with each patch comprising the entire spectrum of its corresponding spatial location. In spatial masking, we randomly mask selected patches and train the model to reconstruct the masked inputs using the visible patches. Concurrently, in frequency masking, we remove portions of the frequency components of the input spectra and predict the missing frequencies. By learning to reconstruct these masked components, the transformer-based encoder captures higher-order spectral-spatial correlations. We evaluate our approach on three publicly available HSI classification benchmarks and demonstrate that it achieves state-of-the-art performance. Notably, our model shows rapid convergence during fine-tuning, highlighting the efficiency of our pretraining strategy.
Related papers
- Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts [22.75047167955269]
We introduce Land-MoE, a novel approach for multispectral land cover classification (MLCC)<n>Land-MoE comprises two key modules: the mixture of low-rank token experts (MoLTE) and frequency-aware filters (FAF)
arXiv Detail & Related papers (2025-05-20T08:52:28Z) - SpectralMamba: Efficient Mamba for Hyperspectral Image Classification [39.18999103115206]
Recurrent neural networks and Transformers have dominated most applications in hyperspectral (HS) imaging.
We propose SpectralMamba -- a novel state space model incorporated efficient deep learning framework for HS image classification.
We show that SpectralMamba surprisingly creates promising win-wins from both performance and efficiency perspectives.
arXiv Detail & Related papers (2024-04-12T14:12:03Z) - Frequency-Aware Deepfake Detection: Improving Generalizability through
Frequency Space Learning [81.98675881423131]
This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images.
Existing frequency-based paradigms have relied on frequency-level artifacts introduced during the up-sampling in GAN pipelines to detect forgeries.
We introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors.
arXiv Detail & Related papers (2024-03-12T01:28:00Z) - SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote
Sensing Image Classification [35.52272615695294]
We propose a spatial-spectral masked auto-encoder (SS-MAE) for HSI and LiDAR/SAR data joint classification.
Our SS-MAE fully exploits the spatial and spectral representations of the input data.
To complement local features in the training stage, we add two lightweight CNNs for feature extraction.
arXiv Detail & Related papers (2023-11-08T03:54:44Z) - S^2-Transformer for Mask-Aware Hyperspectral Image Reconstruction [59.39343894089959]
A snapshot compressive imager (CASSI) with Transformer reconstruction backend remarks high-fidelity sensing performance.<n> dominant spatial and spectral attention designs show limitations in hyperspectral modeling.<n>We propose a spatial-spectral (S2-) Transformer implemented by a paralleled attention design and a mask-aware learning strategy.
arXiv Detail & Related papers (2022-09-24T19:26:46Z) - Masked Frequency Modeling for Self-Supervised Visual Pre-Training [102.89756957704138]
We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models.
MFM first masks out a portion of frequency components of the input image and then predicts the missing frequencies on the frequency spectrum.
For the first time, MFM demonstrates that, for both ViT and CNN, a simple non-Siamese framework can learn meaningful representations even using none of the following: (i) extra data, (ii) extra model, (iii) mask token.
arXiv Detail & Related papers (2022-06-15T17:58:30Z) - Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose Adaptively learn Frequency information in the two-branch Detection framework, dubbed AFD.
We liberate our network from the fixed frequency transforms, and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z) - Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image
Reconstruction [127.20208645280438]
Hyperspectral image (HSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement.
Modeling the inter-spectra interactions is beneficial for HSI reconstruction.
Mask-guided Spectral-wise Transformer (MST) proposes a novel framework for HSI reconstruction.
arXiv Detail & Related papers (2021-11-15T16:59:48Z) - Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in
Frequency Domain [88.7339322596758]
We present a novel Spatial-Phase Shallow Learning (SPSL) method, which combines spatial image and phase spectrum to capture the up-sampling artifacts of face forgery.
SPSL can achieve the state-of-the-art performance on cross-datasets evaluation as well as multi-class classification and obtain comparable results on single dataset evaluation.
arXiv Detail & Related papers (2021-03-02T16:45:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.