SARMAE: Masked Autoencoder for SAR Representation Learning
- URL: http://arxiv.org/abs/2512.16635v1
- Date: Thu, 18 Dec 2025 15:10:19 GMT
- Title: SARMAE: Masked Autoencoder for SAR Representation Learning
- Authors: Danxu Liu, Di Wang, Hebaixu Wang, Haoyang Chen, Wentao Jiang, Yilin Cheng, Haonan Guo, Wei Cui, Jing Zhang,
- Abstract summary: We propose SARMAE, a Noise-Aware Masked Autoencoder for self-supervised SAR representation learning. SARMAE injects SAR-specific speckle noise into masked autoencoders to facilitate noise-aware and robust representation learning. Experiments across multiple SAR datasets demonstrate that SARMAE achieves state-of-the-art performance on classification, detection, and segmentation tasks.
- Score: 17.36199520462285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic Aperture Radar (SAR) imagery plays a critical role in all-weather, day-and-night remote sensing applications. However, existing SAR-oriented deep learning is constrained by data scarcity, while the physically grounded speckle noise in SAR imagery further hampers fine-grained semantic representation learning. To address these challenges, we propose SARMAE, a Noise-Aware Masked Autoencoder for self-supervised SAR representation learning. Specifically, we construct SAR-1M, the first million-scale SAR dataset, with additional paired optical images, to enable large-scale pre-training. Building upon this, we design Speckle-Aware Representation Enhancement (SARE), which injects SAR-specific speckle noise into masked autoencoders to facilitate noise-aware and robust representation learning. Furthermore, we introduce Semantic Anchor Representation Constraint (SARC), which leverages paired optical priors to align SAR features and ensure semantic consistency. Extensive experiments across multiple SAR datasets demonstrate that SARMAE achieves state-of-the-art performance on classification, detection, and segmentation tasks. Code and models will be available at https://github.com/MiliLab/SARMAE.
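The Speckle-Aware Representation Enhancement (SARE) idea of injecting SAR-specific speckle into a masked autoencoder can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' released code: multiplicative speckle for an L-look intensity image is commonly modeled as unit-mean Gamma(L, 1/L) noise, applied here before MAE-style random patch masking; the function names and masking scheme are assumptions.

```python
import numpy as np

def inject_speckle(intensity: np.ndarray, looks: int = 1, seed=None) -> np.ndarray:
    """Multiply an intensity image by unit-mean Gamma(looks, 1/looks) speckle."""
    rng = np.random.default_rng(seed)
    speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=intensity.shape)
    return intensity * speckle

def random_patch_mask(shape, patch: int, mask_ratio: float, seed=None):
    """Return (masked, visible) patch indices over a flattened patch grid, MAE-style."""
    rng = np.random.default_rng(seed)
    n = (shape[0] // patch) * (shape[1] // patch)
    perm = rng.permutation(n)
    n_mask = int(n * mask_ratio)
    return perm[:n_mask], perm[n_mask:]

# Toy usage: speckle a constant image, then choose patches to mask.
img = np.ones((8, 8))
noisy = inject_speckle(img, looks=4, seed=0)
masked, visible = random_patch_mask(noisy.shape, patch=2, mask_ratio=0.75, seed=0)
```

Because the speckle has unit mean, injection perturbs local statistics without changing the expected intensity, so the reconstruction target remains consistent with the clean image.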
Related papers
- SARCLIP: A Vision Language Foundation Model for Semantic Understanding and Target Recognition in SAR Imagery [46.87845911116779]
We introduce SARCLIP, the first vision-language foundation model tailored for the SAR domain. SARCLIP is trained using a contrastive vision-language learning approach with a domain transfer strategy. Experiments on image-text retrieval and zero-shot classification tasks demonstrate the superior performance of SARCLIP.
arXiv Detail & Related papers (2025-10-26T13:04:50Z)
- Knowledge-Informed Neural Network for Complex-Valued SAR Image Recognition [51.03674130115878]
We introduce the Knowledge-Informed Neural Network (KINN), a lightweight framework built upon a novel "compression-aggregation-compression" architecture. KINN establishes a state-of-the-art in parameter-efficient recognition, offering exceptional generalization in data-scarce and out-of-distribution scenarios.
arXiv Detail & Related papers (2025-10-23T07:12:26Z)
- Annotation-Free Open-Vocabulary Segmentation for Remote-Sensing Images [51.74614065919118]
This paper introduces SegEarth-OV, the first framework for annotation-free open-vocabulary segmentation of RS images. We propose SimFeatUp, a universal upsampler that robustly restores high-resolution spatial details from coarse features. We also present a simple yet effective Global Bias Alleviation operation to subtract the inherent global context from patch features.
arXiv Detail & Related papers (2025-08-25T14:22:57Z)
- SAR-W-MixMAE: SAR Foundation Model Training Using Backscatter Power Weighting [3.618534280726541]
Foundation model approaches such as masked auto-encoders (MAE) and their variations are now being successfully applied to satellite imagery. Because semantic labeling for dataset creation is difficult and SAR images carry more noise than optical images, Synthetic Aperture Radar (SAR) data has not been explored much for foundation models. In this work, we explore masked auto-encoders, specifically MixMAE, on Sentinel-1 SAR images and their impact on SAR image classification tasks.
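The backscatter power weighting described above can be sketched as a weighted MAE reconstruction loss. The normalization below (unit-mean weights derived from target backscatter magnitude) is an assumption for illustration, not the paper's exact formulation.

```python
import numpy as np

def backscatter_weighted_mse(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """MSE weighted by the normalized backscatter power of the target,
    so that errors on bright scatterers dominate the loss."""
    power = np.abs(target)
    weights = power / (power.mean() + eps)  # unit-mean weights
    return float(np.mean(weights * (pred - target) ** 2))

# With a uniform target, the weighted loss reduces to plain MSE.
uniform_loss = backscatter_weighted_mse(np.array([0.0, 0.0]), np.array([1.0, 1.0]))
```

Because the weights have unit mean, the loss stays on the same scale as an unweighted MSE while redistributing emphasis toward high-backscatter pixels.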
arXiv Detail & Related papers (2025-03-03T05:09:44Z)
- Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers [58.80845404416028]
Data-free quantization (DFQ) enables model quantization without accessing real data, addressing concerns regarding data security and privacy. With the growing adoption of Vision Transformers (ViTs), DFQ for ViTs has garnered significant attention. We propose SARDFQ, a novel Semantics Alignment and Reinforcement Data-Free Quantization method for ViTs.
arXiv Detail & Related papers (2024-12-21T09:30:45Z)
- SAFE: a SAR Feature Extractor based on self-supervised learning and masked Siamese ViTs [5.961207817077044]
We propose a novel self-supervised learning framework based on masked Siamese Vision Transformers to create a General SAR Feature Extractor coined SAFE.
Our method leverages contrastive learning principles to train a model on unlabeled SAR data, extracting robust and generalizable features.
We introduce tailored data augmentation techniques specific to SAR imagery, such as sub-aperture decomposition and despeckling.
Our network competes with or surpasses other state-of-the-art methods in few-shot classification and segmentation tasks, even without being trained on the sensors used for the evaluation.
arXiv Detail & Related papers (2024-06-30T23:11:20Z)
- SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection [79.23689506129733]
We establish a new benchmark dataset and an open-source method for large-scale SAR object detection. Our dataset, SARDet-100K, is the result of intensive surveying, collecting, and standardizing of 10 existing SAR detection datasets. To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
arXiv Detail & Related papers (2024-03-11T09:20:40Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Predicting Gradient is Better: Exploring Self-Supervised Learning for SAR ATR with a Joint-Embedding Predictive Architecture [23.375515181854254]
Self-Supervised Learning (SSL) methods can achieve various SAR Automatic Target Recognition (ATR) tasks with pre-training in large-scale unlabeled data.
SSL aims to construct supervision signals directly from the data, which minimizes the need for expensive expert annotation.
This study investigates an effective SSL method for SAR ATR, which can pave the way for a foundation model in SAR ATR.
arXiv Detail & Related papers (2023-11-26T01:05:55Z)
- SAR2SAR: a semi-supervised despeckling algorithm for SAR images [3.9490074068698]
A self-supervised deep learning algorithm, SAR2SAR, is proposed in this paper.
A strategy to adapt it to SAR despeckling is presented, based on compensation of temporal changes and a loss function adapted to the statistics of speckle.
Results on real images are discussed, to show the potential of the proposed algorithm.
arXiv Detail & Related papers (2020-06-26T15:07:28Z)
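A loss function adapted to speckle statistics, as mentioned for SAR2SAR above, can be illustrated with the negative log-likelihood of single-look log-intensity speckle (the Fisher-Tippett distribution). This is a generic sketch of that idea, not SAR2SAR's exact training loss; the function name and signature are assumptions.

```python
import math

def log_speckle_nll(pred_log, obs_log):
    """Mean NLL (up to an additive constant) of observing log-intensity y
    given a clean log-reflectivity estimate s, under single-look speckle:
    exp(y - s) - (y - s). Per pixel, this is minimized at s == y (value 1)."""
    total = 0.0
    for s, y in zip(pred_log, obs_log):
        d = y - s
        total += math.exp(d) - d
    return total / len(pred_log)

perfect = log_speckle_nll([0.0, 1.0], [0.0, 1.0])  # residual d == 0 everywhere
biased = log_speckle_nll([0.0, 1.0], [0.5, 1.5])   # constant positive offset
```

Unlike a plain MSE, this loss is asymmetric: underestimating the log-reflectivity (d > 0) is penalized exponentially, matching the heavy-tailed behavior of speckle in the log domain.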
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.