Predicting Gradient is Better: Exploring Self-Supervised Learning for SAR ATR with a Joint-Embedding Predictive Architecture
- URL: http://arxiv.org/abs/2311.15153v4
- Date: Fri, 29 Mar 2024 01:18:37 GMT
- Title: Predicting Gradient is Better: Exploring Self-Supervised Learning for SAR ATR with a Joint-Embedding Predictive Architecture
- Authors: Weijie Li, Yang Wei, Tianpeng Liu, Yuenan Hou, Yuxuan Li, Zhen Liu, Yongxiang Liu, Li Liu
- Abstract summary: This study investigates an effective Self-Supervised Learning (SSL) method for SAR Automatic Target Recognition (ATR).
SSL aims to construct supervision signals directly from the data, which minimizes the need for expensive expert annotation.
We present a novel Joint-Embedding Predictive Architecture for SAR ATR (SAR-JEPA), which leverages local masked patches to predict the multi-scale SAR gradient representations of unseen context.
- Score: 23.375515181854254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing Synthetic Aperture Radar (SAR) data has the potential to build a foundation model through Self-Supervised Learning (SSL) methods, which can achieve various SAR Automatic Target Recognition (ATR) tasks with pre-training in large-scale unlabeled data and fine-tuning in small labeled samples. SSL aims to construct supervision signals directly from the data, which minimizes the need for expensive expert annotation and maximizes the use of the expanding data pool for a foundational model. This study investigates an effective SSL method for SAR ATR, which can pave the way for a foundation model in SAR ATR. The primary obstacles faced in SSL for SAR ATR are the small targets in remote sensing and speckle noise in SAR images, corresponding to the SSL approach and signals. To overcome these challenges, we present a novel Joint-Embedding Predictive Architecture for SAR ATR (SAR-JEPA), which leverages local masked patches to predict the multi-scale SAR gradient representations of unseen context. The key aspect of SAR-JEPA is integrating SAR domain features to ensure high-quality self-supervised signals as target features. Besides, we employ local masks and multi-scale features to accommodate the various small targets in remote sensing. By fine-tuning and evaluating our framework on three target recognition datasets (vehicle, ship, and aircraft) with four other datasets as pre-training, we demonstrate its outperformance over other SSL methods and its effectiveness with increasing SAR data. This study showcases the potential of SSL for SAR target recognition across diverse targets, scenes, and sensors.
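The abstract's idea of using multi-scale SAR gradient representations as prediction targets can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain log-ratio-of-means gradient (a common choice for SAR, since speckle is multiplicative and a ratio is insensitive to the local intensity level), and the function names `ratio_gradient` and `multiscale_ratio_gradient`, the window radii, and the `eps` constant are all hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ratio_gradient(img, r, eps=1e-6):
    """Log-ratio gradient magnitude of a 2-D non-negative image.

    For each pixel, the means of the half-windows on opposite sides
    (left/right and top/bottom) of a (2r+1) x (2r+1) neighbourhood are
    compared by their log-ratio instead of their difference.
    """
    img = np.asarray(img, dtype=np.float64)
    pad = np.pad(img, r, mode="edge")                      # keep output size H x W
    win = sliding_window_view(pad, (2 * r + 1, 2 * r + 1))  # (H, W, 2r+1, 2r+1)

    left = win[..., :, :r].mean(axis=(-2, -1))
    right = win[..., :, r + 1:].mean(axis=(-2, -1))
    top = win[..., :r, :].mean(axis=(-2, -1))
    bottom = win[..., r + 1:, :].mean(axis=(-2, -1))

    g_h = np.log((right + eps) / (left + eps))   # horizontal log-ratio
    g_v = np.log((bottom + eps) / (top + eps))   # vertical log-ratio
    return np.sqrt(g_h ** 2 + g_v ** 2)


def multiscale_ratio_gradient(img, radii=(1, 2, 4)):
    """Stack gradient maps computed at several window radii (assumed scales)."""
    return np.stack([ratio_gradient(img, r) for r in radii], axis=0)


if __name__ == "__main__":
    # Toy image with a vertical step edge; real SAR amplitude data would go here.
    image = np.ones((16, 16))
    image[:, 8:] = 4.0
    features = multiscale_ratio_gradient(image)
    print(features.shape)  # (3, 16, 16)
```

In a JEPA-style setup, maps like these would serve as the target features that the network predicts for masked regions; larger radii trade localisation for smoother responses under speckle.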
Related papers
- SAFE: a SAR Feature Extractor based on self-supervised learning and masked Siamese ViTs [5.961207817077044]
We propose a novel self-supervised learning framework based on masked Siamese Vision Transformers to create a General SAR Feature Extractor coined SAFE.
Our method leverages contrastive learning principles to train a model on unlabeled SAR data, extracting robust and generalizable features.
We introduce tailored data augmentation techniques specific to SAR imagery, such as sub-aperture decomposition and despeckling.
Our network competes with or surpasses other state-of-the-art methods in few-shot classification and segmentation tasks, even without being trained on the sensors used for the evaluation.
arXiv Detail & Related papers (2024-06-30T23:11:20Z)
- SARATR-X: A Foundation Model for Synthetic Aperture Radar Images Target Recognition [19.776275680586977]
This paper aims to achieve general SAR ATR based on a foundation model with Self-Supervised Learning (SSL).
A foundation model named SARATR-X is proposed with the following four aspects.
First, we integrated 14 datasets with various target categories and imaging conditions as a pre-training dataset. Second, different model backbones were discussed to find the most suitable approaches for remote-sensing images. Third, we applied two-stage training and SAR gradient features to ensure the diversity and scalability of SARATR-X.
arXiv Detail & Related papers (2024-05-15T14:17:44Z)
- SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection [83.21028626585986]
We establish a new benchmark dataset and an open-source method for large-scale SAR object detection.
Our dataset, SARDet-100K, is a result of intense surveying, collecting, and standardizing 10 existing SAR detection datasets.
To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
arXiv Detail & Related papers (2024-03-11T09:20:40Z)
- Benchmarking Deep Learning Classifiers for SAR Automatic Target Recognition [7.858656052565242]
This paper comprehensively benchmarks several advanced deep learning models for SAR ATR with multiple distinct SAR imagery datasets.
We evaluate and compare the five classifiers on classification accuracy, runtime performance in terms of inference throughput, and analytical performance.
No clear winner emerges across all of our chosen metrics, and a "one model rules all" case is doubtful in the domain of SAR ATR.
arXiv Detail & Related papers (2023-12-12T02:20:39Z)
- A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery [4.397725469518669]
Self-supervised learning (SSL) enables the models to learn a representation from orders of magnitude more unlabelled data.
This work designs a novel SSL framework capable of learning representations from both the spectral and spatial information of unlabelled data.
arXiv Detail & Related papers (2023-06-27T23:50:43Z)
- A Global Model Approach to Robust Few-Shot SAR Automatic Target Recognition [6.260916845720537]
It may not always be possible to collect hundreds of labeled samples per class for training deep learning-based SAR Automatic Target Recognition (ATR) models.
This work specifically tackles the few-shot SAR ATR problem, where only a handful of labeled samples may be available to support the task of interest.
arXiv Detail & Related papers (2023-03-20T00:24:05Z)
- Context-Preserving Instance-Level Augmentation and Deformable Convolution Networks for SAR Ship Detection [50.53262868498824]
Shape deformation of targets in SAR images, caused by random orientation and partial information loss, is a key challenge in SAR ship detection.
We propose a data augmentation method to train a deep network that is robust to partial information loss within the targets.
arXiv Detail & Related papers (2022-02-14T07:01:01Z)
- Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z)
- PeaceGAN: A GAN-based Multi-Task Learning Method for SAR Target Image Generation with a Pose Estimator and an Auxiliary Classifier [50.17500790309477]
We propose a novel GAN-based multi-task learning (MTL) method for SAR target image generation, called PeaceGAN.
PeaceGAN uses both pose angle and target class information, which makes it possible to produce SAR target images of desired target classes at intended pose angles.
arXiv Detail & Related papers (2021-03-29T10:03:09Z)
- Hyperspectral Image Super-Resolution with Spectral Mixup and Heterogeneous Datasets [99.92564298432387]
This work studies hyperspectral image (HSI) super-resolution (SR).
HSI SR is characterized by high-dimensional data and a limited amount of training examples.
This exacerbates the undesirable behaviors of neural networks such as memorization and sensitivity to out-of-distribution samples.
arXiv Detail & Related papers (2021-01-19T12:19:53Z)
- X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet.
X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network.
We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.