Learning Object-Centric Representations in SAR Images with Multi-Level Feature Fusion
- URL: http://arxiv.org/abs/2509.09298v1
- Date: Thu, 11 Sep 2025 09:42:32 GMT
- Title: Learning Object-Centric Representations in SAR Images with Multi-Level Feature Fusion
- Authors: Oh-Tae Jang, Min-Gon Cho, Kyung-Tae Kim
- Abstract summary: SlotSAR disentangles target representations from background clutter in SAR images without mask annotations. We present a multi-level slot attention module that integrates low- and high-level features to enhance slot-wise representation distinctiveness.
- Score: 5.295349411568878
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synthetic aperture radar (SAR) images contain not only targets of interest but also complex background clutter, including terrain reflections and speckle noise. In many cases, such clutter exhibits intensity and patterns that resemble targets, leading models to extract entangled or spurious features. Such behavior undermines the ability to form clear target representations, regardless of the classifier. To address this challenge, we propose a novel object-centric learning (OCL) framework, named SlotSAR, that disentangles target representations from background clutter in SAR images without mask annotations. SlotSAR first extracts high-level semantic features from SARATR-X and low-level scattering features from the wavelet scattering network in order to obtain complementary multi-level representations for robust target characterization. We further present a multi-level slot attention module that integrates these low- and high-level features to enhance slot-wise representation distinctiveness, enabling effective OCL. Experimental results demonstrate that SlotSAR achieves state-of-the-art performance in SAR imagery by preserving structural details compared to existing OCL methods.
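As a rough illustration of the slot-attention mechanism the abstract builds on, here is a simplified NumPy sketch of Locatello et al.'s algorithm, with the learned projections and GRU update omitted. The feature dimensions and the concatenation-based fusion of scattering and semantic features are illustrative assumptions, not SlotSAR's exact design:

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(features, n_slots=4, n_iters=3, seed=0):
    """Simplified slot attention: slots compete for input features via a
    softmax over the slot axis, then each slot becomes the weighted mean
    of the features it attracted (learned projections/GRU omitted)."""
    n, d = features.shape
    rng = np.random.default_rng(seed)
    slots = rng.normal(size=(n_slots, d))
    for _ in range(n_iters):
        logits = (features @ slots.T) / np.sqrt(d)      # (n, n_slots)
        attn = softmax(logits, axis=1)                  # competition over slots
        weights = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = weights.T @ features                    # (n_slots, d)
    return slots

rng = np.random.default_rng(1)
low  = rng.normal(size=(64, 16))   # stand-in for wavelet-scattering features
high = rng.normal(size=(64, 16))   # stand-in for SARATR-X semantic features
fused = np.concatenate([low, high], axis=-1)   # multi-level fusion, (64, 32)
slots = slot_attention(fused, n_slots=4)
```

The softmax over the slot axis (rather than the input axis) is what makes slots specialize: each input feature must be explained by some slot, which is how targets and clutter can end up in different slots.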
Related papers
- SARMAE: Masked Autoencoder for SAR Representation Learning [17.36199520462285]
We propose SARMAE, a Noise-Aware Masked Autoencoder for self-supervised SAR representation learning. SARMAE injects SAR-specific speckle noise into masked autoencoders to facilitate noise-aware and robust representation learning. Experiments across multiple SAR datasets demonstrate that SARMAE achieves state-of-the-art performance on classification, detection, and segmentation tasks.
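SARMAE's speckle-injection idea can be sketched in toy form as follows, under common assumptions: multiplicative gamma-distributed speckle for an L-look intensity image, plus MAE-style random patch masking. The patch size and mask ratio are illustrative, not SARMAE's actual settings:

```python
import numpy as np

def inject_speckle(img, looks=4, rng=None):
    """Multiplicative speckle: gamma-distributed noise with mean 1,
    the standard model for an L-look SAR intensity image."""
    rng = rng or np.random.default_rng(0)
    noise = rng.gamma(shape=looks, scale=1.0 / looks, size=img.shape)
    return img * noise

def random_mask_patches(img, patch=8, mask_ratio=0.75, rng=None):
    """MAE-style masking: keep a random subset of non-overlapping
    patches and zero out the rest."""
    rng = rng or np.random.default_rng(0)
    h, w = img.shape
    gh, gw = h // patch, w // patch
    n = gh * gw
    keep = int(n * (1 - mask_ratio))
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[:keep]] = True
    out = np.zeros_like(img)
    for k in np.flatnonzero(mask):
        i, j = divmod(k, gw)
        out[i*patch:(i+1)*patch, j*patch:(j+1)*patch] = \
            img[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
    return out, mask

img = np.ones((32, 32))                     # toy intensity image
noisy = inject_speckle(img, looks=4)
masked, mask = random_mask_patches(noisy, patch=8, mask_ratio=0.75)
```

An autoencoder trained to reconstruct the clean image from `masked` must learn representations that are robust to both occlusion and speckle, which is the intuition behind noise-aware pretraining.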
arXiv Detail & Related papers (2025-12-18T15:10:19Z)
- Annotation-Free Open-Vocabulary Segmentation for Remote-Sensing Images [51.74614065919118]
This paper introduces SegEarth-OV, the first framework for annotation-free open-vocabulary segmentation of RS images. We propose SimFeatUp, a universal upsampler that robustly restores high-resolution spatial details from coarse features. We also present a simple yet effective Global Bias Alleviation operation to subtract the inherent global context from patch features.
arXiv Detail & Related papers (2025-08-25T14:22:57Z)
- Collaborative Learning of Scattering and Deep Features for SAR Target Recognition with Noisy Labels [7.324728751991982]
We propose collaborative learning of scattering and deep features (DF) for SAR automatic target recognition with noisy labels. Specifically, a multi-model feature fusion framework is designed to integrate scattering and deep features. The proposed method achieves state-of-the-art performance under different operating conditions with various label noises.
arXiv Detail & Related papers (2025-08-11T06:10:23Z)
- SAR Object Detection with Self-Supervised Pretraining and Curriculum-Aware Sampling [41.24071764578782]
Object detection in satellite-borne Synthetic Aperture Radar imagery holds immense potential in tasks such as urban monitoring and disaster response. The detection of small objects in satellite-borne SAR images poses a particularly intricate problem because of the technology's relatively low spatial resolution and inherent noise. In this paper, we introduce TRANSAR, a novel self-supervised end-to-end vision transformer-based SAR object detection model.
arXiv Detail & Related papers (2025-04-17T19:44:05Z)
- Bottom-Up Scattering Information Perception Network for SAR target recognition [9.694730272245849]
This paper proposes a novel bottom-up scattering information perception network for more interpretable target recognition. First, the localized scattering perceptron is proposed to replace the CNN-based backbone feature extractor. Second, an unsupervised scattering part feature extraction model is proposed to robustly characterize the target scattering part information. Third, by aggregating the knowledge of target parts into a complete target description, the interpretability and discriminative ability of the model are improved.
arXiv Detail & Related papers (2025-04-07T07:15:08Z)
- Multitask Learning for SAR Ship Detection with Gaussian-Mask Joint Segmentation [20.540873039361102]
This paper proposes a multitask learning framework for SAR ship detection, consisting of object detection, speckle suppression, and target segmentation tasks.
An angle classification loss with aspect ratio weighting is introduced to improve detection accuracy by addressing angular periodicity and object proportions.
The speckle suppression task uses a dual-feature fusion attention mechanism to reduce noise and fuse shallow and denoising features, enhancing robustness.
The target segmentation task, leveraging a rotated Gaussian-mask, aids the network in extracting target regions from cluttered backgrounds and improves detection efficiency with pixel-level predictions.
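A rotated Gaussian mask of the kind described above might be generated as follows. This is a minimal NumPy sketch; the parametrisation from box centre, size, and angle reflects the general technique, not necessarily that paper's exact formulation:

```python
import numpy as np

def rotated_gaussian_mask(h, w, cx, cy, sx, sy, theta):
    """Soft target mask: an anisotropic 2-D Gaussian aligned with the
    box angle theta (radians), peaking at the box centre (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # rotate pixel coordinates into the box frame
    u =  np.cos(theta) * dx + np.sin(theta) * dy
    v = -np.sin(theta) * dx + np.cos(theta) * dy
    return np.exp(-0.5 * ((u / sx) ** 2 + (v / sy) ** 2))

# elongated mask tilted 30 degrees, e.g. for a ship-like target
mask = rotated_gaussian_mask(64, 64, cx=32, cy=32, sx=10, sy=4, theta=np.pi / 6)
```

Supervising a segmentation head with such a soft mask emphasises the target centre over cluttered box corners, which is why Gaussian masks are a common substitute for hard rectangular labels.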
arXiv Detail & Related papers (2024-11-21T05:10:41Z)
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution [158.2282163651066]
This paper proposes a continuous implicit attention-in-attention network, called CiaoSR.
We explicitly design an implicit attention network to learn the ensemble weights for the nearby local features.
We embed a scale-aware attention in this implicit attention network to exploit additional non-local information.
arXiv Detail & Related papers (2022-12-08T15:57:46Z)
- Learning Efficient Representations for Enhanced Object Detection on Large-scene SAR Images [16.602738933183865]
It is a challenging problem to detect and recognize targets on complex large-scene Synthetic Aperture Radar (SAR) images.
Recently developed deep learning algorithms can automatically learn the intrinsic features of SAR images.
We propose an efficient and robust deep learning based target detection method.
arXiv Detail & Related papers (2022-01-22T03:25:24Z)
- RRNet: Relational Reasoning Network with Parallel Multi-scale Attention for Salient Object Detection in Optical Remote Sensing Images [82.1679766706423]
Salient object detection (SOD) for optical remote sensing images (RSIs) aims at locating and extracting visually distinctive objects/regions from the optical RSIs.
We propose a relational reasoning network with parallel multi-scale attention for SOD in optical RSIs.
Our proposed RRNet outperforms the existing state-of-the-art SOD competitors both qualitatively and quantitatively.
arXiv Detail & Related papers (2021-10-27T07:18:32Z)
- PeaceGAN: A GAN-based Multi-Task Learning Method for SAR Target Image Generation with a Pose Estimator and an Auxiliary Classifier [50.17500790309477]
We propose a novel GAN-based multi-task learning (MTL) method for SAR target image generation, called PeaceGAN.
PeaceGAN uses both pose angle and target class information, which makes it possible to produce SAR target images of desired target classes at intended pose angles.
arXiv Detail & Related papers (2021-03-29T10:03:09Z)
- Dense Attention Fluid Network for Salient Object Detection in Optical Remote Sensing Images [193.77450545067967]
We propose an end-to-end Dense Attention Fluid Network (DAFNet) for salient object detection in optical remote sensing images (RSIs).
A Global Context-aware Attention (GCA) module is proposed to adaptively capture long-range semantic context relationships.
We construct a new and challenging optical RSI dataset for SOD that contains 2,000 images with pixel-wise saliency annotations.
arXiv Detail & Related papers (2020-11-26T06:14:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.