PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model
- URL: http://arxiv.org/abs/2412.12737v2
- Date: Tue, 30 Sep 2025 07:27:49 GMT
- Title: PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model
- Authors: Yuqing Wang, Zhongling Huang, Shuxin Yang, Hao Tang, Xiaolan Qiu, Junwei Han, Dingwen Zhang
- Abstract summary: PolSAR data presents unique challenges due to its rich and complex characteristics. Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used. Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively. We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy.
- Score: 83.35198885088093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: PolSAR data presents unique challenges due to its rich and complex characteristics. Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used. However, these formats often face issues related to usability, interpretability, and data integrity. Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively. To address these issues, we propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy. PolSAM introduces Microwave Vision Data (MVD), a lightweight and interpretable data representation derived from polarimetric decomposition and semantic correlations. We propose two key components: the Feature-Level Fusion Prompt (FFP), which fuses visual tokens from pseudo-colored SAR images and MVD to address modality incompatibility in the frozen SAM encoder, and the Semantic-Level Fusion Prompt (SFP), which refines sparse and dense segmentation prompts using semantic information. Experimental results on the PhySAR-Seg datasets demonstrate that PolSAM significantly outperforms existing SAM-based and multimodal fusion models, improving segmentation accuracy, reducing data storage, and reducing inference time. The source code and datasets will be made publicly available at https://github.com/XAI4SAR/PolSAM.
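The abstract's two prompt modules can be caricatured in a few lines of NumPy. This is a minimal sketch under stated assumptions: the function names, shapes, gated-additive fusion, and softmax-peak prompt selection are all illustrative choices, not PolSAM's actual transformer-based FFP/SFP design.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_level_fusion(sar_tokens, mvd_tokens, w_gate):
    # FFP-style sketch: a sigmoid gate decides, per token and channel,
    # how much MVD information is injected into the SAR token stream
    # before it reaches the frozen encoder. (Hypothetical gating scheme.)
    gate = 1.0 / (1.0 + np.exp(-(mvd_tokens @ w_gate)))
    return sar_tokens + gate * mvd_tokens

def semantic_level_fusion(class_logits):
    # SFP-style sketch: turn coarse semantic logits into a sparse point
    # prompt (peak of the foreground probability) and a dense prompt
    # (the soft foreground map itself). (Hypothetical prompt rule.)
    e = np.exp(class_logits - class_logits.max(-1, keepdims=True))
    probs = e / e.sum(-1, keepdims=True)
    fg = probs[..., 1:].sum(-1)                       # foreground prob map
    point = np.unravel_index(fg.argmax(), fg.shape)   # sparse point prompt
    return point, fg                                  # dense prompt = fg

# Toy shapes: 64 tokens of dim 8, a 16x16 grid of 3-class logits.
sar = rng.normal(size=(64, 8))
mvd = rng.normal(size=(64, 8))
w = rng.normal(size=(8, 8))
fused = feature_level_fusion(sar, mvd, w)
point, dense = semantic_level_fusion(rng.normal(size=(16, 16, 3)))
print(fused.shape, point, dense.shape)
```

The point of the sketch is only the data flow: one fusion step at the feature level (before the frozen encoder) and one at the semantic level (producing SAM-style sparse and dense prompts).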
Related papers
- COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data -- Generation Stochastic by Design [9.278432103577925]
Earth observation applications increasingly rely on data from multiple sensors, including optical, radar, elevation, and land-cover products. We introduce COP-GEN, a latent diffusion transformer that models the joint distribution of heterogeneous Earth Observation modalities at their native spatial resolutions. Experiments on a large-scale global multimodal dataset show that COP-GEN generates diverse yet physically consistent realisations while maintaining strong peak fidelity across optical, radar, and elevation modalities.
arXiv Detail & Related papers (2026-03-03T18:31:46Z) - M2I2HA: Multi-modal Object Detection Based on Intra- and Inter-Modal Hypergraph Attention [5.485819352754784]
We propose a multi-modal perception network based on hypergraph theory, called M2I2HA. Our architecture includes an Intra-Hypergraph Enhancement module to capture global many-to-many high-order relationships within each modality, and an Inter-Hypergraph Fusion module to align, enhance, and fuse cross-modal features by bridging configuration and spatial gaps between data sources.
arXiv Detail & Related papers (2026-01-21T08:55:07Z) - FSMODNet: A Closer Look at Few-Shot Detection in Multispectral Data [7.459632891054827]
Few-shot multispectral object detection (FSMOD) addresses the challenge of detecting objects across visible and thermal modalities with minimal data. We introduce a framework named "FSMODNet" that leverages cross-modality feature integration to improve detection performance even with limited labels.
arXiv Detail & Related papers (2025-09-25T08:45:05Z) - HyPSAM: Hybrid Prompt-driven Segment Anything Model for RGB-Thermal Salient Object Detection [75.406055413928]
We propose a novel hybrid prompt-driven segment anything model (HyPSAM) for RGB-T SOD. DFNet employs dynamic convolution and multi-branch decoding to facilitate adaptive cross-modality interaction. Experiments on three public datasets demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-09-23T07:32:11Z) - Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios [65.97836905826145]
Scarcity of data in various scenarios, such as medicine, industry, and autonomous driving, leads to model overfitting and dataset imbalance. We propose Crucial-Diff, a domain-agnostic framework designed to synthesize crucial samples. Our framework generates diverse, high-quality training data, achieving a pixel-level AP of 83.63% and an F1-MAX of 78.12% on MVTec.
arXiv Detail & Related papers (2025-07-14T04:41:38Z) - Knowledge-guided Complex Diffusion Model for PolSAR Image Classification in Contourlet Domain [58.46450049579116]
We propose a knowledge-guided complex diffusion model for PolSAR image classification in the Contourlet domain. Specifically, the Contourlet transform is first applied to decompose the data into low- and high-frequency subbands. A knowledge-guided complex diffusion network is then designed to model the statistical properties of the low-frequency components.
arXiv Detail & Related papers (2025-07-08T04:50:28Z) - Cross-Sequence Semi-Supervised Learning for Multi-Parametric MRI-Based Visual Pathway Delineation [18.101169568060786]
We propose a novel semi-supervised multi-parametric feature decomposition framework for VP delineation. Specifically, a correlation-constrained feature decomposition (CFD) is designed to handle the complex cross-sequence relationships. We validate our framework using two public datasets and one in-house Multi-Shell Diffusion MRI (MDM) dataset.
arXiv Detail & Related papers (2025-05-26T09:18:58Z) - UrbanSAM: Learning Invariance-Inspired Adapters for Segment Anything Models in Urban Construction [51.54946346023673]
Urban morphology is inherently complex, with irregular objects of diverse shapes and varying scales.
The Segment Anything Model (SAM) has shown significant potential in segmenting complex scenes.
We propose UrbanSAM, a customized version of SAM specifically designed to analyze complex urban environments.
arXiv Detail & Related papers (2025-02-21T04:25:19Z) - Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance Segmentation (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z) - FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival [3.4686401890974197]
We propose a new end-to-end framework, FORESEE, for robustly predicting patient survival by mining multimodal information.
Cross-fusion transformer effectively utilizes features at the cellular level, tissue level, and tumor heterogeneity level to correlate prognosis.
The hybrid attention encoder (HAE) uses the denoising contextual attention module to obtain the contextual relationship features.
We also propose an asymmetrically masked triplet masked autoencoder to reconstruct lost information within modalities.
arXiv Detail & Related papers (2024-05-13T12:39:08Z) - Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD).
It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) is proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z) - A SAM-guided Two-stream Lightweight Model for Anomaly Detection [44.73985145110819]
We propose a SAM-guided Two-stream Lightweight Model (STLM) for unsupervised anomaly detection.
Our experiments on the MVTec AD benchmark show that STLM, with about 16M parameters and an inference time of 20 ms, competes effectively with state-of-the-art methods.
arXiv Detail & Related papers (2024-02-29T13:29:10Z) - ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment
Anything to SAR Domain for Semantic Segmentation [6.229326337093342]
Segment Anything Model (SAM) excels in various segmentation scenarios relying on semantic information and generalization ability.
The ClassWiseSAM-Adapter (CWSAM) is designed to adapt the high-performing SAM for landcover classification on space-borne Synthetic Aperture Radar (SAR) images.
CWSAM showcases enhanced performance with fewer computing resources.
arXiv Detail & Related papers (2024-01-04T15:54:45Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification [35.52272615695294]
We propose a spatial-spectral masked auto-encoder (SS-MAE) for HSI and LiDAR/SAR data joint classification.
Our SS-MAE fully exploits the spatial and spectral representations of the input data.
To complement local features in the training stage, we add two lightweight CNNs for feature extraction.
arXiv Detail & Related papers (2023-11-08T03:54:44Z) - Diffusion Models for Interferometric Satellite Aperture Radar [73.01013149014865]
Probabilistic Diffusion Models (PDMs) have recently emerged as a very promising class of generative models.
Here, we leverage PDMs to generate several radar-based satellite image datasets.
We show that PDMs succeed in generating images with complex and realistic structures, but that sampling time remains an issue.
arXiv Detail & Related papers (2023-08-31T16:26:17Z) - Input-Output Balanced Framework for Long-tailed LiDAR Semantic Segmentation [12.639524717464509]
We propose an input-output balanced framework to handle the issue of long-tailed distribution.
For the input space, we synthesize these tailed instances from mesh models and accurately simulate the position and density distribution of LiDAR scans.
For the output space, a multi-head block is proposed to group different categories based on their shapes and instance amounts.
arXiv Detail & Related papers (2021-03-26T05:42:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.