PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model
- URL: http://arxiv.org/abs/2412.12737v1
- Date: Tue, 17 Dec 2024 09:59:53 GMT
- Title: PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model
- Authors: Yuqing Wang, Zhongling Huang, Shuxin Yang, Hao Tang, Xiaolan Qiu, Junwei Han, Dingwen Zhang,
- Abstract summary: PolSAR data presents unique challenges due to its rich and complex characteristics.
Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used.
Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively.
We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy.
- Score: 76.95536611263356
- License:
- Abstract: PolSAR data presents unique challenges due to its rich and complex characteristics. Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used. However, these formats often face issues related to usability, interpretability, and data integrity. Most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively. To address these issues, We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy. PolSAM introduces Microwave Vision Data (MVD), a lightweight and interpretable data representation derived from polarimetric decomposition and semantic correlations. We propose two key components: the Feature-Level Fusion Prompt (FFP), which fuses visual tokens from pseudo-colored SAR images and MVD to address modality incompatibility in the frozen SAM encoder, and the Semantic-Level Fusion Prompt (SFP), which refines sparse and dense segmentation prompts using semantic information. Experimental results on the PhySAR-Seg datasets demonstrate that PolSAM significantly outperforms existing SAM-based and multimodal fusion models, improving segmentation accuracy, reducing data storage, and accelerating inference time. The source code and datasets will be made publicly available at \url{https://github.com/XAI4SAR/PolSAM}.
Related papers
- Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z) - FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival [3.4686401890974197]
We propose a new end-to-end framework, FORESEE, for robustly predicting patient survival by mining multimodal information.
Cross-fusion transformer effectively utilizes features at the cellular level, tissue level, and tumor heterogeneity level to correlate prognosis.
The hybrid attention encoder (HAE) uses the denoising contextual attention module to obtain the contextual relationship features.
We also propose an asymmetrically masked triplet masked autoencoder to reconstruct lost information within modalities.
arXiv Detail & Related papers (2024-05-13T12:39:08Z) - Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD)
It aims to detect salient objects from arbitrary modalities, eg RGB images, RGB-D images, and RGB-D-T images.
A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z) - ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment
Anything to SAR Domain for Semantic Segmentation [6.229326337093342]
Segment Anything Model (SAM) excels in various segmentation scenarios relying on semantic information and generalization ability.
The ClassWiseSAM-Adapter (CWSAM) is designed to adapt the high-performing SAM for landcover classification on space-borne Synthetic Aperture Radar (SAR) images.
CWSAM showcases enhanced performance with fewer computing resources.
arXiv Detail & Related papers (2024-01-04T15:54:45Z) - Images in Discrete Choice Modeling: Addressing Data Isomorphism in
Multi-Modality Inputs [77.54052164713394]
This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning.
We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework.
arXiv Detail & Related papers (2023-12-22T14:33:54Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote
Sensing Image Classification [35.52272615695294]
We propose a spatial-spectral masked auto-encoder (SS-MAE) for HSI and LiDAR/SAR data joint classification.
Our SS-MAE fully exploits the spatial and spectral representations of the input data.
To complement local features in the training stage, we add two lightweight CNNs for feature extraction.
arXiv Detail & Related papers (2023-11-08T03:54:44Z) - Diffusion Models for Interferometric Satellite Aperture Radar [73.01013149014865]
Probabilistic Diffusion Models (PDMs) have recently emerged as a very promising class of generative models.
Here, we leverage PDMs to generate several radar-based satellite image datasets.
We show that PDMs succeed in generating images with complex and realistic structures, but that sampling time remains an issue.
arXiv Detail & Related papers (2023-08-31T16:26:17Z) - Input-Output Balanced Framework for Long-tailed LiDAR Semantic
Segmentation [12.639524717464509]
We propose an input-output balanced framework to handle the issue of long-tailed distribution.
For the input space, we synthesize these tailed instances from mesh models and well simulate the position and density distribution of LiDAR scan.
For the output space, a multi-head block is proposed to group different categories based on their shapes and instance amounts.
arXiv Detail & Related papers (2021-03-26T05:42:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.