SPDA-SAM: A Self-prompted Depth-Aware Segment Anything Model for Instance Segmentation
- URL: http://arxiv.org/abs/2602.06335v1
- Date: Fri, 06 Feb 2026 03:01:41 GMT
- Title: SPDA-SAM: A Self-prompted Depth-Aware Segment Anything Model for Instance Segmentation
- Authors: Yihan Shang, Wei Wang, Chao Huang, Xinghui Dong
- Abstract summary: We propose a Self-prompted Depth-Aware SAM (SPDA-SAM) for instance segmentation. Specifically, we design a Semantic-Spatial Self-prompt Module (SSSPM) which extracts the semantic and spatial prompts from the image encoder and the mask decoder of SAM. We also introduce a Coarse-to-Fine RGB-D Fusion Module (C2FFM) in which the features extracted from a monocular RGB image and the depth map estimated from it are fused.
- Score: 12.878470455789945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the Segment Anything Model (SAM) has demonstrated strong generalizability in various instance segmentation tasks. However, its performance is heavily dependent on the quality of manual prompts. In addition, the RGB images that instance segmentation methods normally use inherently lack depth information. As a result, the ability of these methods to perceive spatial structures and delineate object boundaries is hindered. To address these challenges, we propose a Self-prompted Depth-Aware SAM (SPDA-SAM) for instance segmentation. Specifically, we design a Semantic-Spatial Self-prompt Module (SSSPM) which extracts the semantic and spatial prompts from the image encoder and the mask decoder of SAM, respectively. Furthermore, we introduce a Coarse-to-Fine RGB-D Fusion Module (C2FFM), in which the features extracted from a monocular RGB image and the depth map estimated from it are fused. In particular, the structural information in the depth map is used to provide coarse-grained guidance to feature fusion, while local variations in depth are encoded in order to fuse fine-grained feature representations. To our knowledge, SAM has not been explored in such self-prompted and depth-aware manners. Experimental results demonstrate that our SPDA-SAM outperforms its state-of-the-art counterparts across twelve different datasets. These promising results can be attributed to the guidance of the self-prompts and to the compensation for the loss of spatial information by the coarse-to-fine RGB-D fusion operation.
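The coarse-to-fine RGB-D fusion described in the abstract can be illustrated with a minimal sketch. Note that this is purely illustrative and not the authors' implementation: the paper does not specify its gating functions, so the choices below (a global depth-deviation gate for the coarse stage and a depth-gradient gate for the fine stage) are assumptions, as are all shapes and names.

```python
# Illustrative sketch only: a hypothetical coarse-to-fine RGB-D fusion,
# loosely following the C2FFM description (not the paper's actual code).
import numpy as np

def coarse_to_fine_fusion(rgb_feat, depth_map):
    """Fuse an RGB feature map with a monocular depth estimate.

    rgb_feat:  (C, H, W) feature map from the image encoder.
    depth_map: (H, W) estimated depth, assumed normalized to [0, 1].
    Returns a fused (C, H, W) feature map.
    """
    # Coarse stage (assumption): depth structure provides global guidance,
    # up-weighting regions whose depth deviates from the scene mean.
    coarse_gate = 1.0 + np.abs(depth_map - depth_map.mean())      # (H, W)
    coarse = rgb_feat * coarse_gate[None, :, :]

    # Fine stage (assumption): encode local depth variation via the
    # gradient magnitude, reinforcing object boundaries where depth
    # changes sharply.
    gy, gx = np.gradient(depth_map)
    local_var = np.sqrt(gx ** 2 + gy ** 2)
    fine_gate = local_var / (local_var.max() + 1e-8)              # (H, W)
    return coarse * (1.0 + fine_gate[None, :, :])

# Usage on dummy data with assumed shapes.
rgb_feat = np.random.rand(8, 16, 16).astype(np.float32)
depth = np.random.rand(16, 16).astype(np.float32)
fused = coarse_to_fine_fusion(rgb_feat, depth)
print(fused.shape)
```

In this sketch the coarse gate modulates all channels uniformly from global depth structure, while the fine gate adds a second, boundary-sensitive modulation; the paper's actual module presumably learns these interactions rather than using fixed heuristics.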
Related papers
- HyPSAM: Hybrid Prompt-driven Segment Anything Model for RGB-Thermal Salient Object Detection [75.406055413928]
We propose a novel prompt-driven segment anything model (HyPSAM) for RGB-T SOD. DFNet employs dynamic convolution and multi-branch decoding to facilitate adaptive cross-modality interaction. Experiments on three public datasets demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-09-23T07:32:11Z)
- Segment Any RGB-Thermal Model with Language-aided Distillation [17.837670087342456]
We propose a novel framework, SARTM, which customizes the powerful SAM for RGB-T semantic segmentation. Our key idea is to unleash the potential of SAM while introducing semantic understanding modules for RGB-T data pairs. Both quantitative and qualitative results consistently demonstrate that the proposed SARTM significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2025-05-04T00:24:17Z)
- RGB-D Video Object Segmentation via Enhanced Multi-store Feature Memory [34.406308400305385]
RGB-Depth (RGB-D) Video Object Segmentation (VOS) aims to integrate the fine-grained texture information of RGB with the geometric clues of the depth modality. In this paper, we propose a novel RGB-D VOS method via multi-store feature memory for robust segmentation. We show that the proposed method achieves state-of-the-art performance on the latest RGB-D VOS benchmark.
arXiv Detail & Related papers (2025-04-23T07:31:37Z)
- PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model [83.35198885088093]
PolSAR data presents unique challenges due to its rich and complex characteristics. Existing data representations, such as complex-valued data, polarimetric features, and amplitude images, are widely used. However, most feature extraction networks for PolSAR are small, limiting their ability to capture features effectively. We propose the Polarimetric Scattering Mechanism-Informed SAM (PolSAM), an enhanced Segment Anything Model (SAM) that integrates domain-specific scattering characteristics and a novel prompt generation strategy.
arXiv Detail & Related papers (2024-12-17T09:59:53Z)
- SSFam: Scribble Supervised Salient Object Detection Family [13.369217449092524]
Scribble supervised salient object detection (SSSOD) constructs segmentation ability of attractive objects from surroundings under the supervision of sparse scribble labels.
For the better segmentation, depth and thermal infrared modalities serve as the supplement to RGB images in the complex scenes.
Our model demonstrates the remarkable performance among combinations of different modalities and refreshes the highest level of scribble supervised methods.
arXiv Detail & Related papers (2024-09-07T13:07:59Z)
- Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection [58.241593208031816]
The Segment Anything Model (SAM) has been proposed as a visual foundation model, which provides strong segmentation and generalization capabilities.
We propose a Multi-scale and Detail-enhanced SAM (MDSAM) for Salient Object Detection (SOD).
Experimental results demonstrate the superior performance of our model on multiple SOD datasets.
arXiv Detail & Related papers (2024-08-08T09:09:37Z)
- Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection [22.027032083786242]
DSAM exploits the zero-shot capability of SAM to realize precise segmentation in the RGB-D domain.
The Finer Module explores the possibility of accurately segmenting highly camouflaged targets from a depth perspective.
arXiv Detail & Related papers (2024-07-17T06:31:29Z)
- Depth-Guided Semi-Supervised Instance Segmentation [62.80063539262021]
Semi-Supervised Instance Segmentation (SSIS) aims to leverage unlabeled data during training.
Previous frameworks primarily utilized the RGB information of unlabeled images to generate pseudo-labels.
We introduce a Depth-Guided (DG) framework to overcome this limitation.
arXiv Detail & Related papers (2024-06-25T09:36:50Z)
- MAS-SAM: Segment Any Marine Animal with Aggregated Features [55.91291540810978]
We propose a novel feature learning framework named MAS-SAM for marine animal segmentation.
Our method enables the extraction of richer marine information, from global contextual cues to fine-grained local details.
arXiv Detail & Related papers (2024-04-24T07:38:14Z)
- RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model [29.42043345787285]
We propose a method to learn the generation of appropriate prompts for the Segment Anything Model (SAM).
This enables SAM to produce semantically discernible segmentation results for remote sensing images.
We also propose several ongoing derivatives for instance segmentation tasks, drawing on recent advancements within the SAM community, and compare their performance with RSPrompter.
arXiv Detail & Related papers (2023-06-28T14:51:34Z)
- SAD: Segment Any RGBD [54.24917975958583]
The Segment Anything Model (SAM) has demonstrated its effectiveness in segmenting any part of 2D RGB images.
We propose the Segment Any RGBD (SAD) model, which is specifically designed to extract geometry information directly from images.
arXiv Detail & Related papers (2023-05-23T16:26:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.