SAM-MI: A Mask-Injected Framework for Enhancing Open-Vocabulary Semantic Segmentation with SAM
- URL: http://arxiv.org/abs/2511.20027v1
- Date: Tue, 25 Nov 2025 07:52:07 GMT
- Title: SAM-MI: A Mask-Injected Framework for Enhancing Open-Vocabulary Semantic Segmentation with SAM
- Authors: Lin Chen, Yingjian Zhu, Qi Yang, Xin Niu, Kun Ding, Shiming Xiang,
- Abstract summary: Mask-injected framework SAM-MI integrates SAM with OVSS models to address challenges.<n> SAM-MI employs a Text-guided Sparse Point Prompter to sample sparse prompts for SAM instead of previous dense grid-like prompts.<n>DMI incorporates SAM-generated masks for guidance at low-frequency and high-frequency separately, rather than directly combining them with labels.
- Score: 25.136857576951282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-vocabulary semantic segmentation (OVSS) aims to segment and recognize objects universally. Trained on extensive high-quality segmentation data, the segment anything model (SAM) has demonstrated remarkable universal segmentation capabilities, offering valuable support for OVSS. Although previous methods have made progress in leveraging SAM for OVSS, there are still some challenges: (1) SAM's tendency to over-segment and (2) hard combinations between fixed masks and labels. This paper introduces a novel mask-injected framework, SAM-MI, which effectively integrates SAM with OVSS models to address these challenges. Initially, SAM-MI employs a Text-guided Sparse Point Prompter to sample sparse prompts for SAM instead of previous dense grid-like prompts, thus significantly accelerating the mask generation process. The framework then introduces Shallow Mask Aggregation (SMAgg) to merge partial masks to mitigate the SAM's over-segmentation issue. Finally, Decoupled Mask Injection (DMI) incorporates SAM-generated masks for guidance at low-frequency and high-frequency separately, rather than directly combining them with labels. Extensive experiments on multiple benchmarks validate the superiority of SAM-MI. Notably, the proposed method achieves a 16.7% relative improvement in mIoU over Grounded-SAM on the MESS benchmark, along with a 1.6$\times$ speedup. We hope SAM-MI can serve as an alternative methodology to effectively equip the OVSS model with SAM.
Related papers
- Evaluating SAM2 for Video Semantic Segmentation [60.157605818225186]
The Anything Model 2 (SAM2) has proven to be a powerful foundation model for promptable visual object segmentation in both images and videos.<n>This paper explores the extension of SAM2 to dense Video Semantic (VSS)<n>Our experiments suggest that leveraging SAM2 enhances overall performance in VSS, primarily due to its precise predictions of object boundaries.
arXiv Detail & Related papers (2025-12-01T15:15:16Z) - SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement [40.37217744643069]
We propose a universal and efficient approach by adapting SAM to the mask refinement task.<n>Specifically, we introduce a multi-prompt excavation strategy to mine diverse input prompts for SAM.<n>We extend our method to SAMRefiner++ by introducing an additional IoU adaption step to further boost the performance of the generic SAMRefiner on the target dataset.
arXiv Detail & Related papers (2025-02-10T18:33:15Z) - Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z) - Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection [58.241593208031816]
Segment Anything Model (SAM) has been proposed as a visual fundamental model, which gives strong segmentation and generalization capabilities.
We propose a Multi-scale and Detail-enhanced SAM (MDSAM) for Salient Object Detection (SOD)
Experimental results demonstrate the superior performance of our model on multiple SOD datasets.
arXiv Detail & Related papers (2024-08-08T09:09:37Z) - SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation [88.80792308991867]
Segment Anything model (SAM) has shown ability to group image pixels into patches, but applying it to semantic-aware segmentation still faces major challenges.<n>This paper presents SAM-CP, a simple approach that establishes two types of composable prompts beyond SAM and composes them for versatile segmentation.<n> Experiments show that SAM-CP achieves semantic, instance, and panoptic segmentation in both open and closed domains.
arXiv Detail & Related papers (2024-07-23T17:47:25Z) - Segment Anything without Supervision [65.93211374889196]
We present Unsupervised SAM (UnSAM) for promptable and automatic whole-image segmentation.
UnSAM utilizes a divide-and-conquer strategy to "discover" the hierarchical structure of visual scenes.
We show that supervised SAM can also benefit from our self-supervised labels.
arXiv Detail & Related papers (2024-06-28T17:47:32Z) - MaskSAM: Towards Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation [17.25946659884426]
We propose MaskSAM, a mask classification prompt-free framework for medical image segmentation.<n>Our method achieves state-of-the-art performance on AMOS2022, 90.52% Dice, which improved by 2.7% compared to nnUNet.
arXiv Detail & Related papers (2024-03-21T03:28:24Z) - WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images [8.179859593451285]
We present WSI-SAM, enhancing Segment Anything Model (SAM) with precise object segmentation capabilities for histopathology images.
To fully exploit pretrained knowledge while minimizing training overhead, we keep SAM frozen, introducing only minimal extra parameters.
Our model outperforms SAM by 4.1 and 2.5 percent points on a ductal carcinoma in situ (DCIS) segmentation tasks and breast cancer metastasis segmentation task.
arXiv Detail & Related papers (2024-03-14T10:30:43Z) - PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation [19.65118388712439]
We introduce a novel prompt-driven adapter into SAM, namely Prompt Adapter Segment Anything Model (PA-SAM)
By exclusively training the prompt adapter, PA-SAM extracts detailed information from images and optimize the mask decoder feature at both sparse and dense prompt levels.
Experimental results demonstrate that our PA-SAM outperforms other SAM-based methods in high-quality, zero-shot, and open-set segmentation.
arXiv Detail & Related papers (2024-01-23T19:20:22Z) - Compositional Oil Spill Detection Based on Object Detector and Adapted Segment Anything Model from SAR Images [3.2843040151689586]
We propose a composite oil spill detection framework, SAM-OIL, comprising an object detector (e.g., YOLOv8), an Adapted Segment Anything Model (SAM) and an Ordered Mask Fusion (OMF) module.<n> SAM-OIL is the first application of the powerful SAM in oil spill detection. Specifically, the SAM-OIL strategy uses YOLOv8 to obtain the categories and bounding boxes of oil spill-related objects, then inputs bounding boxes into the Adapted SAM to retrieve category-agnostic masks, and finally adopts the OMF module to fuse the masks and categories.
arXiv Detail & Related papers (2024-01-15T07:03:10Z) - Stable Segment Anything Model [79.9005670886038]
The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts.
This paper presents the first comprehensive analysis on SAM's segmentation stability across a diverse spectrum of prompt qualities.
Our solution, termed Stable-SAM, offers several advantages: 1) improved SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality.
arXiv Detail & Related papers (2023-11-27T12:51:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.