Tiny-YOLOSAM: Fast Hybrid Image Segmentation
- URL: http://arxiv.org/abs/2512.22193v1
- Date: Sat, 20 Dec 2025 12:28:39 GMT
- Title: Tiny-YOLOSAM: Fast Hybrid Image Segmentation
- Authors: Kenneth Xu, Songhan Wu
- Abstract summary: TinySAM is a lightweight, distilled SAM variant that preserves strong zero-shot mask quality. Tiny-YOLOSAM is a fast hybrid pipeline that uses a recent YOLO detector to generate box prompts for TinySAM on salient foreground objects. On COCO val2017, the hybrid system substantially improves class-agnostic coverage (AR from 16.4% to 77.1%, mIoU from 19.2% to 67.8%) while reducing end-to-end runtime from 49.20 s/image to 10.39 s/image (4.7x) on an Apple M1 Pro CPU.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Segment Anything Model (SAM) enables promptable, high-quality segmentation but is often too computationally expensive for latency-critical settings. TinySAM is a lightweight, distilled SAM variant that preserves strong zero-shot mask quality, yet its "segment-everything" mode still requires hundreds of prompts and remains slow in practice. We first replicate TinySAM on COCO val2017 using official checkpoints, matching the reported AP within 0.03%, establishing a reliable experimental baseline. Building on this, we propose Tiny-YOLOSAM, a fast hybrid pipeline that uses a recent YOLO detector (YOLOv12) to generate box prompts for TinySAM on salient foreground objects, and supplements uncovered regions with sparse point prompts sampled only where YOLO-guided masks provide no coverage. On COCO val2017, the hybrid system substantially improves class-agnostic coverage (AR from 16.4% to 77.1%, mIoU from 19.2% to 67.8%) while reducing end-to-end runtime from 49.20s/image to 10.39s/image (4.7x) on an Apple M1 Pro CPU. These results suggest detector-guided prompting combined with targeted sparse sampling as an effective alternative to dense "segment-everything" prompting for practical full-scene segmentation.
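The abstract's prompting strategy can be illustrated with a small sketch (the function and variable names here are hypothetical illustrations, not the authors' code; the real pipeline uses YOLOv12 boxes and TinySAM masks, which are not reproduced): box-prompted masks are generated first, then sparse point prompts are sampled on a regular grid only where those masks left pixels uncovered.

```python
import numpy as np

def uncovered_point_prompts(masks, image_shape, grid_step=64):
    """Sample sparse point prompts on a regular grid, keeping only
    points that fall outside the union of the existing masks.

    masks: list of boolean (H, W) arrays from box-prompted segmentation.
    Returns an (N, 2) array of (x, y) point prompts for uncovered regions.
    """
    h, w = image_shape
    covered = np.zeros((h, w), dtype=bool)
    for m in masks:
        covered |= m  # union of all detector-guided masks

    points = []
    for y in range(grid_step // 2, h, grid_step):
        for x in range(grid_step // 2, w, grid_step):
            if not covered[y, x]:
                points.append((x, y))
    return np.array(points, dtype=int).reshape(-1, 2)

# Toy example: one box-driven mask covers the left half of a 256x256
# image, so point prompts land only in the uncovered right half.
mask = np.zeros((256, 256), dtype=bool)
mask[:, :128] = True
pts = uncovered_point_prompts([mask], (256, 256), grid_step=64)
```

The point here is the cost model: instead of the hundreds of dense grid prompts of "segment-everything" mode, the segmenter is invoked once per detector box plus once per surviving grid point, which is what drives the reported 4.7x speedup.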
Related papers
- Generalization vs. Specialization: Evaluating Segment Anything Model (SAM3) Zero-Shot Segmentation Against Fine-Tuned YOLO Detectors
This work presents a comparison between SAM3 (Segment Anything Model, also called SAMv3) operating in zero-shot mode and three variants of Ultralytics YOLO11 fine-tuned for instance segmentation. YOLO exhibits a steep degradation of 48-50 points across IoU thresholds, whereas SAM3 drops only 4 points, revealing the roughly 12x superior boundary stability of SAM3.
arXiv Detail & Related papers (2025-12-09T01:54:04Z)
- SAM-MI: A Mask-Injected Framework for Enhancing Open-Vocabulary Semantic Segmentation with SAM
The mask-injected framework SAM-MI integrates SAM with open-vocabulary semantic segmentation (OVSS) models. SAM-MI employs a Text-guided Sparse Point Prompter to sample sparse prompts for SAM instead of the previous dense grid-like prompts. DMI incorporates SAM-generated masks for guidance at low and high frequencies separately, rather than directly combining them with labels.
arXiv Detail & Related papers (2025-11-25T07:52:07Z)
- VesSAM: Efficient Multi-Prompting for Segmenting Complex Vessel
We present VesSAM, a powerful and efficient framework tailored for 2D vessel segmentation. VesSAM integrates (1) a convolutional adapter to enhance local texture features, (2) a multi-prompt encoder that fuses anatomical prompts, and (3) a lightweight mask decoder to reduce jagged artifacts. VesSAM consistently outperforms state-of-the-art PEFT-based SAM variants by over 10% Dice and 13% IoU.
arXiv Detail & Related papers (2025-11-02T15:47:05Z)
- Prompt-Tuning SAM: From Generalist to Specialist with only 2048 Parameters and 16 Training Images
The PTSAM method uses prompt-tuning, a parameter-efficient fine-tuning technique, to adapt SAM to a specific task. Our results show that prompt-tuning only SAM's mask decoder already yields performance on par with state-of-the-art techniques.
arXiv Detail & Related papers (2025-04-23T14:10:02Z)
- Lite-SAM Is Actually What You Need for Segment Everything
Lite-SAM is an efficient end-to-end solution for the SegEvery task. It is composed of four main components, including a streamlined CNN-Transformer hybrid encoder (LiteViT) and an automated prompt proposal network (AutoPPN).
arXiv Detail & Related papers (2024-07-12T03:28:46Z)
- TinySAM: Pushing the Envelope for Efficient Segment Anything Model
We propose a framework to obtain a tiny segment anything model (TinySAM) while maintaining strong zero-shot performance. With all these proposed methods, TinySAM achieves an orders-of-magnitude computational reduction and pushes the envelope for the efficient segment anything task.
arXiv Detail & Related papers (2023-12-21T12:26:11Z)
- EdgeSAM: Prompt-In-the-Loop Distillation for SAM
EdgeSAM is an accelerated variant of the Segment Anything Model (SAM). Our approach distills the original ViT-based SAM image encoder into a purely CNN-based architecture. It is the first SAM variant that can run at over 30 FPS on an iPhone 14.
arXiv Detail & Related papers (2023-12-11T18:59:52Z)
- Stable Segment Anything Model
The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts.
This paper presents the first comprehensive analysis of SAM's segmentation stability across a diverse spectrum of prompt qualities.
Our solution, termed Stable-SAM, offers several advantages: 1) it improves SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality.
arXiv Detail & Related papers (2023-11-27T12:51:42Z)
- Segment Anything in High Quality
We propose HQ-SAM, equipping SAM with the ability to accurately segment any object, while maintaining SAM's original promptable design, efficiency, and zero-shot generalizability.
Our careful design reuses and preserves the pre-trained model weights of SAM, while only introducing minimal additional parameters and computation.
We show the efficacy of HQ-SAM on a suite of 10 diverse segmentation datasets across different downstream tasks, 8 of which are evaluated in a zero-shot transfer protocol.
arXiv Detail & Related papers (2023-06-02T14:23:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.