Tiny-YOLOSAM: Fast Hybrid Image Segmentation
- URL: http://arxiv.org/abs/2512.22193v1
- Date: Sat, 20 Dec 2025 12:28:39 GMT
- Title: Tiny-YOLOSAM: Fast Hybrid Image Segmentation
- Authors: Kenneth Xu, Songhan Wu
- Abstract summary: TinySAM is a lightweight, distilled SAM variant that preserves strong zero-shot mask quality. Tiny-YOLOSAM is a fast hybrid pipeline that uses a recent YOLO detector to generate box prompts for TinySAM on salient foreground objects. On COCO val2017, the hybrid system substantially improves class-agnostic coverage (AR from 16.4% to 77.1%, mIoU from 19.2% to 67.8%) while reducing end-to-end runtime from 49.20 s/image to 10.39 s/image (4.7x) on an Apple M1 Pro CPU.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Segment Anything Model (SAM) enables promptable, high-quality segmentation but is often too computationally expensive for latency-critical settings. TinySAM is a lightweight, distilled SAM variant that preserves strong zero-shot mask quality, yet its "segment-everything" mode still requires hundreds of prompts and remains slow in practice. We first replicate TinySAM on COCO val2017 using official checkpoints, matching the reported AP within 0.03%, establishing a reliable experimental baseline. Building on this, we propose Tiny-YOLOSAM, a fast hybrid pipeline that uses a recent YOLO detector (YOLOv12) to generate box prompts for TinySAM on salient foreground objects, and supplements uncovered regions with sparse point prompts sampled only where YOLO-guided masks provide no coverage. On COCO val2017, the hybrid system substantially improves class-agnostic coverage (AR from 16.4% to 77.1%, mIoU from 19.2% to 67.8%) while reducing end-to-end runtime from 49.20s/image to 10.39s/image (4.7x) on an Apple M1 Pro CPU. These results suggest detector-guided prompting combined with targeted sparse sampling as an effective alternative to dense "segment-everything" prompting for practical full-scene segmentation.
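The abstract's prompting strategy can be illustrated with a small sketch (the function and variable names here are hypothetical illustrations, not the authors' code; the real pipeline uses YOLOv12 boxes and TinySAM masks, which are not reproduced): box-prompted masks are generated first, then sparse point prompts are sampled on a regular grid only where those masks left pixels uncovered.

```python
import numpy as np

def uncovered_point_prompts(masks, image_shape, grid_step=64):
    """Sample sparse point prompts on a regular grid, keeping only
    points that fall outside the union of the existing masks.

    masks: list of boolean (H, W) arrays from box-prompted segmentation.
    Returns an (N, 2) array of (x, y) point prompts for uncovered regions.
    """
    h, w = image_shape
    covered = np.zeros((h, w), dtype=bool)
    for m in masks:
        covered |= m  # union of all detector-guided masks

    points = []
    for y in range(grid_step // 2, h, grid_step):
        for x in range(grid_step // 2, w, grid_step):
            if not covered[y, x]:
                points.append((x, y))
    return np.array(points, dtype=int).reshape(-1, 2)

# Toy example: one box-driven mask covers the left half of a 256x256
# image, so point prompts land only in the uncovered right half.
mask = np.zeros((256, 256), dtype=bool)
mask[:, :128] = True
pts = uncovered_point_prompts([mask], (256, 256), grid_step=64)
```

The point here is the cost model: instead of the hundreds of dense grid prompts of "segment-everything" mode, the segmenter is invoked once per detector box plus once per surviving grid point, which is what drives the reported 4.7x speedup.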
Related papers
- Generalization vs. Specialization: Evaluating Segment Anything Model (SAM3) Zero-Shot Segmentation Against Fine-Tuned YOLO Detectors
This work presents a comparison between SAM3 (Segment Anything Model, also called SAMv3) operating in zero-shot mode and three variants of Ultralytics YOLO11 fine-tuned for instance segmentation. YOLO exhibits a steep degradation of 48-50 points across IoU thresholds, whereas SAM3 drops only 4 points, revealing the roughly 12x superior boundary stability of SAM3.
arXiv Detail & Related papers (2025-12-09T01:54:04Z)
- SAM-MI: A Mask-Injected Framework for Enhancing Open-Vocabulary Semantic Segmentation with SAM
The mask-injected framework SAM-MI integrates SAM with open-vocabulary semantic segmentation (OVSS) models. SAM-MI employs a Text-guided Sparse Point Prompter to sample sparse prompts for SAM instead of the previous dense grid-like prompts. DMI incorporates SAM-generated masks for guidance at low and high frequencies separately, rather than directly combining them with labels.
arXiv Detail & Related papers (2025-11-25T07:52:07Z)
- VesSAM: Efficient Multi-Prompting for Segmenting Complex Vessel
We present VesSAM, a powerful and efficient framework tailored for 2D vessel segmentation. VesSAM integrates (1) a convolutional adapter to enhance local texture features, (2) a multi-prompt encoder that fuses anatomical prompts, and (3) a lightweight mask decoder to reduce jagged artifacts. VesSAM consistently outperforms state-of-the-art PEFT-based SAM variants by over 10% Dice and 13% IoU.
arXiv Detail & Related papers (2025-11-02T15:47:05Z)
- Prompt-Tuning SAM: From Generalist to Specialist with only 2048 Parameters and 16 Training Images
The PTSAM method uses prompt-tuning, a parameter-efficient fine-tuning technique, to adapt SAM to a specific task. Our results show that prompt-tuning only SAM's mask decoder already yields performance on par with state-of-the-art techniques.
arXiv Detail & Related papers (2025-04-23T14:10:02Z)
- Lite-SAM Is Actually What You Need for Segment Everything
Lite-SAM is an efficient end-to-end solution for the SegEvery task. It is composed of four main components, including a streamlined CNN-Transformer hybrid encoder (LiteViT) and an automated prompt proposal network (AutoPPN).
arXiv Detail & Related papers (2024-07-12T03:28:46Z)
- TinySAM: Pushing the Envelope for Efficient Segment Anything Model
We propose a framework to obtain a tiny segment anything model (TinySAM) while maintaining strong zero-shot performance. With all these proposed methods, TinySAM achieves an orders-of-magnitude computational reduction and pushes the envelope for the efficient segment anything task.
arXiv Detail & Related papers (2023-12-21T12:26:11Z)
- EdgeSAM: Prompt-In-the-Loop Distillation for SAM
EdgeSAM is an accelerated variant of the Segment Anything Model (SAM). Our approach distills the original ViT-based SAM image encoder into a purely CNN-based architecture. It is the first SAM variant that can run at over 30 FPS on an iPhone 14.
arXiv Detail & Related papers (2023-12-11T18:59:52Z)
- Stable Segment Anything Model
The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts.
This paper presents the first comprehensive analysis of SAM's segmentation stability across a diverse spectrum of prompt qualities.
Our solution, termed Stable-SAM, offers several advantages: 1) it improves SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality.
arXiv Detail & Related papers (2023-11-27T12:51:42Z)
- Segment Anything in High Quality
We propose HQ-SAM, equipping SAM with the ability to accurately segment any object, while maintaining SAM's original promptable design, efficiency, and zero-shot generalizability.
Our careful design reuses and preserves the pre-trained model weights of SAM, while only introducing minimal additional parameters and computation.
We show the efficacy of HQ-SAM on a suite of 10 diverse segmentation datasets across different downstream tasks, 8 of which are evaluated in a zero-shot transfer protocol.
arXiv Detail & Related papers (2023-06-02T14:23:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.