Generalization vs. Specialization: Evaluating Segment Anything Model (SAM3) Zero-Shot Segmentation Against Fine-Tuned YOLO Detectors
- URL: http://arxiv.org/abs/2512.11884v1
- Date: Tue, 09 Dec 2025 01:54:04 GMT
- Title: Generalization vs. Specialization: Evaluating Segment Anything Model (SAM3) Zero-Shot Segmentation Against Fine-Tuned YOLO Detectors
- Authors: Ranjan Sapkota, Konstantinos I. Roumeliotis, Manoj Karkee, Nikolaos D. Tselikas
- Abstract summary: This work presents a comparison between SAM3 (Segment Anything Model, also called SAMv3) operating in zero-shot mode and three variants of Ultralytics YOLO11 fine-tuned for instance segmentation. YOLO exhibits a steep degradation of 48-50 points across IoU ranges, whereas SAM3 drops only 4 points, revealing the roughly 12-times greater boundary stability of SAM3.
- Score: 3.5648679864643573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has advanced two fundamentally different paradigms for instance segmentation: specialized models optimized through task-specific fine-tuning and generalist foundation models capable of zero-shot segmentation. This work presents a comprehensive comparison between SAM3 (Segment Anything Model, also called SAMv3) operating in zero-shot mode and three variants of Ultralytics YOLO11 (nano, medium, and large) fine-tuned for instance segmentation. The evaluation is conducted on the MinneApple dataset, a dense benchmark comprising 670 orchard images with 28,179 annotated apple instances, enabling rigorous validation of model behavior under high object density and occlusion. Our analysis shows that the choice of IoU threshold can inflate performance gaps by up to 30%. At the appropriate IoU = 0.15 threshold, the YOLO models achieve 68.9%, 72.2%, and 71.9% F1, while SAM3 reaches 59.8% in pure zero-shot mode. However, YOLO exhibits a steep degradation of 48-50 points across IoU ranges, whereas SAM3 drops only 4 points, revealing the roughly 12-times greater boundary stability of SAM3. This highlights SAM3's strength in mask precision versus YOLO11's specialization in detection completeness. We provide open-source code, evaluation pipelines, and methodological recommendations, contributing to a deeper understanding of when specialized fine-tuned models or generalist foundation models are preferable for dense instance segmentation tasks. The project repository is available on GitHub at https://github.com/Applied-AI-Research-Lab/Segment-Anything-Model-SAM3-Zero-Shot-Segmentation-Against-Fine-Tuned-YOLO-Detectors
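Since the headline comparison hinges on how F1 is computed at a chosen IoU threshold, a minimal sketch of that style of evaluation follows: predicted instance masks are greedily matched to ground-truth masks at a threshold, and F1 is derived from the match counts. The greedy matching and all names here are illustrative assumptions, not the authors' released pipeline (see the linked repository for that).

```python
# Hedged sketch of IoU-thresholded F1 for instance masks (illustrative,
# not the paper's exact pipeline).
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boolean masks of identical shape."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union > 0 else 0.0

def f1_at_iou(preds, gts, thr):
    """Greedily match predicted masks to ground truth at IoU >= thr."""
    matched, tp = set(), 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            if j in matched:
                continue
            iou = mask_iou(p, g)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_iou >= thr:
            matched.add(best_j)
            tp += 1
    prec = tp / len(preds) if preds else 0.0   # fp = len(preds) - tp
    rec = tp / len(gts) if gts else 0.0        # fn = len(gts) - tp
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Sweeping the threshold exposes boundary sensitivity: an F1 curve that
# collapses as thr rises indicates coarse masks; a flat curve, stable ones.
# for thr in (0.15, 0.30, 0.50, 0.75):
#     print(thr, f1_at_iou(pred_masks, gt_masks, thr))
```

Sweeping `thr` in this way separates the two behaviors the abstract describes: YOLO's F1 collapses at strict thresholds while SAM3's stays nearly flat.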
Related papers
- Tiny-YOLOSAM: Fast Hybrid Image Segmentation [0.0]
TinySAM is a lightweight, distilled SAM variant that preserves strong zero-shot mask quality. Tiny-YOLOSAM is a fast hybrid pipeline that uses a recent YOLO detector to generate box prompts for TinySAM on salient foreground objects. On COCO val2017, the hybrid system substantially improves class-agnostic coverage (AR from 16.4% to 77.1%, mIoU from 19.2% to 67.8%) while reducing end-to-end runtime from 49.20 s/image to 10.39 s/image (4.7x) on an Apple M1 Pro CPU.
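As a rough illustration of the box-prompt pattern this entry describes, the sketch below feeds YOLO detections to a SAM-style predictor. The standard SamPredictor stands in for TinySAM (an assumption), and the image and checkpoint paths are placeholders.

```python
# Illustrative YOLO-boxes-to-SAM-prompts hybrid (TinySAM replaced by the
# standard SamPredictor; paths are placeholders).
import cv2
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

image = cv2.cvtColor(cv2.imread("orchard.jpg"), cv2.COLOR_BGR2RGB)

detector = YOLO("yolo11n.pt")                        # any recent YOLO detector
boxes = detector(image)[0].boxes.xyxy.cpu().numpy()  # (N, 4) boxes in xyxy

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)
predictor.set_image(image)       # heavy image encoder runs once per image

masks = []
for box in boxes:                # light mask decoder runs once per box
    m, _, _ = predictor.predict(box=box, multimask_output=False)
    masks.append(m[0])
```

The runtime gain in such hybrids comes from replacing SAM's dense grid of automatic prompts with a handful of detector boxes, so the costly image encoding happens once and the cheap decoder runs only per detection.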
arXiv Detail & Related papers (2025-12-20T12:28:39Z)
- AD-SAM: Fine-Tuning the Segment Anything Vision Foundation Model for Autonomous Driving Perception [3.298091299319354]
The Autonomous Driving Segment Anything Model (AD-SAM) is a fine-tuned vision foundation model for semantic segmentation in autonomous driving (AD). AD-SAM extends the Segment Anything Model (SAM) with a dual encoder and a deformable decoder tailored to the spatial and geometric complexity of road scenes. Experiments show that AD-SAM surpasses SAM, Generalized SAM (G-SAM), and a deep learning baseline (DeepLabV3) in segmentation accuracy.
arXiv Detail & Related papers (2025-10-30T23:30:33Z)
- YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception [58.06752127687312]
We propose YOLOv13, an accurate and lightweight object detector, built on a Hypergraph-based Adaptive Correlation Enhancement (HyperACE) mechanism and a Full-Pipeline Aggregation-and-Distribution (FullPAD) paradigm.
arXiv Detail & Related papers (2025-06-21T15:15:03Z)
- S*: Test Time Scaling for Code Generation [55.11863577956177]
We propose S*, the first hybrid test-time scaling framework for code generation. S* substantially improves the coverage and selection accuracy of generated code.
arXiv Detail & Related papers (2025-02-20T09:18:53Z)
- Looking Locally: Object-Centric Vision Transformers as Foundation Models for Efficient Segmentation [3.5297361401370053]
Current state-of-the-art segmentation models encode entire images before focusing on specific objects. We introduce FLIP, a parameter-efficient vision model that realizes object segmentation through biologically inspired top-down attention. We show that FLIP segments even very small objects accurately, where existing models fail severely.
arXiv Detail & Related papers (2025-02-04T23:07:34Z)
- Zero-Shot Automatic Annotation and Instance Segmentation using LLM-Generated Datasets: Eliminating Field Imaging and Manual Annotation for Deep Learning Model Development [0.36868085124383626]
This study presents a novel method for deep learning-based instance segmentation of apples in commercial orchards. We synthetically generated orchard images and automatically annotated them using the Segment Anything Model (SAM) integrated with a YOLO11 base model. The results showed that the automatically generated annotations achieved a Dice coefficient of 0.9513 and an IoU of 0.9303, validating the accuracy and overlap of the mask annotations.
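For reference, the two overlap metrics quoted here are straightforward to compute on boolean masks; the sketch below uses illustrative names. On a single pair of masks, Dice and IoU are linked by dice = 2·iou/(1 + iou), though metrics averaged over many instances, like those reported, need not obey the identity exactly.

```python
# Illustrative Dice coefficient and IoU on boolean masks.
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2.0 * inter / total if total > 0 else 1.0

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

# Per mask pair: dice == 2 * iou(pred, gt) / (1 + iou(pred, gt)).
```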
arXiv Detail & Related papers (2024-11-18T05:11:29Z)
- Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance Segmentation (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z)
- TinySAM: Pushing the Envelope for Efficient Segment Anything Model [73.06322749886483]
We propose a framework for obtaining a tiny segment anything model (TinySAM) while maintaining strong zero-shot performance. With all of these proposed methods, our TinySAM achieves an orders-of-magnitude computational reduction and pushes the envelope for the efficient segment-anything task.
arXiv Detail & Related papers (2023-12-21T12:26:11Z)
- Zero-Shot Refinement of Buildings' Segmentation Models using SAM [6.110856077714895]
We present a novel approach that adapts foundation models to address the generalization shortcomings of existing models.
Among several models, our focus centers on the Segment Anything Model (SAM).
SAM does not offer recognition abilities and thus fails to classify and tag localized objects.
Our approach augments SAM with recognition abilities, a first of its kind.
arXiv Detail & Related papers (2023-10-03T07:19:59Z)
- YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection [63.36722419180875]
We provide an efficient and performant object detector, termed YOLO-MS. We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets. Our work can also serve as a plug-and-play module for other YOLO models.
arXiv Detail & Related papers (2023-08-10T10:12:27Z) - Enhancing Geometric Factors in Model Learning and Inference for Object
Detection and Instance Segmentation [91.12575065731883]
We propose the Complete-IoU (CIoU) loss and Cluster-NMS to enhance geometric factors in both bounding-box regression and Non-Maximum Suppression (NMS).
Training deep models with the CIoU loss yields consistent AP and AR improvements over the widely adopted $\ell_n$-norm and IoU-based losses.
Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR.
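The CIoU formulation is public, so a compact reference sketch is easy to give. It follows the published definition L_CIoU = 1 - IoU + ρ²(b, b_gt)/c² + αv, with v the aspect-ratio consistency term and α its trade-off weight; the epsilon guards are an added safety assumption, not part of the original formula.

```python
# Sketch of the Complete-IoU (CIoU) loss for two (x0, y0, x1, y1) boxes.
import math

def ciou_loss(p, g, eps=1e-9):
    # plain IoU of the two boxes
    iw = max(0.0, min(p[2], g[2]) - max(p[0], g[0]))
    ih = max(0.0, min(p[3], g[3]) - max(p[1], g[1]))
    inter = iw * ih
    union = (p[2]-p[0])*(p[3]-p[1]) + (g[2]-g[0])*(g[3]-g[1]) - inter
    iou = inter / (union + eps)
    # squared center distance over squared diagonal of the enclosing box
    rho2 = ((p[0]+p[2]-g[0]-g[2])**2 + (p[1]+p[3]-g[1]-g[3])**2) / 4.0
    c2 = (max(p[2], g[2]) - min(p[0], g[0]))**2 \
       + (max(p[3], g[3]) - min(p[1], g[1]))**2 + eps
    # aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi**2) * (math.atan((g[2]-g[0]) / (g[3]-g[1] + eps))
                            - math.atan((p[2]-p[0]) / (p[3]-p[1] + eps)))**2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```

Relative to a plain IoU loss, the two extra terms keep gradients informative even when the boxes do not overlap, which is the intuition behind the AP and AR gains cited above.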
arXiv Detail & Related papers (2020-05-07T16:00:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.