Related papers: Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches

Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches

URL: http://arxiv.org/abs/2506.10825v1
Date: Thu, 12 Jun 2025 15:44:49 GMT
Title: Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches
Authors: Andrea Moglia, Matteo Leccardi, Matteo Cavicchioli, Alice Maccarini, Marco Marcon, Luca Mainardi, Pietro Cerveri,
Abstract summary: We offer a comprehensive and in-depth investigation on generalist models for medical image segmentation.<n>We provide a taxonomy on the different declinations of SAM in terms of zero-shot, few-shot, fine-tuning, adapters, on the recent SAM 2.<n>We emphasize the need to address challenges in terms of compliance with regulatory frameworks, privacy and security laws, budget, and trustworthy artificial intelligence (AI)
Score: 1.2366904002994854
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Following the successful paradigm shift of large language models, leveraging pre-training on a massive corpus of data and fine-tuning on different downstream tasks, generalist models have made their foray into computer vision. The introduction of Segment Anything Model (SAM) set a milestone on segmentation of natural images, inspiring the design of a multitude of architectures for medical image segmentation. In this survey we offer a comprehensive and in-depth investigation on generalist models for medical image segmentation. We start with an introduction on the fundamentals concepts underpinning their development. Then, we provide a taxonomy on the different declinations of SAM in terms of zero-shot, few-shot, fine-tuning, adapters, on the recent SAM 2, on other innovative models trained on images alone, and others trained on both text and images. We thoroughly analyze their performances at the level of both primary research and best-in-literature, followed by a rigorous comparison with the state-of-the-art task-specific models. We emphasize the need to address challenges in terms of compliance with regulatory frameworks, privacy and security laws, budget, and trustworthy artificial intelligence (AI). Finally, we share our perspective on future directions concerning synthetic data, early fusion, lessons learnt from generalist models in natural language processing, agentic AI and physical AI, and clinical translation.

Related papers

Vision Generalist Model: A Survey [87.49797517847132]
We provide a comprehensive overview of the vision generalist models, delving into their characteristics and capabilities within the field.<n>We take a brief excursion into related domains, shedding light on their interconnections and potential synergies.
arXiv Detail & Related papers (2025-06-11T17:23:41Z)
How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model [12.051904886550956]
This work summarizes existing fine-tuning strategies with various backbone architectures, model components, and fine-tuning algorithms across 18 combinations.<n>We evaluate them on 17 datasets covering all common radiology modalities.<n>We release our code and MRI-specific fine-tuned weights, which consistently obtained superior performance over the original SAM.
arXiv Detail & Related papers (2024-04-15T17:31:32Z)
One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts [62.55349777609194]
We construct the first multi-modal knowledge tree on human anatomy, including 6502 anatomical terminologies.<n>We build up the largest and most comprehensive segmentation dataset for training, by collecting over 22K 3D medical image scans.
arXiv Detail & Related papers (2023-12-28T18:16:00Z)
Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision [6.2847894163744105]
Foundation models are large-scale, pre-trained deep-learning models adapted to a wide range of downstream tasks. These models facilitate contextual reasoning, generalization, and prompt capabilities at test time. Capitalizing on the advances in computer vision, medical imaging has also marked a growing interest in these models.
arXiv Detail & Related papers (2023-10-28T12:08:12Z)
Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world. The models learned to bridge the gap between such modalities coupled with large-scale training data facilitate contextual reasoning, generalization, and prompt capabilities at test time. The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
Empirical Analysis of a Segmentation Foundation Model in Prostate Imaging [9.99042549094606]
We consider a recently developed foundation model for medical image segmentation, UniverSeg. We conduct an empirical evaluation study in the context of prostate imaging and compare it against the conventional approach of training a task-specific segmentation model.
arXiv Detail & Related papers (2023-07-06T20:00:52Z)
Artificial General Intelligence for Medical Imaging Analysis [92.3940918983821]
Large-scale Artificial General Intelligence (AGI) models have achieved unprecedented success in a variety of general domain tasks. These models face notable challenges arising from the medical field's inherent complexities and unique characteristics. This review aims to offer insights into the future implications of AGI in medical imaging, healthcare, and beyond.
arXiv Detail & Related papers (2023-06-08T18:04:13Z)
UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC) UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
A Comprehensive Survey on Segment Anything Model for Vision and Beyond [7.920790211915402]
It is urgent to design a general class of models, which we term foundation models, trained on broad data. The recently proposed segment anything model (SAM) has made significant progress in breaking the boundaries of segmentation. This paper introduces the background and terminology for foundation models including SAM, as well as state-of-the-art methods contemporaneous with SAM.
arXiv Detail & Related papers (2023-05-14T16:23:22Z)
Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey [8.76496233192512]
We discuss efforts to extend the success of the Segment Anything Model to medical image segmentation tasks. Many insights are drawn to guide future research to develop foundation models for medical image analysis.
arXiv Detail & Related papers (2023-05-05T16:48:45Z)
Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities. We explicitly account for prior images and reports when available during both training and fine-tuning. Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions. Prior work has introduced recurrent generative models which outperform synthesis text-to-image models on this task. We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.