GroundingDINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models
- URL: http://arxiv.org/abs/2506.23903v1
- Date: Mon, 30 Jun 2025 14:33:44 GMT
- Title: GroundingDINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models
- Authors: Hamza Rasaee, Taha Koleilat, Hassan Rivaz
- Abstract summary: We propose a prompt-driven vision-language model (VLM) that integrates Grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized. Our approach outperforms state-of-the-art segmentation methods, including UniverSeg, MedSAM, MedCLIP-SAM, BiomedParse, and SAMUS.
- Score: 2.089191490381739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate and generalizable object segmentation in ultrasound imaging remains a significant challenge due to anatomical variability, diverse imaging protocols, and limited annotated data. In this study, we propose a prompt-driven vision-language model (VLM) that integrates Grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized. Fifteen of these datasets were used to fine-tune and validate Grounding DINO, adapted to the ultrasound domain with Low-Rank Adaptation (LoRA), and the remaining three were held out entirely for testing to evaluate performance on unseen distributions. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art segmentation methods, including UniverSeg, MedSAM, MedCLIP-SAM, BiomedParse, and SAMUS, on most seen datasets while maintaining strong performance on unseen datasets without additional fine-tuning. These results underscore the promise of VLMs for scalable and robust ultrasound image analysis, reducing dependence on large, organ-specific annotated datasets. We will publish our code on code.sonography.ai after acceptance.
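The adaptation technique named in the abstract, LoRA, freezes the pretrained weights and learns a low-rank additive update. The following is a minimal sketch of a LoRA-wrapped linear layer for intuition only; it is not the authors' code, and the rank, scaling, and initialization are illustrative assumptions.

```python
# Minimal LoRA sketch (illustrative assumptions, not the authors' released code).
# The frozen base layer computes W x; LoRA adds a trainable low-rank update,
# so the adapted layer computes y = W x + (alpha / r) * B A x.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(nn.Linear(256, 256), r=8)
out = layer(torch.randn(4, 256))  # only lora_A and lora_B receive gradients
```

Because lora_B starts at zero, the wrapped layer initially reproduces the pretrained model exactly; fine-tuning then trains only the two small factors, which is what makes per-domain adaptation of a large detector such as Grounding DINO cheap.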
Related papers
- Benchmarking of Deep Learning Methods for Generic MRI Multi-Organ Abdominal Segmentation [0.5277756703318045]
We benchmark three state-of-the-art, open-source MRI abdominal segmentation tools: MRSegmentator, MRISegmentator-Abdomen, and TotalSegmentator MRI. We also evaluate ABD Synth, a SynthSeg-based model trained purely on widely available CT segmentations.
arXiv Detail & Related papers (2025-07-23T22:37:26Z)
- MRGen: Segmentation Data Engine For Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data. This paper investigates leveraging generative models to synthesize training data for segmentation models targeting underrepresented modalities.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
- Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development [59.74920439478643]
In this paper, we collect and annotate the first benchmark dataset that covers diverse ERUS scenarios.
Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames.
We introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR).
arXiv Detail & Related papers (2024-08-19T15:04:42Z)
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
- WATUNet: A Deep Neural Network for Segmentation of Volumetric Sweep Imaging Ultrasound [1.2903292694072621]
Volume sweep imaging (VSI) is an innovative approach that enables untrained operators to capture quality ultrasound images.
We present a novel segmentation model known as Wavelet_Attention_UNet (WATUNet).
In this model, we incorporate wavelet gates (WGs) and attention gates (AGs) between the encoder and decoder, instead of simple skip connections, to overcome their limitations.
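The summary names the gating mechanism but not its design. For intuition, here is a minimal attention gate in the style of Attention U-Net, which reweights encoder skip features using a decoder gating signal; WATUNet's wavelet gates and exact layer sizes are not reproduced here, and all dimensions below are assumptions.

```python
# Hedged sketch of a generic attention gate (Attention U-Net style), not WATUNet's code.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Reweights encoder skip features x using a decoder gating signal g."""
    def __init__(self, x_ch: int, g_ch: int, inter_ch: int):
        super().__init__()
        self.theta_x = nn.Conv2d(x_ch, inter_ch, kernel_size=1)
        self.phi_g = nn.Conv2d(g_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, x, g):
        # x and g are assumed to share spatial size; real U-Nets resize g first
        attn = torch.sigmoid(self.psi(torch.relu(self.theta_x(x) + self.phi_g(g))))
        return x * attn  # suppress irrelevant skip activations

gate = AttentionGate(x_ch=64, g_ch=128, inter_ch=32)
x = torch.randn(1, 64, 56, 56)   # encoder skip features
g = torch.randn(1, 128, 56, 56)  # decoder gating signal (already resized)
y = gate(x, g)                   # same shape as x
```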
arXiv Detail & Related papers (2023-11-17T20:32:37Z)
- SonoSAMTrack -- Segment and Track Anything on Ultrasound Images [8.19114188484929]
SonoSAMTrack combines SonoSAM, a promptable foundation model for segmenting objects of interest on ultrasound images, with a contour tracking model that propagates segmentations across frames.
SonoSAM demonstrates state-of-the-art performance on 7 unseen datasets, outperforming competing methods by a significant margin.
arXiv Detail & Related papers (2023-10-25T16:42:26Z)
- MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation [58.53672866662472]
We introduce a modality-agnostic SAM adaptation framework named MA-SAM.
Our method is rooted in a parameter-efficient fine-tuning strategy that updates only a small fraction of weight increments.
By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data.
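As a rough illustration of the idea, a 3D adapter can be a residual bottleneck whose middle step convolves across the slice dimension, letting a frozen 2D backbone see inter-slice context. The token layout, bottleneck width, and placement inside the transformer block below are assumptions, not MA-SAM's actual implementation.

```python
# Hedged sketch of a bottleneck adapter with a 3D convolution across slices.
import torch
import torch.nn as nn

class Adapter3D(nn.Module):
    """Residual bottleneck adapter that mixes information across slices (D)."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.conv3d = nn.Conv3d(bottleneck, bottleneck, kernel_size=3, padding=1)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, tokens):
        # tokens: (B, D, H, W, C) -- per-slice 2D ViT tokens stacked along depth D
        h = self.down(tokens)            # (B, D, H, W, b)
        h = h.permute(0, 4, 1, 2, 3)     # (B, b, D, H, W) for Conv3d
        h = torch.relu(self.conv3d(h))   # mixes neighboring slices
        h = h.permute(0, 2, 3, 4, 1)     # back to (B, D, H, W, b)
        return tokens + self.up(h)       # residual: frozen block output + 3D cue

adapter = Adapter3D(dim=768)
x = torch.randn(2, 8, 14, 14, 768)  # 8 slices of 14x14 tokens
y = adapter(x)                      # same shape, now depth-aware
```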
arXiv Detail & Related papers (2023-09-16T02:41:53Z)
- Segment Anything Model for Medical Image Analysis: an Experimental Study [19.95972201734614]
Segment Anything Model (SAM) is a foundation model that is intended to segment user-defined objects of interest in an interactive manner.
We evaluate SAM's ability to segment medical images on a collection of 19 medical imaging datasets from various modalities and anatomies.
arXiv Detail & Related papers (2023-04-20T17:50:18Z)
- AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation [32.938687630678096]
AMOS is a large-scale, diverse, clinical dataset for abdominal organ segmentation.
It provides challenging examples and a test-bed for studying robust segmentation algorithms under diverse targets and scenarios.
We benchmark several state-of-the-art medical segmentation models to evaluate the status of the existing methods on this new challenging dataset.
arXiv Detail & Related papers (2022-06-16T09:27:56Z)
- Uncertainty-aware multi-view co-training for semi-supervised medical image segmentation and domain adaptation [35.33425093398756]
Unlabeled data is much easier to acquire than well-annotated data.
We propose uncertainty-aware multi-view co-training for medical image segmentation.
Our framework efficiently exploits unlabeled data to improve performance.
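The summary does not spell out how uncertainty enters the co-training objective. One generic realization, sketched below, weights one view's pseudo-label loss by the other view's per-pixel confidence; the entropy-based estimator here is an assumption, not necessarily the paper's choice.

```python
# Hedged sketch: uncertainty-weighted pseudo-label loss for two-view co-training.
import torch
import torch.nn.functional as F

def uncertainty_weighted_loss(logits_a, logits_b):
    """View A learns from view B's pseudo-labels, down-weighted where B is uncertain."""
    prob_b = torch.softmax(logits_b.detach(), dim=1)
    pseudo = prob_b.argmax(dim=1)                              # hard pseudo-labels from view B
    entropy = -(prob_b * torch.log(prob_b + 1e-8)).sum(dim=1)  # per-pixel uncertainty
    weight = torch.exp(-entropy)                               # low weight where B is uncertain
    loss = F.cross_entropy(logits_a, pseudo, reduction="none") # per-pixel CE for view A
    return (weight * loss).mean()

logits_a = torch.randn(2, 4, 64, 64)  # view A predictions (B, classes, H, W)
logits_b = torch.randn(2, 4, 64, 64)  # view B predictions on the same unlabeled batch
loss = uncertainty_weighted_loss(logits_a, logits_b)
```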
arXiv Detail & Related papers (2020-06-28T22:04:54Z)
- MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data [75.73881040581767]
We propose a novel multi-site network (MS-Net) for improving prostate segmentation by learning robust representations.
Our MS-Net improves the performance across all datasets consistently, and outperforms state-of-the-art methods for multi-site learning.
arXiv Detail & Related papers (2020-02-09T14:11:50Z)
- Deep Attentive Features for Prostate Segmentation in 3D Transrectal Ultrasound [59.105304755899034]
This paper develops a novel 3D deep neural network equipped with attention modules for better prostate segmentation in transrectal ultrasound (TRUS) images.
Our attention module selectively leverages the multilevel features integrated from different layers.
Experimental results on challenging 3D TRUS volumes show that our method attains satisfactory segmentation performance.
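To make "selectively leverage multilevel features" concrete, here is a generic 2D sketch that learns a per-pixel softmax over feature levels and fuses them. The paper's module operates on 3D TRUS volumes and its exact design may differ, so everything below is an assumption.

```python
# Hedged sketch of attention-based fusion over L feature levels (2D simplification).
import torch
import torch.nn as nn

class MultiLevelAttentionFusion(nn.Module):
    """Learns a per-pixel softmax over L feature levels and fuses them."""
    def __init__(self, channels: int, num_levels: int):
        super().__init__()
        self.score = nn.Conv2d(channels * num_levels, num_levels, kernel_size=1)

    def forward(self, feats):
        # feats: list of L tensors, each (B, C, H, W), already resized to a common size
        stacked = torch.stack(feats, dim=1)                                  # (B, L, C, H, W)
        weights = torch.softmax(self.score(torch.cat(feats, dim=1)), dim=1)  # (B, L, H, W)
        return (stacked * weights.unsqueeze(2)).sum(dim=1)                   # (B, C, H, W)

fuse = MultiLevelAttentionFusion(channels=32, num_levels=3)
feats = [torch.randn(1, 32, 28, 28) for _ in range(3)]
fused = fuse(feats)  # per-pixel weighted combination of the three levels
```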
arXiv Detail & Related papers (2019-07-03T05:21:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.