Towards Better Ultrasound Video Segmentation Foundation Model: An Empirical study on SAM2 Finetuning from Data Perspective
- URL: http://arxiv.org/abs/2511.05731v1
- Date: Fri, 07 Nov 2025 21:45:18 GMT
- Title: Towards Better Ultrasound Video Segmentation Foundation Model: An Empirical study on SAM2 Finetuning from Data Perspective
- Authors: Xing Yao, Ahana Gangopadhyay, Hsi-Ming Chang, Ravi Soni
- Abstract summary: We present a data-centric investigation of SAM2 adaptation for ultrasound video segmentation. We analyze how training-set size, video duration, and augmentation schemes affect adaptation performance.
- Score: 0.7629717457706325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ultrasound (US) video segmentation remains a challenging problem due to strong inter- and intra-dataset variability, motion artifacts, and limited annotated data. Although foundation models such as Segment Anything Model 2 (SAM2) demonstrate strong zero-shot and prompt-guided segmentation capabilities, their performance deteriorates substantially when transferred to medical imaging domains. Current adaptation studies mainly emphasize architectural modifications, while the influence of data characteristics and training regimes has not been systematically examined. In this study, we present a comprehensive, data-centric investigation of SAM2 adaptation for ultrasound video segmentation. We analyze how training-set size, video duration, and augmentation schemes affect adaptation performance under three paradigms: task-specific fine-tuning, intermediate adaptation, and multi-task joint training, across five SAM2 variants and multiple prompting modes. We further design six ultrasound-specific augmentations, assessing their effect relative to generic strategies. Experiments on three representative ultrasound datasets reveal that data scale and temporal context play a more decisive role than model architecture or initialization. Moreover, joint training offers an efficient compromise between modality alignment and task specialization. This work aims to provide empirical insights for developing efficient, data-aware adaptation pipelines for SAM2 in ultrasound video analysis.
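The abstract mentions six ultrasound-specific augmentations but does not enumerate them, so as an illustration only, here is a minimal sketch of two augmentations commonly used for ultrasound imagery (multiplicative speckle noise and depth-dependent attenuation). The function names and parameters are hypothetical and not taken from the paper; the per-clip noise field is one plausible way to keep augmentations temporally consistent across video frames.

```python
import numpy as np

def depth_attenuation(img, strength=0.5):
    """Simulate depth-dependent gain loss: signal weakens along imaging depth (rows)."""
    depth = np.linspace(0.0, 1.0, img.shape[0])[:, None]
    gain = 1.0 - strength * depth
    return np.clip(img * gain, 0.0, 1.0)

def augment_clip(frames, sigma=0.15, strength=0.5, seed=0):
    """Apply one multiplicative speckle field to every frame of a (T, H, W) clip,
    so the augmentation is temporally consistent, then add depth attenuation."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(1.0, sigma, size=frames.shape[1:])  # shared across frames
    noisy = np.clip(frames * noise, 0.0, 1.0)
    return np.stack([depth_attenuation(f, strength) for f in noisy])
```

In a finetuning pipeline, such transforms would be sampled per training clip rather than per frame, since frame-independent noise can break the temporal cues that SAM2's memory mechanism relies on.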
Related papers
- VesSAM: Efficient Multi-Prompting for Segmenting Complex Vessel [68.24765319399286]
We present VesSAM, a powerful and efficient framework tailored for 2D vessel segmentation. VesSAM integrates (1) a convolutional adapter to enhance local texture features, (2) a multi-prompt encoder that fuses anatomical prompts, and (3) a lightweight mask decoder to reduce jagged artifacts. VesSAM consistently outperforms state-of-the-art PEFT-based SAM variants by over 10% Dice and 13% IoU.
arXiv Detail & Related papers (2025-11-02T15:47:05Z) - SAM2-3dMed: Empowering SAM2 for 3D Medical Image Segmentation [7.646703242040606]
We propose SAM2-3dMed, an adaptation of SAM2 for 3D medical imaging. A Slice Relative Position Prediction (SRPP) module explicitly models bidirectional inter-slice dependencies. A Boundary Detection (BD) module enhances segmentation accuracy along critical organ and tissue boundaries. Our approach not only advances 3D medical image segmentation performance but also offers a general paradigm for adapting video-centric foundation models to spatial data.
arXiv Detail & Related papers (2025-10-10T03:23:05Z) - MedSeqFT: Sequential Fine-tuning Foundation Models for 3D Medical Image Segmentation [55.37355146924576]
MedSeqFT is a sequential fine-tuning framework for medical image analysis. It adapts pre-trained models to new tasks while refining their representational capacity. It consistently outperforms state-of-the-art fine-tuning strategies.
arXiv Detail & Related papers (2025-09-07T15:22:53Z) - Differential-UMamba: Rethinking Tumor Segmentation Under Limited Data Scenarios [3.1231963031043786]
We introduce Diff-UMamba, a novel architecture that combines the UNet framework with the Mamba mechanism to model long-range dependencies. At the heart of Diff-UMamba is a noise reduction module, which employs a signal differencing strategy to suppress noisy or irrelevant activations. The architecture achieves improved segmentation accuracy and robustness, particularly in low-data settings.
arXiv Detail & Related papers (2025-07-24T08:23:11Z) - The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound [60.80780313225093]
This study systematically investigated the impact of data augmentation and preprocessing strategies in self-supervised learning for lung ultrasound. Three data augmentation pipelines were assessed: a baseline pipeline commonly used across imaging domains, a novel semantics-preserving pipeline designed for ultrasound, and a distilled set of the most effective transformations from both pipelines.
arXiv Detail & Related papers (2025-04-10T16:26:47Z) - Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos [11.589704875476325]
We introduce E-ViM$^3$, a data-efficient Vision Mamba network that preserves the 3D structure of video data. Our model achieves competitive performance with limited labels, highlighting its potential impact on real-world clinical applications.
arXiv Detail & Related papers (2025-03-26T05:54:13Z) - Enhanced segmentation of femoral bone metastasis in CT scans of patients using synthetic data generation with 3D diffusion models [0.06700983301090582]
We propose an automated data pipeline using 3D Denoising Diffusion Probabilistic Models (DDPM) to generalize on new images.
We created 5675 new volumes, then trained 3D U-Net segmentation models on real and synthetic data to compare segmentation performance.
arXiv Detail & Related papers (2024-09-17T09:21:19Z) - PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z) - Deep models for stroke segmentation: do complex architectures always perform better? [1.4651272514940197]
Stroke segmentation plays a crucial role in the diagnosis and treatment of stroke patients.
Deep models have been introduced for general medical image segmentation.
In this study, we selected four types of deep models that were recently proposed and evaluated their performance for stroke segmentation.
arXiv Detail & Related papers (2024-03-25T20:44:01Z) - Improving GANs with A Dynamic Discriminator [106.54552336711997]
We argue that a discriminator with an on-the-fly adjustment on its capacity can better accommodate such a time-varying task.
A comprehensive empirical study confirms that the proposed training strategy, termed as DynamicD, improves the synthesis performance without incurring any additional cost or training objectives.
arXiv Detail & Related papers (2022-09-20T17:57:33Z) - Impact of dataset size and long-term ECoG-based BCI usage on deep learning decoders performance [4.7773230870500605]
In brain-computer interfaces (BCI) research, recording data is time-consuming and expensive.
Can we achieve higher decoding performance with more data to train decoders?
High decoding performance was obtained with relatively small datasets recorded later in the experiment.
arXiv Detail & Related papers (2022-09-08T13:01:05Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.