Stitching, Fine-tuning, Re-training: A SAM-enabled Framework for Semi-supervised 3D Medical Image Segmentation
- URL: http://arxiv.org/abs/2403.11229v2
- Date: Sat, 18 Jan 2025 03:01:31 GMT
- Title: Stitching, Fine-tuning, Re-training: A SAM-enabled Framework for Semi-supervised 3D Medical Image Segmentation
- Authors: Shumeng Li, Lei Qi, Qian Yu, Jing Huo, Yinghuan Shi, Yang Gao
- Abstract summary: Segment Anything Model (SAM) fine-tuning has shown remarkable performance in medical image segmentation in a fully supervised manner. We propose a three-stage framework, i.e., Stitching, Fine-tuning, and Re-training (SFR). Our SFR framework is plug-and-play and easily compatible with various popular semi-supervised methods.
- Score: 40.79197318484472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segment Anything Model (SAM) fine-tuning has shown remarkable performance in medical image segmentation in a fully supervised manner, but requires precise annotations. To reduce the annotation cost while maintaining satisfactory performance, in this work we leverage the capabilities of SAM to build semi-supervised medical image segmentation models. Rethinking the requirements of effectiveness, efficiency, and compatibility, we propose a three-stage framework, i.e., Stitching, Fine-tuning, and Re-training (SFR). Current fine-tuning approaches mostly rely on 2D slice-wise fine-tuning, which disregards the contextual information between adjacent slices. Our stitching strategy mitigates the mismatch between natural and 3D medical images. The stitched images are then used to fine-tune SAM, providing a robust initialization of pseudo-labels. Afterwards, we train a 3D semi-supervised segmentation model that keeps the same parameter size as a conventional segmenter such as V-Net. Our SFR framework is plug-and-play and easily compatible with various popular semi-supervised methods. We also develop an extended framework, SFR$^+$, with selective fine-tuning and re-training through confidence estimation. Extensive experiments validate that SFR and SFR$^+$ achieve significant improvements in both moderate- and scarce-annotation settings across five datasets. In particular, the SFR framework improves the Dice score of Mean Teacher from 29.68% to 74.40% with only one labeled volume from the LA dataset.
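To make the stitching idea concrete: several neighbouring slices of a 3D volume can be tiled into one larger 2D image, so SAM (whose original image encoder takes 1024×1024 inputs) sees a natural-image-sized picture that still carries inter-slice context, and its 2D predictions can be cut back into slices. The abstract does not give the exact layout, so this is a minimal NumPy sketch under stated assumptions: a square `grid` of consecutive slices in row-major order and a depth divisible by `grid * grid`. The helpers `stitch_slices`, `unstitch_slices`, and `dice` are hypothetical names, not the authors' code. The `dice` helper shows the metric behind the reported 29.68% to 74.40% improvement.

```python
import numpy as np


def stitch_slices(volume: np.ndarray, grid: int) -> np.ndarray:
    """Tile grid*grid consecutive slices of a (D, H, W) volume into 2D images.

    Returns shape (D // grid**2, grid*H, grid*W). Within each group, slice i
    lands at tile row i // grid, tile column i % grid (row-major).
    Assumption (hypothetical layout): D is a multiple of grid*grid; no padding.
    """
    d, h, w = volume.shape
    per_image = grid * grid
    if d % per_image != 0:
        raise ValueError("depth must be a multiple of grid * grid")
    n = d // per_image
    # (n, rows, cols, H, W) -> (n, rows, H, cols, W) -> (n, rows*H, cols*W)
    tiles = volume.reshape(n, grid, grid, h, w).transpose(0, 1, 3, 2, 4)
    return tiles.reshape(n, grid * h, grid * w)


def unstitch_slices(images: np.ndarray, grid: int) -> np.ndarray:
    """Inverse of stitch_slices: map stitched 2D maps back to (D, H, W)."""
    n, gh, gw = images.shape
    h, w = gh // grid, gw // grid
    # (n, rows, H, cols, W) -> (n, rows, cols, H, W) -> (D, H, W)
    tiles = images.reshape(n, grid, h, grid, w).transpose(0, 1, 3, 2, 4)
    return tiles.reshape(n * grid * grid, h, w)


def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient 2|P & T| / (|P| + |T|) for binary masks."""
    p = pred.astype(bool)
    t = target.astype(bool)
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(p, t).sum() / denom)


# Usage: 16 slices of 4x4 become 4 stitched 8x8 images, then are restored.
vol = np.arange(16 * 4 * 4, dtype=np.float32).reshape(16, 4, 4)
stitched = stitch_slices(vol, grid=2)
restored = unstitch_slices(stitched, grid=2)
print(stitched.shape, np.array_equal(restored, vol))   # (4, 8, 8) True
print(dice(np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0])))  # 0.5
```

Because stitching is a pure reshape/transpose, the round trip is lossless, so pseudo-labels predicted on stitched images can be mapped back voxel-for-voxel before the 3D re-training stage.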
Related papers
- Boundary-Aware Test-Time Adaptation for Zero-Shot Medical Image Segmentation [12.159529070716824]
BA-TTA-SAM is a framework that enhances SAM's zero-shot segmentation performance via test-time adaptation. Our framework consistently outperforms state-of-the-art models in medical image segmentation.
Date: 2025-12-04T07:08:21Z
- VesSAM: Efficient Multi-Prompting for Segmenting Complex Vessel [68.24765319399286]
We present VesSAM, a powerful and efficient framework tailored for 2D vessel segmentation. VesSAM integrates (1) a convolutional adapter to enhance local texture features, (2) a multi-prompt encoder that fuses anatomical prompts, and (3) a lightweight mask decoder to reduce jagged artifacts. VesSAM consistently outperforms state-of-the-art PEFT-based SAM variants by over 10% Dice and 13% IoU.
Date: 2025-11-02T15:47:05Z
- BALR-SAM: Boundary-Aware Low-Rank Adaptation of SAM for Resource-Efficient Medical Image Segmentation [11.634558989215392]
Vision foundation models like the Segment Anything Model (SAM) often struggle in medical image segmentation due to a lack of domain-specific adaptation. We propose BALR-SAM, a boundary-aware low-rank adaptation framework that enhances SAM for medical imaging. It combines three tailored components: (1) a Complementary Detail Enhancement Network (CDEN) using depthwise separable convolutions and multi-scale fusion to capture boundary-sensitive features essential for accurate segmentation; (2) low-rank adapters integrated into SAM's Vision Transformer blocks to optimize feature representation and attention for medical contexts, while significantly reducing the parameter space; and
Date: 2025-09-29T02:36:09Z
- Semi-Supervised 3D Medical Segmentation from 2D Natural Images Pretrained Model [0.8758593614464055]
This paper explores the transfer of knowledge from general vision models pretrained on 2D natural images to improve 3D medical image segmentation. We propose a model-agnostic framework that progressively distills knowledge from a 2D pretrained model to a 3D segmentation model trained from scratch. Our approach, M&N, involves iterative co-training of the two models using pseudo-masks generated by each other, along with our proposed learning rate guided sampling.
Date: 2025-09-18T17:17:52Z
- Annotation-Efficient Task Guidance for Medical Segment Anything [0.31077024712075796]
Medical image segmentation is a key task in the imaging workflow, influencing many image-based decisions.
Traditional, fully supervised segmentation models rely on large amounts of labeled training data, whose annotation can be expensive, time-consuming, and error-prone.
We propose SAM-Mix, a novel multitask learning framework for medical image segmentation.
Date: 2024-12-11T17:47:00Z
- Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2 [1.6237741047782823]
We introduce a method for zero-shot, single-prompt segmentation of 3D knee MRI by adapting Segment Anything Model 2.
By treating slices from 3D medical volumes as individual video frames, we leverage SAM2's advanced capabilities to generate motion- and spatially-aware predictions.
We demonstrate that SAM2 can efficiently perform segmentation tasks in a zero-shot manner with no additional training or fine-tuning.
Date: 2024-08-08T21:39:15Z
- A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation [5.011091042850546]
Adapting foundation models for medical image analysis requires finetuning them on a considerable amount of data.
However, collecting task-specific medical data for such finetuning at a central location raises many privacy concerns.
Although federated learning (FL) provides an effective means for training on private decentralized data, communication costs in federating large foundation models can quickly become a significant bottleneck.
Date: 2024-07-31T16:48:06Z
- SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images [3.2099042811875833]
We propose a strategy for adapting the Segment Anything Model (SAM) to anatomical segmentation tasks in medical images.
We leverage few-shot embeddings derived from a limited set of labeled images as prompts for querying anatomical objects captured in image embeddings.
Our method prioritizes the efficiency of the fine-tuning process by exclusively training the mask decoder through caching mechanisms.
Date: 2024-07-05T17:07:25Z
- I-MedSAM: Implicit Medical Image Segmentation with Segment Anything [24.04558900909617]
We propose I-MedSAM, which leverages the benefits of both continuous representations and SAM to obtain better cross-domain ability and accurate boundary delineation.
Our proposed method with only 1.6M trainable parameters outperforms existing methods including discrete and implicit methods.
Date: 2023-11-28T00:43:52Z
- Promise: Prompt-driven 3D Medical Image Segmentation Using Pretrained Image Foundation Models [13.08275555017179]
We propose ProMISe, a prompt-driven 3D medical image segmentation model using only a single point prompt.
We evaluate our model on two public datasets for colon and pancreas tumor segmentations.
Date: 2023-10-30T16:49:03Z
- MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation [58.53672866662472]
We introduce a modality-agnostic SAM adaptation framework named MA-SAM.
Our method is rooted in a parameter-efficient fine-tuning strategy that updates only a small portion of weight increments.
By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract information along the third dimension of the input data.
Date: 2023-09-16T02:41:53Z
- Towards Label-free Scene Understanding by Vision Foundation Models [87.13117617056004]
We investigate the potential of vision foundation models in enabling networks to comprehend 2D and 3D worlds without labelled data.
We propose a novel Cross-modality Noisy Supervision (CNS) method that leverages the strengths of CLIP and SAM to supervise 2D and 3D networks simultaneously.
Our 2D and 3D networks achieve label-free semantic segmentation with 28.4% and 33.5% mIoU on ScanNet, improvements of 4.7% and 7.9%, respectively.
Date: 2023-06-06T17:57:49Z
- Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
Date: 2023-02-03T13:50:25Z
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, demonstrate the effectiveness of our contributions in terms of both efficiency and accuracy.
Date: 2022-12-08T18:59:57Z
- Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes.
We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
Date: 2022-11-16T21:55:05Z
- Image Understands Point Cloud: Weakly Supervised 3D Semantic Segmentation via Association Learning [59.64695628433855]
We propose a novel cross-modality weakly supervised method for 3D segmentation, incorporating complementary information from unlabeled images.
Specifically, we design a dual-branch network equipped with an active labeling strategy to make the most of a very small fraction of labels.
Our method even outperforms the state-of-the-art fully supervised competitors with less than 1% actively selected annotations.
Date: 2022-09-16T07:59:04Z
- PA-Seg: Learning from Point Annotations for 3D Medical Image Segmentation using Contextual Regularization and Cross Knowledge Distillation [14.412073730567137]
We propose to annotate a segmentation target with only seven points in 3D medical images, and design a two-stage weakly supervised learning framework PA-Seg.
In the first stage, we employ geodesic distance transform to expand the seed points to provide more supervision signal.
In the second stage, we use predictions obtained by the model pre-trained in the first stage as pseudo labels.
Date: 2022-08-11T07:00:33Z
- A Unified Framework for Generalized Low-Shot Medical Image Segmentation with Scarce Data [24.12765716392381]
We propose a unified framework for generalized low-shot (one- and few-shot) medical image segmentation based on distance metric learning (DML).
Via DML, the framework learns a multimodal mixture representation for each category, and performs dense predictions based on cosine distances between the pixels' deep embeddings and the category representations.
In our experiments on brain MRI and abdominal CT datasets, the proposed framework outperforms standard DNN-based (3D U-Net) and classical registration-based (ANTs) methods for low-shot segmentation.
Date: 2021-10-18T13:01:06Z
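The distance-metric-learning entry for generalized low-shot segmentation labels each pixel by the cosine distance between its deep embedding and a per-category mixture representation. Below is a minimal NumPy sketch of that dense prediction step only, assuming each category is stored as M prototype vectors and that a category is scored by its best-matching component. The function name `cosine_dense_predict` and the max-over-components rule are assumptions for illustration, not necessarily that paper's exact scoring.

```python
import numpy as np


def cosine_dense_predict(features: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Label each pixel with the class closest in cosine distance.

    features:   (C, H, W) per-pixel deep embeddings.
    prototypes: (K, M, C) for K classes, each a mixture of M component vectors.
    Returns an (H, W) integer label map.
    """
    c, h, w = features.shape
    f = features.reshape(c, -1).T                        # (H*W, C)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1, keepdims=True) + 1e-8)
    sim = np.einsum("nc,kmc->nkm", f, p)                 # cosine similarity, (H*W, K, M)
    # Assumption: a class is scored by its best-matching mixture component.
    # Minimal cosine distance (1 - similarity) is the same as maximal similarity.
    class_sim = sim.max(axis=2)                          # (H*W, K)
    return class_sim.argmax(axis=1).reshape(h, w)


# Usage: two pixels, two classes with two components each.
feats = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])           # (C=2, H=1, W=2)
protos = np.array([[[1.0, 0.1], [-1.0, 0.0]],
                   [[0.0, 1.0], [0.0, -1.0]]])           # (K=2, M=2, C=2)
print(cosine_dense_predict(feats, protos))               # [[0 1]]
```

Normalizing both embeddings and prototypes first turns the cosine comparison into a single `einsum` over all pixels, classes, and components at once.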