Learning from Noisy Prompts: Saliency-Guided Prompt Distillation for Robust Segmentation with SAM
Abstract Overview
This paper introduces Saliency-Guided Prompt Distillation (SPD), a two-stage framework for adapting the Segment Anything Model (SAM) to medical image segmentation when only noisy, non-task-specific prompts are available. In the first stage, a lightweight saliency head is trained alongside LoRA-adapted encoder features to learn anatomical priors from ground-truth masks, producing saliency maps that indicate plausible target locations. In the second stage, a Contextual Prompt Distillation (CPD) module validates local prompts against the saliency map, enriches them with cross-validated prompts from neighboring slices, and forms a consensus prompt set for SAM decoding. A Pairwise Slice Consistency (PSC) loss enforces anatomical coherence between adjacent slice predictions. The method is evaluated on four MRI and CT datasets: one real clinical terminal ileum dataset with centerline prompts and three datasets with simulated noisy prompts.
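The consensus-forming step of CPD can be illustrated with a minimal sketch. It is not the authors' implementation: the threshold `tau`, the function names `validate_prompts` and `consensus_prompts`, the restriction to the two immediately adjacent slices, and the reading of "dual validation" as "salient on both the source slice and the current slice" are all assumptions made for illustration. Prompts are assumed to be integer (row, col) points and saliency maps are assumed to take values in [0, 1].

```python
import numpy as np


def validate_prompts(points, saliency, tau=0.5):
    """Keep the (row, col) points whose saliency value is at least tau."""
    points = np.asarray(points, dtype=int).reshape(-1, 2)
    keep = saliency[points[:, 0], points[:, 1]] >= tau
    return points[keep]


def consensus_prompts(prompts_by_slice, saliency_by_slice, z, tau=0.5):
    """Form a consensus prompt set for slice z (illustrative only).

    Local prompts are checked against the saliency map of slice z.
    Prompts borrowed from slices z-1 and z+1 must pass two checks
    (our assumed form of dual validation): they must be salient on
    their source slice and also on slice z.
    """
    kept = [validate_prompts(prompts_by_slice[z], saliency_by_slice[z], tau)]
    for n in (z - 1, z + 1):
        if 0 <= n < len(prompts_by_slice):
            on_source = validate_prompts(
                prompts_by_slice[n], saliency_by_slice[n], tau
            )
            kept.append(validate_prompts(on_source, saliency_by_slice[z], tau))
    merged = np.concatenate(kept, axis=0)
    if merged.shape[0] == 0:
        return merged  # no prompt survived validation
    # Remove duplicate points; rows come back in lexicographic order.
    return np.unique(merged, axis=0)
```

Calling `consensus_prompts(prompts, saliency, z)` returns an array of shape (K, 2) holding the surviving points, which would then be passed to the SAM prompt encoder in place of the raw prompts.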
Novelty
The primary novelty is a framework explicitly designed for robustness to noisy prompts in SAM-based medical image segmentation, rather than assuming high-quality prompts or addressing noisy masks/labels. Its key contribution lies in combining saliency-based anatomical prior learning, a dual-validation cross-slice contextual prompt distillation mechanism, and a localized pairwise slice consistency loss to convert unreliable clinical prompts into consensus guidance.
Results
Of the four datasets (TI, Scar, FUMPE, KiTS), SPD achieves statistically significant improvements (p < 0.05, Wilcoxon signed-rank test) over all comparison methods on TI, Scar, and FUMPE for all reported metrics, and achieves the highest scores on KiTS. On the TI dataset, SPD reports 73.58 DSC and 23.94 HD95, an 11.08% DSC increase and a 6.28 HD95 reduction relative to the best competing method. Ablation studies show incremental gains from local prompt validation, CPD, and PSC. Zero-shot experiments show that feeding frozen SAM the consensus prompts instead of the full original centerline prompts improves DSC by 14.2% and IoU by 13.6%.
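For readers unfamiliar with the overlap metrics quoted above, the following sketch computes DSC and IoU for a pair of binary masks using their standard definitions. The function names and the `eps` smoothing term are illustrative choices, and the values are returned on a 0 to 1 scale, whereas the summary reports them on a 0 to 100 scale. HD95 is omitted because it requires surface distance computations.

```python
import numpy as np


def dice_score(pred, target, eps=1e-8):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return float(2.0 * inter / (pred.sum() + target.sum() + eps))


def iou_score(pred, target, eps=1e-8):
    """Intersection over union: |A ∩ B| / |A ∪ B|."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(inter / (union + eps))
```

With this `eps` convention, two empty masks score 0 rather than 1; evaluation protocols differ on that edge case. Paired per-case scores from two methods computed this way are the kind of input a Wilcoxon signed-rank test (for example `scipy.stats.wilcoxon`) operates on.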
Key Points
- SPD learns saliency-based anatomical priors via a lightweight head and uses them to filter noisy prompts on the current slice and cross-validate prompts from neighboring slices, forming a consensus prompt set before SAM decoding.
- The method targets a clinically realistic setting where inference-time prompts are imperfect, demonstrated with real centerline annotations on terminal ileum MRI and simulated noisy prompts (1 true positive point plus 2-5 random points) on three additional datasets.
- Experiments show statistically significant improvements over both conventional supervised baselines and SAM-based adaptations on three of the four datasets (TI, Scar, and FUMPE), with ablations confirming that each component (local prompt validation, contextual prompt distillation, and pairwise slice consistency) contributes to the overall gains.
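The simulated prompt setting mentioned in the key points (1 true positive point plus 2-5 random points) could be reproduced roughly as follows. This is a sketch under stated assumptions: the function name `simulate_noisy_prompts` is hypothetical, the true positive point is drawn uniformly from the foreground pixels, and the random points are drawn uniformly over the whole image, so they may fall inside or outside the target. The summary does not specify these sampling details.

```python
import numpy as np


def simulate_noisy_prompts(mask, rng, min_random=2, max_random=5):
    """Return an (N, 2) array of (row, col) prompts for one 2D slice.

    Row 0 is a true positive point sampled from the foreground of `mask`.
    The remaining rows are random points sampled uniformly over the image
    (an assumption; they are not guaranteed to miss the target).
    """
    mask = np.asarray(mask, dtype=bool)
    foreground = np.argwhere(mask)
    if len(foreground) == 0:
        raise ValueError("mask has no foreground pixel to sample from")
    true_point = foreground[rng.integers(len(foreground))]
    # The upper bound of Generator.integers is exclusive, hence the +1.
    n_random = int(rng.integers(min_random, max_random + 1))
    rows = rng.integers(0, mask.shape[0], size=n_random)
    cols = rng.integers(0, mask.shape[1], size=n_random)
    random_points = np.stack([rows, cols], axis=1)
    return np.vstack([true_point[None, :], random_points])
```

A typical call would be `simulate_noisy_prompts(mask, np.random.default_rng(0))`, with a fixed seed so that the same noisy prompts are used across the methods being compared.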