Related papers: TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM

URL: http://arxiv.org/abs/2406.15764v1
Date: Sat, 22 Jun 2024 07:00:35 GMT
Title: TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM
Authors: Wenxue Li, Xinyu Xiong, Peng Xia, Lie Ju, Zongyuan Ge
Abstract summary: We propose a novel framework that customizes SAM for text-prompted Diabetic Retinopathy (DR) lesion segmentation. Our core idea involves exploiting language cues to inject medical prior knowledge into the vision-only segmentation network. Specifically, to unleash the potential of vision-language models in the recognition of medical concepts, we propose an explicit prior encoder.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in large foundation models, such as the Segment Anything Model (SAM), have demonstrated considerable promise across various tasks. Despite their progress, these models still encounter challenges in specialized medical image analysis, especially in recognizing subtle inter-class differences in Diabetic Retinopathy (DR) lesion segmentation. In this paper, we propose a novel framework that customizes SAM for text-prompted DR lesion segmentation, termed TP-DRSeg. Our core idea involves exploiting language cues to inject medical prior knowledge into the vision-only segmentation network, thereby combining the advantages of different foundation models and enhancing the credibility of segmentation. Specifically, to unleash the potential of vision-language models in the recognition of medical concepts, we propose an explicit prior encoder that transfers implicit medical concepts into explicit prior knowledge, providing explainable clues to excavate low-level features associated with lesions. Furthermore, we design a prior-aligned injector to inject explicit priors into the segmentation process, which can facilitate knowledge sharing across multi-modality features and allow our framework to be trained in a parameter-efficient fashion. Experimental results demonstrate the superiority of our framework over other traditional models and foundation model variants.
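The abstract does not spell out how the explicit prior encoder works, so the following is only a minimal sketch of one common way to turn a text prompt into a spatial prior: score each image patch feature against a text embedding with cosine similarity and rescale the result to [0, 1]. The function names `cosine` and `prior_map`, the toy feature values, and the choice of cosine similarity are all assumptions made for illustration. They are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def prior_map(patch_features, text_embedding):
    """Score each image patch against a text-prompt embedding.

    patch_features: grid (list of rows) of feature vectors, one per patch.
    Returns a grid of the same shape with similarities rescaled to [0, 1].
    Hypothetical helper, not the paper's encoder.
    """
    return [[(cosine(f, text_embedding) + 1.0) / 2.0 for f in row]
            for row in patch_features]

# Toy 2x2 grid of 3-d patch features and a toy text embedding (made-up numbers).
patches = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.7, 0.7, 0.0], [-1.0, 0.0, 0.0]],
]
text = [1.0, 0.0, 0.0]
prior = prior_map(patches, text)
```

In a real pipeline the patch features and the text embedding would come from a pretrained vision-language model, and a map like `prior` would be passed to the segmentation network as an additional cue. How TP-DRSeg aligns and injects that cue is described in the paper itself.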

Related papers

Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation. MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z)
Distribution-Based Masked Medical Vision-Language Model Using Structured Reports [9.306835492101413]
Medical image-text pre-training aims to align medical images with clinically relevant text to improve model performance on various downstream tasks. This work introduces an uncertainty-aware medical image-text pre-training model that enhances generalization capabilities in medical image analysis.
arXiv Detail & Related papers (2025-07-29T13:31:24Z)
Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging [4.341503087761129]
Multimodal learning that combines visual and text modalities has been shown to be a solution, but collecting paired vision-language datasets is expensive and time-consuming. Inspired by the strong performance of Large Language Models (LLMs) on numerous cross-modal tasks, we propose a novel Vision-LLM union framework to address these issues.
arXiv Detail & Related papers (2025-04-09T23:33:35Z)
Dynamically evolving segment anything model with continuous learning for medical image segmentation [50.92344083895528]
We introduce EvoSAM, a dynamically evolving medical image segmentation model. EvoSAM continuously accumulates new knowledge from an ever-expanding array of scenarios and tasks. Experiments conducted by surgical clinicians on blood vessel segmentation confirm that EvoSAM enhances segmentation efficiency based on user prompts.
arXiv Detail & Related papers (2025-03-08T14:37:52Z)
A Comprehensive Review of U-Net and Its Variants: Advances and Applications in Medical Image Segmentation [0.0]
This paper classifies medical image datasets on the basis of their imaging modalities and examines U-Net and its various improvement models. We summarize the four central improvement mechanisms of the U-Net and U-Net variant algorithms. We propose potential avenues and strategies for future advancements.
arXiv Detail & Related papers (2025-02-09T13:11:51Z)
Adversarial Vessel-Unveiling Semi-Supervised Segmentation for Retinopathy of Prematurity Diagnosis [9.683492465191241]
We propose a semi-supervised segmentation framework designed to advance ROP studies without the need for extensive manual vessel annotation. Unlike previous methods that rely solely on limited labeled data, our approach integrates an uncertainty-weighted vessel-unveiling module and domain adversarial learning. We validate our approach on public datasets and an in-house ROP dataset, demonstrating its superior performance across multiple evaluation metrics.
arXiv Detail & Related papers (2024-11-14T02:40:34Z)
Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation [56.87049651707208]
Few-shot Semantic Segmentation has evolved into in-context tasks, becoming a crucial element in assessing generalist segmentation models. Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, resulting in the proposal of a KV fusion method within the self-attention framework. Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework.
arXiv Detail & Related papers (2024-10-03T10:33:49Z)
LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models [59.961172635689664]
"Knowledge Decomposition" aims to improve the performance on specific medical tasks. We propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD). LoRKD explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.
arXiv Detail & Related papers (2024-09-29T03:56:21Z)
MOSMOS: Multi-organ segmentation facilitated by medical report supervision [10.396987980136602]
We propose a novel pre-training & fine-tuning framework for Multi-Organ Supervision (MOS). Specifically, we first introduce global contrastive learning to align medical image-report pairs in the pre-training stage. To remedy the discrepancy, we further leverage multi-label recognition to implicitly learn the semantic correspondence between image pixels and organ tags.
arXiv Detail & Related papers (2024-09-04T03:46:17Z)
Beyond Pixel-Wise Supervision for Medical Image Segmentation: From Traditional Models to Foundation Models [7.987836953849249]
Existing segmentation algorithms mostly rely on the availability of fully annotated images with pixel-wise annotations for training. To alleviate this challenge, there has been a growing focus on developing segmentation methods that can train deep models with weak annotations. The emergence of vision foundation models, notably the Segment Anything Model (SAM), has introduced innovative capabilities for segmentation tasks using weak annotations.
arXiv Detail & Related papers (2024-04-20T02:40:49Z)
Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models [0.0]
We show that vision-language models can be used to alleviate the dependence on a large number of concept-annotated samples. In particular, we propose an embedding learning strategy to adapt CLIP to the downstream task of skin lesion classification using concept-based descriptions as textual embeddings.
arXiv Detail & Related papers (2023-11-24T08:31:34Z)
Self-Prompting Large Vision Models for Few-Shot Medical Image Segmentation [14.135249795318591]
We propose a novel perspective on self-prompting in medical vision applications. We harness the embedding space of the Segment Anything Model to prompt itself through a simple yet effective linear pixel-wise classifier. We achieve competitive results on multiple datasets.
arXiv Detail & Related papers (2023-08-15T08:20:07Z)
Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach. Our approach is easy to integrate into any hybrid model and requires no external training data. Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights. Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion. Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
arXiv Detail & Related papers (2023-04-10T17:58:22Z)
Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation. We construct our few-shot image segmentor using a deep convolutional network trained episodically. We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
Towards Cross-modality Medical Image Segmentation with Online Mutual Knowledge Distillation [71.89867233426597]
In this paper, we aim to exploit the prior knowledge learned from one modality to improve the segmentation performance on another modality. We propose a novel Mutual Knowledge Distillation scheme to thoroughly exploit the modality-shared knowledge. Experimental results on the public multi-class cardiac segmentation data, i.e., MMWHS 2017, show that our method achieves large improvements on CT segmentation.
arXiv Detail & Related papers (2020-10-04T10:25:13Z)
