Related papers: STPNet: Scale-aware Text Prompt Network for Medical Image Segmentation

URL: http://arxiv.org/abs/2504.01561v1
Date: Wed, 02 Apr 2025 10:01:42 GMT
Title: STPNet: Scale-aware Text Prompt Network for Medical Image Segmentation
Authors: Dandan Shan, Zihan Li, Yunxiang Li, Qingde Li, Jie Tian, Qingqi Hong
Abstract summary: We propose a Scale-aware Text Prompt Network that leverages vision-language modeling to enhance medical image segmentation. Our approach utilizes multi-scale textual descriptions to guide lesion localization and employs retrieval-segmentation joint learning. We evaluate our vision-language approach on three datasets: COVID-Xray, COVID-CT, and Kvasir-SEG.
Score: 8.812162673772459
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate segmentation of lesions plays a critical role in medical image analysis and diagnosis. Traditional segmentation approaches that rely solely on visual features often struggle with the inherent uncertainty in lesion distribution and size. To address these issues, we propose STPNet, a Scale-aware Text Prompt Network that leverages vision-language modeling to enhance medical image segmentation. Our approach utilizes multi-scale textual descriptions to guide lesion localization and employs retrieval-segmentation joint learning to bridge the semantic gap between visual and linguistic modalities. Crucially, STPNet retrieves relevant textual information from a specialized medical text repository during training, eliminating the need for text input during inference while retaining the benefits of cross-modal learning. We evaluate STPNet on three datasets: COVID-Xray, COVID-CT, and Kvasir-SEG. Experimental results show that our vision-language approach outperforms state-of-the-art segmentation methods, demonstrating the effectiveness of incorporating textual semantic knowledge into medical image analysis. The code has been made publicly available at https://github.com/HUANGLIZI/STPNet.

Related papers

Text-driven Multiplanar Visual Interaction for Semi-supervised Medical Image Segmentation [48.76848912120607]
Semi-supervised medical image segmentation is a crucial technique for alleviating the high cost of data annotation. We propose a novel text-driven multiplanar visual interaction framework for semi-supervised medical image segmentation (termed Text-SemiSeg). Our framework consists of three main modules: Text-enhanced Multiplanar Representation (TMR), Category-aware Semantic Alignment (CSA), and Dynamic Cognitive Augmentation (DCA).
arXiv Detail & Related papers (2025-07-16T16:29:30Z)
Alleviating Textual Reliance in Medical Language-guided Segmentation via Prototype-driven Semantic Approximation [11.540847583052381]
ProLearn is a Prototype-driven Learning framework for language-guided segmentation. We introduce a novel Prototype-driven Semantic Approximation (PSA) module to enable approximation of semantic guidance from textual input. ProLearn outperforms state-of-the-art language-guided methods when limited text is available.
arXiv Detail & Related papers (2025-07-15T07:38:49Z)
MAMBO-NET: Multi-Causal Aware Modeling Backdoor-Intervention Optimization for Medical Image Segmentation Network [51.68708264694361]
Confusion factors can affect medical images, such as complex anatomical variations and imaging modality limitations. We propose a multi-causal aware modeling backdoor-intervention optimization network for medical image segmentation. Our method significantly reduces the influence of confusion factors, leading to enhanced segmentation accuracy.
arXiv Detail & Related papers (2025-05-28T01:40:10Z)
Text-Promptable Propagation for Referring Medical Image Sequence Segmentation [20.724643106195852]
Ref-MISS aims to segment anatomical structures in medical image sequences based on natural language descriptions. Existing 2D and 3D segmentation models struggle to explicitly track objects of interest across medical image sequences. We propose Text-Promptable Propagation (TPP), a model designed for referring medical image sequence segmentation.
arXiv Detail & Related papers (2025-02-16T12:13:11Z)
Language-guided Medical Image Segmentation with Target-informed Multi-level Contrastive Alignments [7.9714765680840625]
We propose a language-guided segmentation network with Target-informed Multi-level Contrastive Alignments (TMCA). TMCA enables target-informed cross-modality alignments and fine-grained text guidance to bridge the pattern gaps in language-guided segmentation.
arXiv Detail & Related papers (2024-12-18T06:19:03Z)
LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation [7.912408164613206]
Medical Image Referring Segmentation (MIRS) requires segmenting lesions in images based on the given language expressions. We propose an approach named Language-guided Scale-aware MedSegmentor (LSMS). Our LSMS consistently outperforms on all datasets with lower computational costs.
arXiv Detail & Related papers (2024-08-30T15:22:13Z)
PathAlign: A vision-language model for whole slide images in histopathology [13.567674461880905]
We develop a vision-language model based on the BLIP-2 framework using WSIs and curated text from pathology reports. This enables applications utilizing a shared image-text embedding space, such as text or image retrieval for finding cases of interest. We present pathologist evaluation of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization.
arXiv Detail & Related papers (2024-06-27T23:43:36Z)
Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
Language Guided Domain Generalized Medical Image Segmentation [68.93124785575739]
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings. We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features. Our approach achieves favorable performance against existing methods in literature.
arXiv Detail & Related papers (2024-04-01T17:48:15Z)
CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting [8.12405696290333]
CPSeg is a framework designed to augment image segmentation performance by integrating a novel "Chain-of-Thought" process. We propose a new vision-language dataset, FloodPrompt, which includes images, semantic masks, and corresponding text information.
arXiv Detail & Related papers (2023-10-24T13:32:32Z)
Self-Supervised Correction Learning for Semi-Supervised Biomedical Image Segmentation [84.58210297703714]
We propose a self-supervised correction learning paradigm for semi-supervised biomedical image segmentation. We design a dual-task network, including a shared encoder and two independent decoders for segmentation and lesion region inpainting. Experiments on three medical image segmentation datasets for different tasks demonstrate the outstanding performance of our method.
arXiv Detail & Related papers (2023-01-12T08:19:46Z)
Fine-Grained Semantically Aligned Vision-Language Pre-Training [151.7372197904064]
Large-scale vision-language pre-training has shown impressive advances in a wide range of downstream tasks. Existing methods mainly model the cross-modal alignment by the similarity of the global representations of images and texts. We introduce LOUPE, a fine-grained semantically aLigned visiOn-langUage PrE-training framework, which learns fine-grained semantic alignment from the novel perspective of game-theoretic interactions.
arXiv Detail & Related papers (2022-08-04T07:51:48Z)
Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image Segmentation [46.678279106837294]
We propose a cross-level contrastive learning scheme to enhance representation capacity for local features in semi-supervised medical image segmentation. With the help of the cross-level contrastive learning and consistency constraint, the unlabelled data can be effectively explored to improve segmentation performance.
arXiv Detail & Related papers (2022-02-08T15:12:11Z)
CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS). CRIS resorts to vision-language decoding and contrastive learning for achieving the text-to-pixel alignment. Our proposed framework significantly outperforms the state-of-the-art performance without any post-processing.
arXiv Detail & Related papers (2021-11-30T07:29:08Z)
Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation. We construct our few-shot image segmentor using a deep convolutional network trained episodically. We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.