CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation
- URL: http://arxiv.org/abs/2505.18958v2
- Date: Tue, 27 May 2025 14:57:47 GMT
- Title: CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation
- Authors: Jiong Wu, Yang Xing, Boxiao Yu, Wei Shao, Kuang Gong
- Abstract summary: Most publicly available medical segmentation datasets are only partially labeled. In this study, we proposed a novel CLIP-DINO Prompt-Driven Segmentation Network (CDPDNet). CDPDNet combines a self-supervised vision transformer with CLIP-based text embeddings and introduces task-specific text prompts to tackle these challenges.
- Score: 8.56773843063124
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Most publicly available medical segmentation datasets are only partially labeled, with annotations provided for a subset of anatomical structures. When multiple datasets are combined for training, this incomplete annotation poses challenges, as it limits the model's ability to learn shared anatomical representations among datasets. Furthermore, vision-only frameworks often fail to capture complex anatomical relationships and task-specific distinctions, leading to reduced segmentation accuracy and poor generalizability to unseen datasets. In this study, we proposed a novel CLIP-DINO Prompt-Driven Segmentation Network (CDPDNet), which combined a self-supervised vision transformer with CLIP-based text embedding and introduced task-specific text prompts to tackle these challenges. Specifically, the framework was constructed upon a convolutional neural network (CNN) and incorporated DINOv2 to extract both fine-grained and global visual features, which were then fused using a multi-head cross-attention module to overcome the limited long-range modeling capability of CNNs. In addition, CLIP-derived text embeddings were projected into the visual space to help model complex relationships among organs and tumors. To further address the partial label challenge and enhance inter-task discriminative capability, a Text-based Task Prompt Generation (TTPG) module that generated task-specific prompts was designed to guide the segmentation. Extensive experiments on multiple medical imaging datasets demonstrated that CDPDNet consistently outperformed existing state-of-the-art segmentation methods. Code and pretrained model are available at: https://github.com/wujiong-hub/CDPDNet.git.
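The fusion step described in the abstract (CNN features attending to DINOv2 features through multi-head cross-attention) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names `cross_attention` and `fuse`, the identity query/key/value projections, the residual addition, and the toy token values are all assumptions made for illustration. A real model would use learned projection matrices and tensor operations.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values, num_heads):
    """Multi-head cross-attention with identity projections (assumption:
    no learned weights, so keys and values are the same vectors)."""
    dim = len(queries[0])
    assert dim % num_heads == 0, "feature dim must be divisible by num_heads"
    head_dim = dim // num_heads
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for q in queries:
        fused = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            # Scaled dot-product scores of this query head against every key.
            scores = [
                scale * sum(a * b for a, b in zip(q[lo:hi], kv[lo:hi]))
                for kv in keys_values
            ]
            weights = softmax(scores)
            # Weighted sum of the value vectors for this head.
            for j in range(lo, hi):
                fused.append(sum(w * kv[j] for w, kv in zip(weights, keys_values)))
        out.append(fused)
    return out


def fuse(cnn_tokens, vit_tokens, num_heads=2):
    """Hypothetical fusion: CNN tokens query ViT tokens, then a residual add."""
    attended = cross_attention(cnn_tokens, vit_tokens, num_heads)
    return [[c + a for c, a in zip(ct, at)] for ct, at in zip(cnn_tokens, attended)]


# Toy inputs: 2 CNN tokens and 3 ViT tokens, each with 4 features.
cnn_tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.2, 0.8]]
vit_tokens = [[0.9, 0.1, 0.4, 0.6], [0.1, 0.9, 0.3, 0.7], [0.5, 0.5, 0.5, 0.5]]
fused = fuse(cnn_tokens, vit_tokens)
print(len(fused), len(fused[0]))
```

Each fused token keeps its local CNN content and adds a convex combination of the ViT tokens, which is how cross-attention lets every spatial location draw on global context.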
Related papers
- Domain and Task-Focused Example Selection for Data-Efficient Contrastive Medical Image Segmentation [0.2765106384328772]
We propose a novel self-supervised contrastive learning framework for medical image segmentation, dubbed PolyCL. PolyCL learns and transfers context-aware discriminant features useful for segmentation from an innovative surrogate. We show that PolyCL outperforms fully-supervised and self-supervised baselines in both low-data and cross-domain scenarios.
arXiv Detail & Related papers (2025-05-25T16:11:48Z)
- CENet: Context Enhancement Network for Medical Image Segmentation [3.4690322157094573]
We propose the Context Enhancement Network (CENet), a novel segmentation framework featuring two key innovations. First, the Dual Selective Enhancement Block (DSEB) integrated into skip connections enhances boundary details and improves the detection of smaller organs in a context-aware manner. Second, the Context Feature Attention Module (CFAM) in the decoder employs a multi-scale design to maintain spatial integrity, reduce feature redundancy, and mitigate overly enhanced representations.
arXiv Detail & Related papers (2025-05-23T23:22:18Z)
- Rethinking Boundary Detection in Deep Learning-Based Medical Image Segmentation [29.37619692272332]
We propose a novel network architecture named CTO, which combines Convolutional Neural Networks (CNNs), Vision Transformer (ViT) models, and explicit edge detection operators. CTO surpasses existing methods in terms of segmentation accuracy and strikes a better balance between accuracy and efficiency. We validate the performance of CTO through extensive experiments conducted on seven challenging medical image segmentation datasets.
arXiv Detail & Related papers (2025-05-06T19:42:56Z)
- CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images [10.926065365983886]
Thyroid nodule segmentation in ultrasound images is crucial for accurate diagnosis and treatment planning. Existing methods face challenges in segmentation accuracy, interpretability, and generalization, which hinder their performance. This letter proposes a novel framework, CLIP-TNseg, to address these issues by integrating a multimodal large model with a neural network architecture.
arXiv Detail & Related papers (2024-12-07T04:10:37Z)
- MRGen: Segmentation Data Engine For Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data. This paper investigates leveraging generative models to synthesize training data, to train segmentation models for underrepresented modalities.
arXiv Detail & Related papers (2024-12-04T16:34:22Z)
- Point-In-Context: Understanding Point Cloud via In-Context Learning [67.20277182808992]
We introduce Point-In-Context (PIC), a novel framework for 3D point cloud understanding via in-context learning.
We address the technical challenge of effectively extending masked point modeling to 3D point clouds by introducing a Joint Sampling module.
We propose two novel training strategies, In-Context Labeling and In-Context Enhancing, forming an extended version of PIC named Point-In-Context-Segmenter (PIC-S).
arXiv Detail & Related papers (2024-04-18T17:32:32Z)
- Learning from partially labeled data for multi-organ and tumor segmentation [102.55303521877933]
We propose a Transformer based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple datasets.
A dynamic head enables the network to accomplish multiple segmentation tasks flexibly.
We create a large-scale partially labeled Multi-Organ and Tumor benchmark, termed MOTS, and demonstrate the superior performance of our TransDoDNet over other competitors.
arXiv Detail & Related papers (2022-11-13T13:03:09Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- Towards Robust Partially Supervised Multi-Structure Medical Image Segmentation on Small-Scale Data [123.03252888189546]
We propose Vicinal Labels Under Uncertainty (VLUU) to bridge the methodological gaps in partially supervised learning (PSL) under data scarcity.
Motivated by multi-task learning and vicinal risk minimization, VLUU transforms the partially supervised problem into a fully supervised problem by generating vicinal labels.
Our research suggests a new research direction in label-efficient deep learning with partial supervision.
arXiv Detail & Related papers (2020-11-28T16:31:00Z)
- DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets [102.55303521877933]
We propose a dynamic on-demand network (DoDNet) that learns to segment multiple organs and tumors on partially labelled datasets.
DoDNet consists of a shared encoder-decoder architecture, a task encoding module, a controller for generating dynamic convolution filters, and a single but dynamic segmentation head.
arXiv Detail & Related papers (2020-11-20T04:56:39Z)
- Boundary-aware Context Neural Network for Medical Image Segmentation [15.585851505721433]
Medical image segmentation can provide a reliable basis for further clinical analysis and disease diagnosis.
Most existing CNN-based methods produce unsatisfactory segmentation masks without accurate object boundaries.
In this paper, we formulate a boundary-aware context neural network (BA-Net) for 2D medical image segmentation.
arXiv Detail & Related papers (2020-05-03T02:35:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.