CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images
- URL: http://arxiv.org/abs/2412.05530v1
- Date: Sat, 07 Dec 2024 04:10:37 GMT
- Title: CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images
- Authors: Xinjie Sun, Boxiong Wei, Yalong Jiang, Liquan Mao, Qi Zhao
- Abstract summary: Thyroid nodule segmentation in ultrasound images is crucial for accurate diagnosis and treatment planning. Existing methods face challenges in segmentation accuracy, interpretability, and generalization, which hinder their performance. This letter proposes a novel framework, CLIP-TNseg, to address these issues by integrating a multimodal large model with a neural network architecture.
- Score: 10.926065365983886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thyroid nodule segmentation in ultrasound images is crucial for accurate diagnosis and treatment planning. However, existing methods face challenges in segmentation accuracy, interpretability, and generalization, which hinder their performance. This letter proposes a novel framework, CLIP-TNseg, to address these issues by integrating a multimodal large model with a neural network architecture. CLIP-TNseg consists of two main branches: the Coarse-grained Branch, which extracts high-level semantic features from a frozen CLIP model, and the Fine-grained Branch, which captures fine-grained features using U-Net style residual blocks. These features are fused and processed by the prediction head to generate precise segmentation maps. CLIP-TNseg leverages the Coarse-grained Branch to enhance semantic understanding through textual and high-level visual features, while the Fine-grained Branch refines spatial details, enabling precise and robust segmentation. Extensive experiments on public and our newly collected datasets demonstrate its competitive performance. Our code and the original dataset are available at https://github.com/jayxjsun/CLIP-TNseg.
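As a rough illustration of the two-branch design described in the abstract, the numpy sketch below fuses a low-resolution coarse semantic map with a high-resolution fine feature map and applies a 1x1 prediction head. All shapes, channel counts, and the nearest-neighbour upsampling are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes (assumptions, not the paper's dimensions):
# coarse branch: low-resolution semantic features from a frozen CLIP encoder
# fine branch: high-resolution spatial features from U-Net style residual blocks
H, W = 32, 32
coarse = rng.standard_normal((16, H // 4, W // 4))   # 16 channels, 1/4 resolution
fine = rng.standard_normal((8, H, W))                # 8 channels, full resolution

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling along the two spatial axes."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

# Fuse: upsample the coarse semantic map and concatenate along channels
fused = np.concatenate([upsample_nearest(coarse, 4), fine], axis=0)  # (24, H, W)

# Prediction head: a 1x1 convolution (here a per-pixel linear map) + sigmoid
w = rng.standard_normal(fused.shape[0]) * 0.1
logits = np.tensordot(w, fused, axes=([0], [0]))     # (H, W)
mask = 1.0 / (1.0 + np.exp(-logits))                 # probabilities in (0, 1)

print(mask.shape)  # (32, 32)
```

The key design point the abstract describes is that the coarse branch supplies semantics while the fine branch supplies spatial detail; the fusion step is where the two resolutions meet.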
Related papers
- Partial CLIP is Enough: Chimera-Seg for Zero-shot Semantic Segmentation [55.486872677160015]
We propose Chimera-Seg, which integrates a segmentation backbone as the body and a CLIP-based semantic head as the head. Specifically, Chimera-Seg comprises a trainable segmentation model and a CLIP Semantic Head (CSH), which maps dense features into the CLIP-aligned space. We also propose Selective Global Distillation (SGD), which distills knowledge from dense features exhibiting high similarity to the CLIP CLS token.
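The selective distillation idea in the summary above can be sketched in numpy: score each dense feature by its cosine similarity to the CLIP CLS token, keep only the most similar positions, and compute the distillation loss there. The shapes, the top-25% selection threshold, and the MSE stand-in loss are all assumptions for illustration, not Chimera-Seg's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative tensors (assumed shapes):
# dense: per-pixel features already mapped into the CLIP-aligned space
# clip_dense: CLIP's own dense features, used as the distillation target
# cls_token: the CLIP CLS embedding for the same image
d, n = 64, 100
dense = rng.standard_normal((n, d))
clip_dense = rng.standard_normal((n, d))
cls_token = rng.standard_normal(d)

def cosine_sim(a, b):
    """Cosine similarity between each row of a and the vector b."""
    return a @ b / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b) + 1e-8)

# Selective distillation: only distill at positions whose features are
# highly similar to the CLS token (here: top 25%, an assumed threshold).
sims = cosine_sim(dense, cls_token)      # (n,)
k = n // 4
selected = np.argsort(sims)[-k:]         # indices of the top-k positions

# Distillation loss on the selected positions only (MSE as a stand-in)
loss = np.mean((dense[selected] - clip_dense[selected]) ** 2)
print(selected.shape)  # (25,)
```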
arXiv Detail & Related papers (2025-06-27T09:26:50Z) - InceptionMamba: Efficient Multi-Stage Feature Enhancement with Selective State Space Model for Microscopic Medical Image Segmentation [15.666926528144202]
We propose an efficient framework for the segmentation task, named InceptionMamba, which encodes multi-stage rich features. We exploit semantic cues to capture both low-frequency and high-frequency regions to enrich the multi-stage features. Our model achieves state-of-the-art performance on two challenging microscopic segmentation datasets.
arXiv Detail & Related papers (2025-06-13T20:25:12Z) - CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation [8.56773843063124]
Most publicly available medical segmentation datasets are only partially labeled. In this study, we propose a novel CLIP-DINO Prompt-Driven Network (CDPDNet), which combines a self-supervised vision transformer with CLIP-based text embedding and introduces task-specific text prompts to tackle these challenges.
arXiv Detail & Related papers (2025-05-25T03:23:58Z) - PathSegDiff: Pathology Segmentation using Diffusion model representations [63.20694440934692]
We propose PathSegDiff, a novel approach for histopathology image segmentation that leverages Latent Diffusion Models (LDMs) as pre-trained feature extractors.
Our method utilizes a pathology-specific LDM, guided by a self-supervised encoder, to extract rich semantic information from H&E stained histopathology images.
Our experiments demonstrate significant improvements over traditional methods on the BCSS and GlaS datasets.
arXiv Detail & Related papers (2025-04-09T14:58:21Z) - MGFI-Net: A Multi-Grained Feature Integration Network for Enhanced Medical Image Segmentation [0.3108011671896571]
A major challenge in medical image segmentation is achieving accurate delineation of regions of interest in the presence of noise, low contrast, or complex anatomical structures.
Existing segmentation models often neglect the integration of multi-grained information and fail to preserve edge details.
We propose a novel image semantic segmentation model called the Multi-Grained Feature Integration Network (MGFI-Net).
Our MGFI-Net is designed with two dedicated modules to tackle these issues.
arXiv Detail & Related papers (2025-02-19T15:24:34Z) - MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation [2.2585213273821716]
We introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans.
Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss.
We also investigate using zero-shot segmentation labels within a weakly supervised paradigm to enhance segmentation quality further.
arXiv Detail & Related papers (2024-09-28T23:10:37Z) - MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation [2.2585213273821716]
We propose a novel framework, called MedCLIP-SAM, that combines CLIP and SAM models to generate segmentation of clinical scans.
By extensively testing three diverse segmentation tasks and medical image modalities, our proposed framework has demonstrated excellent accuracy.
arXiv Detail & Related papers (2024-03-29T15:59:11Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Implicit Anatomical Rendering for Medical Image Segmentation with Stochastic Experts [11.007092387379078]
We propose MORSE, a generic implicit neural rendering framework designed at an anatomical level to assist learning in medical image segmentation.
Our approach is to formulate medical image segmentation as a rendering problem in an end-to-end manner.
Our experiments demonstrate that MORSE can work well with different medical segmentation backbones.
arXiv Detail & Related papers (2023-04-06T16:44:03Z) - Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images [55.83984261827332]
In this paper, we propose a novel reliable multi-scale wavelet-enhanced transformer network.
We develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network and a multi-scale transformer module.
Our proposed method achieves better segmentation accuracy with a high degree of reliability compared to other state-of-the-art segmentation approaches.
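The "wavelet-enhanced feature extractor" mentioned above builds on classical multi-scale wavelet decompositions. As background, a single level of the 2D Haar transform fits in a few lines of numpy; this is a generic sketch of the decomposition, not the paper's extractor.

```python
import numpy as np

def haar2d(x):
    """One level of 2D Haar decomposition of an even-sized array into
    an approximation subband plus three detail subbands."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # average of row pairs
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # difference of row pairs
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0  # low-low: coarse approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0  # low-high: horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0  # high-low: vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0  # high-high: diagonal detail
    return ll, lh, hl, hh

img = np.arange(64, dtype=float).reshape(8, 8)
ll, lh, hl, hh = haar2d(img)
print(ll.shape)  # (4, 4)
```

Each level halves the spatial resolution, which is what makes wavelet subbands a natural source of multi-scale features for a segmentation backbone.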
arXiv Detail & Related papers (2022-12-01T07:32:56Z) - BCS-Net: Boundary, Context and Semantic for Automatic COVID-19 Lung Infection Segmentation from CT Images [83.82141604007899]
BCS-Net is a novel network for automatic COVID-19 lung infection segmentation from CT images.
BCS-Net follows an encoder-decoder architecture, with most of the design effort focused on the decoder stage.
In each BCSR block, the attention-guided global context (AGGC) module is designed to learn the most valuable encoder features for the decoder.
arXiv Detail & Related papers (2022-07-17T08:54:07Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model [61.58071099082296]
It is unclear how to make zero-shot recognition work well on broader vision problems, such as object detection and semantic segmentation.
In this paper, we target zero-shot semantic segmentation, building on an off-the-shelf pre-trained vision-language model, i.e., CLIP.
Our experimental results show that this simple framework surpasses the previous state of the art by a large margin.
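The core idea of building zero-shot segmentation on a pre-trained vision-language model reduces to classifying each pixel embedding against text embeddings of the class prompts. The numpy sketch below uses random stand-ins for the CLIP outputs; it illustrates the matching step only, not the paper's full pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for real CLIP outputs (assumption, for illustration):
# pixel_emb: a dense per-pixel embedding map; text_emb: one embedding per class prompt
d, h, w, num_classes = 32, 8, 8, 3
pixel_emb = rng.standard_normal((h, w, d))
text_emb = rng.standard_normal((num_classes, d))

def normalize(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

# Cosine similarity between every pixel and every class prompt,
# then assign each pixel to its best-matching class
logits = normalize(pixel_emb) @ normalize(text_emb).T  # (h, w, num_classes)
seg_map = logits.argmax(axis=-1)                       # (h, w) class indices

print(seg_map.shape)  # (8, 8)
```

Because no class-specific weights are trained, unseen classes can be added at inference time just by embedding new text prompts.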
arXiv Detail & Related papers (2021-12-29T18:56:18Z) - Boundary-aware Context Neural Network for Medical Image Segmentation [15.585851505721433]
Medical image segmentation can provide reliable basis for further clinical analysis and disease diagnosis.
Most existing CNN-based methods produce unsatisfactory segmentation masks without accurate object boundaries.
In this paper, we formulate a boundary-aware context neural network (BA-Net) for 2D medical image segmentation.
arXiv Detail & Related papers (2020-05-03T02:35:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.