Language-guided Scale-aware MedSegmentor for Lesion Segmentation in Medical Imaging
- URL: http://arxiv.org/abs/2408.17347v3
- Date: Sat, 19 Apr 2025 17:27:18 GMT
- Title: Language-guided Scale-aware MedSegmentor for Lesion Segmentation in Medical Imaging
- Authors: Shuyi Ouyang, Jinyang Zhang, Xiangye Lin, Xilai Wang, Qingqing Chen, Yen-Wei Chen, Lanfen Lin,
- Abstract summary: In clinical practice, segmenting specific lesions can significantly enhance diagnostic accuracy and treatment efficiency.<n>We propose a novel model, Language-guided Scale-aware MedSegmentor (LSMS), which segments target lesions in medical images based on given textual expressions.<n>Our LSMS consistently achieves superior performance with significantly lower computational cost.
- Score: 7.912408164613206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In clinical practice, segmenting specific lesions based on the needs of physicians can significantly enhance diagnostic accuracy and treatment efficiency. However, conventional lesion segmentation models lack the flexibility to distinguish lesions according to specific requirements. Given the practical advantages of using text as guidance, we propose a novel model, Language-guided Scale-aware MedSegmentor (LSMS), which segments target lesions in medical images based on given textual expressions. We define this as a new task termed Referring Lesion Segmentation (RLS). To address the lack of suitable benchmarks for RLS, we construct a vision-language medical dataset named Reference Hepatic Lesion Segmentation (RefHL-Seg). LSMS incorporates two key designs: (i) Scale-Aware Vision-Language attention module, which performs visual feature extraction and vision-language alignment in parallel. By leveraging diverse convolutional kernels, this module acquires rich visual representations and interacts closely with linguistic features, thereby enhancing the model's capacity for precise object localization. (ii) Full-Scale Decoder, which globally models multi-modal features across multiple scales and captures complementary information between them to accurately delineate lesion boundaries. Additionally, we design a specialized loss function comprising both segmentation loss and vision-language contrastive loss to better optimize cross-modal learning. We validate the performance of LSMS on RLS as well as on conventional lesion segmentation tasks across multiple datasets. Our LSMS consistently achieves superior performance with significantly lower computational cost. Code and datasets will be released.
Related papers
- Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging [4.341503087761129]
Conducting multimodal learning involves visual and text modalities shown as a solution, but collecting paired vision-language datasets is expensive and time-consuming.
Inspired by the superior ability in numerous cross-modal tasks for Large Language Models (LLMs), we proposed a novel Vision-LLM union framework to address the issues.
arXiv Detail & Related papers (2025-04-09T23:33:35Z) - CausalCLIPSeg: Unlocking CLIP's Potential in Referring Medical Image Segmentation with Causal Intervention [30.501326915750898]
We propose CausalCLIPSeg, an end-to-end framework for referring medical image segmentation.
Despite not being trained on medical data, we enforce CLIP's rich semantic space onto the medical domain.
To mitigate confounding bias that may cause the model to learn spurious correlations, CausalCLIPSeg introduces a causal intervention module.
arXiv Detail & Related papers (2025-03-20T08:46:24Z) - Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding [66.06337890279839]
Large vision-language models (LVLMs) have shown remarkable capabilities in visual-language understanding for downstream multi-modal tasks.
LVLMs still suffer from generating hallucinations in complex generation tasks, leading to inconsistencies between visual inputs and generated content.
We propose an Inter-Modality Correlation Decoding (IMCCD) method to mitigate hallucinations in LVLMs in a training-free manner.
arXiv Detail & Related papers (2025-01-03T17:56:28Z) - LIMIS: Towards Language-based Interactive Medical Image Segmentation [58.553786162527686]
LIMIS is the first purely language-based interactive medical image segmentation model.
We adapt Grounded SAM to the medical domain and design a language-based model interaction strategy.
We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability.
arXiv Detail & Related papers (2024-10-22T12:13:47Z) - MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation [2.2585213273821716]
We introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans.
Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss.
We also investigate using zero-shot segmentation labels within a weakly supervised paradigm to enhance segmentation quality further.
arXiv Detail & Related papers (2024-09-28T23:10:37Z) - Language Guided Domain Generalized Medical Image Segmentation [68.93124785575739]
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings.
We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features.
Our approach achieves favorable performance against existing methods in literature.
arXiv Detail & Related papers (2024-04-01T17:48:15Z) - SM2C: Boost the Semi-supervised Segmentation for Medical Image by using Meta Pseudo Labels and Mixed Images [13.971120210536995]
We introduce Scaling-up Mix with Multi-Class (SM2C) to improve the ability to learn semantic features within medical images.
By diversifying the shape of the segmentation objects and enriching the semantic information within each sample, the SM2C demonstrates its potential.
The proposed framework shows significant improvements over state-of-the-art counterparts.
arXiv Detail & Related papers (2024-03-24T04:39:40Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image (DEC-Seg)
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models [0.8878802873945023]
This study introduces the first systematic study on transferring Vision-Language Models to 2D medical images.
Although VLSMs show competitive performance compared to image-only models for segmentation, not all VLSMs utilize the additional information from language prompts.
arXiv Detail & Related papers (2023-08-15T11:28:21Z) - Meta-Learners for Few-Shot Weakly-Supervised Medical Image Segmentation [2.781492199939609]
We propose a generic Meta-Learning framework for few-shot weakly-supervised segmentation in medical imaging domains.
We conduct a comparative analysis of meta-learners adapted to few-shot image segmentation in different sparsely annotated radiological tasks.
arXiv Detail & Related papers (2023-05-11T15:57:45Z) - Vision-Language Modelling For Radiological Imaging and Reports In The
Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - M$^{2}$SNet: Multi-scale in Multi-scale Subtraction Network for Medical
Image Segmentation [73.10707675345253]
We propose a general multi-scale in multi-scale subtraction network (M$2$SNet) to finish diverse segmentation from medical image.
Our method performs favorably against most state-of-the-art methods under different evaluation metrics on eleven datasets of four different medical image segmentation tasks.
arXiv Detail & Related papers (2023-03-20T06:26:49Z) - Self-Supervised Correction Learning for Semi-Supervised Biomedical Image
Segmentation [84.58210297703714]
We propose a self-supervised correction learning paradigm for semi-supervised biomedical image segmentation.
We design a dual-task network, including a shared encoder and two independent decoders for segmentation and lesion region inpainting.
Experiments on three medical image segmentation datasets for different tasks demonstrate the outstanding performance of our method.
arXiv Detail & Related papers (2023-01-12T08:19:46Z) - PCRLv2: A Unified Visual Information Preservation Framework for
Self-supervised Pre-training in Medical Image Analysis [56.63327669853693]
We propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics.
We also address the preservation of scale information, a powerful tool in aiding image understanding.
The proposed unified SSL framework surpasses its self-supervised counterparts on various tasks.
arXiv Detail & Related papers (2023-01-02T17:47:27Z) - Cross-level Contrastive Learning and Consistency Constraint for
Semi-supervised Medical Image Segmentation [46.678279106837294]
We propose a cross-level constrastive learning scheme to enhance representation capacity for local features in semi-supervised medical image segmentation.
With the help of the cross-level contrastive learning and consistency constraint, the unlabelled data can be effectively explored to improve segmentation performance.
arXiv Detail & Related papers (2022-02-08T15:12:11Z) - Towards Robust Partially Supervised Multi-Structure Medical Image
Segmentation on Small-Scale Data [123.03252888189546]
We propose Vicinal Labels Under Uncertainty (VLUU) to bridge the methodological gaps in partially supervised learning (PSL) under data scarcity.
Motivated by multi-task learning and vicinal risk minimization, VLUU transforms the partially supervised problem into a fully supervised problem by generating vicinal labels.
Our research suggests a new research direction in label-efficient deep learning with partial supervision.
arXiv Detail & Related papers (2020-11-28T16:31:00Z) - DONet: Dual Objective Networks for Skin Lesion Segmentation [77.9806410198298]
We propose a simple yet effective framework, named Dual Objective Networks (DONet), to improve the skin lesion segmentation.
Our DONet adopts two symmetric decoders to produce different predictions for approaching different objectives.
To address the challenge of large variety of lesion scales and shapes in dermoscopic images, we additionally propose a recurrent context encoding module (RCEM)
arXiv Detail & Related papers (2020-08-19T06:02:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.