TG-LMM: Enhancing Medical Image Segmentation Accuracy through Text-Guided Large Multi-Modal Model
- URL: http://arxiv.org/abs/2409.03412v1
- Date: Thu, 5 Sep 2024 11:01:48 GMT
- Title: TG-LMM: Enhancing Medical Image Segmentation Accuracy through Text-Guided Large Multi-Modal Model
- Authors: Yihao Zhao, Enhao Zhong, Cuiyun Yuan, Yang Li, Man Zhao, Chunxia Li, Jun Hu, Chenbin Liu
- Abstract summary: TG-LMM (Text-Guided Large Multi-Modal Model) is a novel approach that leverages textual descriptions of organs to enhance segmentation accuracy in medical images.
Our method demonstrated superior performance compared to existing approaches such as MedSAM, SAM, and nnU-Net.
- Score: 4.705494927820468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose TG-LMM (Text-Guided Large Multi-Modal Model), a novel approach that leverages textual descriptions of organs to enhance segmentation accuracy in medical images. Existing medical image segmentation methods face several challenges: current automatic segmentation models do not effectively utilize prior knowledge, such as descriptions of organ locations; earlier text-visual models focus on identifying the target rather than improving segmentation accuracy; and models that do use prior knowledge to enhance accuracy do not incorporate pre-trained models. To address these issues, TG-LMM integrates prior knowledge, specifically expert descriptions of the spatial locations of organs, into the segmentation process. Our model uses pre-trained image and text encoders to reduce the number of trainable parameters and accelerate training. Additionally, we designed a comprehensive image-text fusion structure to ensure thorough integration of the two data modalities. We evaluated TG-LMM on three authoritative medical image datasets, encompassing the segmentation of various parts of the human body. Our method demonstrated superior performance compared to existing approaches such as MedSAM, SAM, and nnU-Net.
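The abstract describes the architecture only at a high level: frozen pre-trained image and text encoders feed an image-text fusion module, which drives mask prediction. As a minimal PyTorch sketch of that pattern (not the authors' implementation; the stand-in encoders, cross-attention fusion, and all dimensions below are assumptions):

```python
# Minimal sketch of a text-guided segmentation model in the spirit of the
# abstract: frozen pre-trained encoders + an image-text fusion block + a
# small decoder. Encoder stand-ins and all sizes are illustrative.
import torch
import torch.nn as nn

class TextGuidedSegmenter(nn.Module):
    def __init__(self, img_encoder: nn.Module, txt_encoder: nn.Module, dim: int = 256):
        super().__init__()
        self.img_encoder = img_encoder
        self.txt_encoder = txt_encoder
        # Freeze the pre-trained encoders so only the fusion/decoder train,
        # mirroring the abstract's claim of fewer trainable parameters.
        for p in self.img_encoder.parameters():
            p.requires_grad = False
        for p in self.txt_encoder.parameters():
            p.requires_grad = False
        # Cross-attention: image patch tokens attend to text tokens.
        self.fuse = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.decoder = nn.Sequential(  # per-pixel mask logits
            nn.Conv2d(dim, dim // 2, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim // 2, 1, 1),
        )

    def forward(self, image, text_tokens):
        feat = self.img_encoder(image)              # (B, dim, H, W)
        B, C, H, W = feat.shape
        img_seq = feat.flatten(2).transpose(1, 2)   # (B, H*W, dim)
        txt_seq = self.txt_encoder(text_tokens)     # (B, T, dim)
        fused, _ = self.fuse(img_seq, txt_seq, txt_seq)
        fused = (img_seq + fused).transpose(1, 2).reshape(B, C, H, W)
        return self.decoder(fused)                  # (B, 1, H, W) logits

# Stand-in encoders so the sketch runs end to end.
img_enc = nn.Sequential(nn.Conv2d(1, 256, 16, stride=16))  # ViT-like patchifier
txt_enc = nn.Sequential(nn.Embedding(1000, 256))           # toy token embedder
model = TextGuidedSegmenter(img_enc, txt_enc)
masks = model(torch.randn(2, 1, 224, 224), torch.randint(0, 1000, (2, 12)))
print(masks.shape)  # torch.Size([2, 1, 14, 14])
```

Freezing the encoders is what keeps the trainable-parameter count down: only the fusion and decoder weights receive gradients.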
Related papers
- MIST: A Simple and Scalable End-To-End 3D Medical Imaging Segmentation Framework [1.1608974088441382]
The Medical Imaging Toolkit (MIST) is designed to facilitate consistent training, testing, and evaluation of deep learning-based medical imaging segmentation methods.
MIST standardizes data analysis, preprocessing, and evaluation pipelines, accommodating multiple architectures and loss functions.
arXiv Detail & Related papers (2024-07-31T05:17:31Z)
- Language Guided Domain Generalized Medical Image Segmentation [68.93124785575739]
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings.
We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features.
Our approach achieves favorable performance against existing methods in the literature.
arXiv Detail & Related papers (2024-04-01T17:48:15Z)
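The entry above mentions contrastive learning guided by text-encoder features. A standard way to realize such guidance is a symmetric InfoNCE loss between paired image and text embeddings; the sketch below shows that generic loss and is an assumption about the technique, not the paper's exact objective:

```python
# Generic InfoNCE-style image-text contrastive loss, one common way to let
# text-encoder features guide visual representation learning.
import torch
import torch.nn.functional as F

def info_nce(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07):
    """img_emb, txt_emb: (B, D) paired embeddings; row i matches row i."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric loss: match images to texts and texts to images.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
print(float(loss))
```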
- MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation [2.2585213273821716]
We propose a novel framework, called MedCLIP-SAM, that combines CLIP and SAM models to generate segmentation of clinical scans.
Extensive testing on three diverse segmentation tasks and medical image modalities demonstrates the framework's excellent accuracy.
arXiv Detail & Related papers (2024-03-29T15:59:11Z)
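As a rough illustration of the CLIP-plus-SAM bridging pattern described above, the following sketch turns a text-image similarity map into a box prompt for a promptable segmenter. The helper functions are stand-ins invented for illustration; a real pipeline would call actual CLIP and SAM models:

```python
# Sketch of bridging a CLIP-style model and a SAM-style model: a text-image
# similarity map is thresholded into a box prompt for the segmenter.
import numpy as np

def heatmap_to_box(heatmap: np.ndarray, thresh: float = 0.5):
    """Convert a similarity heatmap into a bounding-box prompt (x0, y0, x1, y1)."""
    ys, xs = np.where(heatmap >= thresh * heatmap.max())
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def clip_saliency(image: np.ndarray, prompt: str) -> np.ndarray:
    # Stand-in for a CLIP-derived text-image similarity map.
    return np.random.default_rng(0).random(image.shape[:2])

def sam_segment(image: np.ndarray, box) -> np.ndarray:
    # Stand-in for a SAM-style promptable segmenter: fill the box region.
    mask = np.zeros(image.shape[:2], dtype=bool)
    x0, y0, x1, y1 = box
    mask[y0:y1 + 1, x0:x1 + 1] = True
    return mask

image = np.zeros((128, 128, 3))
box = heatmap_to_box(clip_saliency(image, "left lung"))
mask = sam_segment(image, box)
print(box, mask.sum())
```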
- ProMISe: Prompt-driven 3D Medical Image Segmentation Using Pretrained Image Foundation Models [13.08275555017179]
We propose ProMISe, a prompt-driven 3D medical image segmentation model using only a single point prompt.
We evaluate our model on two public datasets for colon and pancreas tumor segmentation.
arXiv Detail & Related papers (2023-10-30T16:49:03Z)
- MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation [58.53672866662472]
We introduce a modality-agnostic SAM adaptation framework, named as MA-SAM.
Our method is rooted in a parameter-efficient fine-tuning strategy that updates only a small portion of weight increments.
By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data.
arXiv Detail & Related papers (2023-09-16T02:41:53Z)
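The adapter idea in the entry above, injecting small 3D modules into a frozen 2D transformer so slice-wise features can exchange information along the depth axis, can be sketched as follows. The bottleneck design, depth-wise 3D convolution, and all shapes are illustrative assumptions rather than MA-SAM's actual modules:

```python
# Minimal 3D adapter in the spirit of parameter-efficient SAM adaptation:
# a bottleneck MLP plus a depth-wise 3D convolution lets frozen 2D
# transformer features mix information along the slice axis.
import torch
import torch.nn as nn

class Adapter3D(nn.Module):
    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.conv3d = nn.Conv3d(bottleneck, bottleneck, kernel_size=3,
                                padding=1, groups=bottleneck)  # depth-wise
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        # x: (B, D, H, W, C) patch tokens from a 2D backbone run slice-by-slice.
        h = self.down(x)                         # (B, D, H, W, b)
        h = h.permute(0, 4, 1, 2, 3)             # (B, b, D, H, W) for Conv3d
        h = self.conv3d(h).permute(0, 2, 3, 4, 1)
        return x + self.up(h)                    # residual keeps backbone intact

tokens = torch.randn(1, 8, 14, 14, 768)          # 8 slices of 14x14 patch tokens
print(Adapter3D()(tokens).shape)                 # torch.Size([1, 8, 14, 14, 768])
```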
- Self-Sampling Meta SAM: Enhancing Few-shot Medical Image Segmentation with Meta-Learning [17.386754270460273]
We present a Self-Sampling Meta SAM framework for few-shot medical image segmentation.
The proposed method achieves significant improvements over state-of-the-art methods in few-shot segmentation.
It also enables rapid online adaptation in interactive image segmentation, adapting to a new organ in just 0.83 minutes.
arXiv Detail & Related papers (2023-08-31T05:20:48Z)
- Multiscale Progressive Text Prompt Network for Medical Image Segmentation [10.121625177837931]
We propose using progressive text prompts as prior knowledge to guide the segmentation process.
Our model achieves high-quality results with low data annotation costs.
arXiv Detail & Related papers (2023-06-30T23:37:16Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- Towards Cross-modality Medical Image Segmentation with Online Mutual Knowledge Distillation [71.89867233426597]
In this paper, we aim to exploit the prior knowledge learned from one modality to improve the segmentation performance on another modality.
We propose a novel Mutual Knowledge Distillation scheme to thoroughly exploit the modality-shared knowledge.
Experimental results on the public multi-class cardiac segmentation dataset MMWHS 2017 show that our method achieves large improvements on CT segmentation.
arXiv Detail & Related papers (2020-10-04T10:25:13Z)
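The mutual distillation scheme summarized above can be illustrated with a generic symmetric loss in which each modality network learns from both the labels and its peer's softened predictions. The temperature and weighting below are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch of online mutual knowledge distillation between two modality
# networks (e.g., CT and MR): each student is supervised by ground truth
# plus the softened predictions of its peer.
import torch
import torch.nn.functional as F

def mutual_kd_loss(logits_a, logits_b, target, T: float = 2.0, alpha: float = 0.5):
    """logits_*: (B, C, H, W) per-pixel class logits; target: (B, H, W) labels."""
    ce_a = F.cross_entropy(logits_a, target)
    ce_b = F.cross_entropy(logits_b, target)
    # KL between temperature-softened distributions, in both directions.
    kl_a = F.kl_div(F.log_softmax(logits_a / T, dim=1),
                    F.softmax(logits_b.detach() / T, dim=1),
                    reduction="batchmean") * T * T
    kl_b = F.kl_div(F.log_softmax(logits_b / T, dim=1),
                    F.softmax(logits_a.detach() / T, dim=1),
                    reduction="batchmean") * T * T
    return ce_a + ce_b + alpha * (kl_a + kl_b)

loss = mutual_kd_loss(torch.randn(2, 4, 32, 32), torch.randn(2, 4, 32, 32),
                      torch.randint(0, 4, (2, 32, 32)))
print(float(loss))
```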