One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts
- URL: http://arxiv.org/abs/2312.17183v3
- Date: Thu, 11 Jul 2024 06:09:39 GMT
- Title: One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts
- Authors: Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
- Abstract summary: We aim to build a model that can Segment Anything in radiology scans, driven by Text prompts, termed SAT.
We build the largest and most comprehensive segmentation dataset for training by collecting over 22K 3D medical image scans.
We have trained SAT-Nano (110M parameters) and SAT-Pro (447M parameters), demonstrating performance comparable to 72 specialist nnU-Nets, each trained on a single dataset/subset.
- Score: 62.55349777609194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we aim to build a model that can Segment Anything in radiology scans, driven by Text prompts, termed SAT. Our main contributions are threefold: (i) for dataset construction, we construct the first multi-modal knowledge tree on human anatomy, comprising 6,502 anatomical terminologies; we then build the largest and most comprehensive segmentation dataset for training by collecting over 22K 3D medical image scans from 72 segmentation datasets across 497 classes, with careful standardization of both image scans and label space; (ii) for architecture design, we propose to inject medical knowledge into a text encoder via contrastive learning, and then formulate a universal segmentation model that can be prompted by feeding in medical terminologies in text form; (iii) as a result, we have trained SAT-Nano (110M parameters) and SAT-Pro (447M parameters), demonstrating performance comparable to 72 specialist nnU-Nets, each trained on a single dataset/subset. We validate SAT as a foundation segmentation model: it generalizes better to external (unseen) datasets and can be further improved on specific tasks through fine-tuning. Compared with interactive segmentation models such as MedSAM, a segmentation model prompted by text offers superior performance, scalability, and robustness. As a use case, we demonstrate that SAT can act as a powerful out-of-the-box agent for large language models, enabling visual grounding in clinical procedures such as report generation. All data, code, and models in this work have been released.
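To make the text-prompting mechanism concrete, below is a minimal, hypothetical PyTorch sketch of the overall design: a text encoder (assumed pretrained with contrastive knowledge injection and kept frozen) embeds anatomical terminologies, and per-voxel visual features are matched against those embeddings to produce one mask per queried term. Module names and shapes are illustrative assumptions, not the released SAT implementation.

```python
import torch
import torch.nn as nn

class TextPromptedSegmenter(nn.Module):
    """Minimal sketch of a text-prompted segmenter (not the released SAT code)."""

    def __init__(self, text_encoder, vision_backbone, dim=256):
        super().__init__()
        self.text_encoder = text_encoder        # terminology tokens -> (K, dim)
        self.vision_backbone = vision_backbone  # 3D volume -> (B, dim, D, H, W)
        self.proj = nn.Linear(dim, dim)

    def forward(self, volume, term_tokens):
        feats = self.vision_backbone(volume)                 # (B, dim, D, H, W)
        queries = self.proj(self.text_encoder(term_tokens))  # (K, dim)
        # Dot product between every voxel feature and every text query:
        # one mask logit map per prompted terminology.
        return torch.einsum("bcdhw,kc->bkdhw", feats, queries)  # (B, K, D, H, W)
```

Under this reading, changing the prompted terminologies at inference changes which structures are segmented, without retraining; that is the scalability property the abstract contrasts with interactive models such as MedSAM.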
Related papers
- TotalSegmentator MRI: Sequence-Independent Segmentation of 59 Anatomical Structures in MR images [62.53931644063323]
In this study, we extended the capabilities of TotalSegmentator to MR images.
We trained an nnU-Net segmentation algorithm on this dataset and calculated Dice similarity coefficients to evaluate the model's performance.
The model significantly outperformed two other publicly available segmentation models (Dice score 0.824 versus 0.762, p < 0.001, and 0.762 versus 0.542, p < 0.001).
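For reference, the Dice similarity coefficient used in these comparisons is the standard overlap measure 2|A∩B| / (|A| + |B|); a minimal NumPy implementation for binary masks:

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient for binary masks (1.0 = perfect overlap)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))
```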
arXiv Detail & Related papers (2024-05-29T20:15:54Z)
- Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography [50.08496922659307]
We propose a universal framework enabling a single model, termed the Universal Model, to handle multiple public datasets and adapt to new classes.
First, we introduce a novel language-driven parameter generator that leverages language embeddings from large language models.
Second, the conventional output layers are replaced with lightweight, class-specific heads, allowing the Universal Model to simultaneously segment 25 organs and six types of tumors.
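A hedged sketch of how these two ideas can combine (hypothetical shapes and names, not the released Universal Model code): a small generator maps each class's language embedding to the weight and bias of a lightweight per-class 1x1x1 convolutional head.

```python
import torch.nn as nn
import torch.nn.functional as F

class LanguageDrivenHead(nn.Module):
    """Sketch: generate a per-class 1x1x1 conv head from language embeddings."""

    def __init__(self, feat_dim=64, text_dim=512):
        super().__init__()
        # Maps one class embedding to a conv weight (feat_dim) plus a bias (1)
        self.param_gen = nn.Linear(text_dim, feat_dim + 1)
        self.feat_dim = feat_dim

    def forward(self, feats, text_emb):
        # feats: (B, feat_dim, D, H, W); text_emb: (K, text_dim)
        params = self.param_gen(text_emb)                    # (K, feat_dim + 1)
        weight = params[:, :-1].reshape(-1, self.feat_dim, 1, 1, 1)
        bias = params[:, -1]
        return F.conv3d(feats, weight, bias)                 # (B, K, D, H, W)
```

Since adding a class only requires a new language embedding rather than a new trained output layer, a head of this kind is one way such a model can adapt to new classes.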
arXiv Detail & Related papers (2024-05-28T16:55:15Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect aligned medical image-text data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [56.57177181778517]
RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE.
We leverage the latest powerful universal segmentation and large language models to extend the original datasets.
arXiv Detail & Related papers (2024-04-25T17:11:37Z)
- SegVol: Universal and Interactive Volumetric Medical Image Segmentation [25.322437534713163]
We propose a 3D foundation segmentation model, named SegVol, supporting universal and interactive volumetric medical image segmentation.
By scaling up training data to 90K unlabeled Computed Tomography (CT) volumes and 6K labeled CT volumes, this foundation model supports the segmentation of over 200 anatomical categories.
Experiments on 22 anatomical segmentation tasks verify that SegVol outperforms the competitors on 19 tasks, with improvements of up to 37.24% over the runner-up methods.
arXiv Detail & Related papers (2023-11-22T13:27:36Z)
- PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
However, LLMs struggle in domains that require precision, such as medical applications, due to a lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
- ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax [5.168314889999992]
We propose a novel vision-language model, ConTEXTual Net, for the task of pneumothorax segmentation on chest radiographs.
We trained it on the CANDID-PTX dataset consisting of 3,196 positive cases of pneumothorax.
It achieved a Dice score of 0.716 ± 0.016, which was similar to the degree of inter-reader variability.
It outperformed both vision-only models and a competing vision-language model.
arXiv Detail & Related papers (2023-03-02T22:36:19Z)
- Continual Segment: Towards a Single, Unified and Accessible Continual Segmentation Model of 143 Whole-body Organs in CT Scans [31.388497540849297]
We propose a new architecture-based continual semantic segmentation (CSS) learning framework that learns a single deep segmentation model covering a total of 143 whole-body organs.
Trained and validated on 3D CT scans of 2,500+ patients from four datasets, our single network can segment all 143 whole-body organs with very high accuracy.
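As a rough illustration of an architecture-based CSS strategy (freeze the shared trunk, add a new lightweight head per learning stage), here is a hedged PyTorch sketch; the paper's actual mechanism may differ in detail.

```python
import torch.nn as nn

class ContinualSegmenter(nn.Module):
    """Sketch: frozen shared encoder plus one new head per CSS stage."""

    def __init__(self, encoder, feat_dim=64):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False   # protect previously learned organs
        self.heads = nn.ModuleList()
        self.feat_dim = feat_dim

    def add_stage(self, num_new_classes):
        # Only the newest head is trained on the newly arrived dataset
        self.heads.append(nn.Conv3d(self.feat_dim, num_new_classes, kernel_size=1))

    def forward(self, x):
        feats = self.encoder(x)       # (B, feat_dim, D, H, W)
        return [head(feats) for head in self.heads]  # logits per stage
```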
arXiv Detail & Related papers (2023-02-01T00:49:21Z)
- CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection [36.08551407926805]
We propose the CLIP-Driven Universal Model, which incorporates text embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models.
The proposed model is developed on an assembly of 14 datasets, using a total of 3,410 CT scans for training, and is then evaluated on 6,162 external CT scans from three additional datasets.
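A minimal sketch of the text-embedding ingredient, using a stock CLIP text encoder from Hugging Face transformers; the prompt template and organ list are illustrative, not necessarily the paper's exact wording.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

organs = ["liver", "pancreas", "spleen"]  # illustrative class names
prompts = [f"a computerized tomography of a {organ}" for organ in organs]
with torch.no_grad():
    batch = tokenizer(prompts, padding=True, return_tensors="pt")
    class_embeddings = text_encoder(**batch).pooler_output  # (3, 512)
# class_embeddings can then condition a segmentation decoder, e.g. via
# per-class heads as in the parameter-generator sketch earlier.
```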
arXiv Detail & Related papers (2023-01-02T18:07:44Z)
- Universal Segmentation of 33 Anatomies [19.194539991903593]
We present an approach for learning a single model that universally segments 33 anatomical structures.
We learn such a model from a union of multiple datasets, each of which is only partially labeled.
We evaluate our model on multiple open-source datasets, demonstrating that it generalizes well.
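Training one network on a union of partially labeled datasets typically requires masking the loss so that only annotated classes contribute; a hedged sketch of such a loss (not necessarily this paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def masked_bce_loss(logits, targets, labeled):
    """Per-class BCE evaluated only on classes annotated for each sample.

    logits, targets: (B, K, D, H, W) with float targets in {0, 1};
    labeled: (B, K) bool, True where class k is annotated for sample b.
    Classes missing from a sample's source dataset contribute nothing.
    """
    loss = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    w = labeled.float()[:, :, None, None, None]
    return (loss * w).sum() / w.expand_as(loss).sum().clamp(min=1.0)
```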
arXiv Detail & Related papers (2022-03-04T02:29:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.