PFPs: Prompt-guided Flexible Pathological Segmentation for Diverse Potential Outcomes Using Large Vision and Language Models
- URL: http://arxiv.org/abs/2407.09979v1
- Date: Sat, 13 Jul 2024 18:51:52 GMT
- Title: PFPs: Prompt-guided Flexible Pathological Segmentation for Diverse Potential Outcomes Using Large Vision and Language Models
- Authors: Can Cui, Ruining Deng, Junlin Guo, Quan Liu, Tianyuan Yao, Haichun Yang, Yuankai Huo
- Abstract summary: We introduce various task prompts through a Large Language Model (LLM) alongside traditional task tokens to enhance segmentation flexibility.
Our contributions include: (1) we construct a computationally efficient pipeline that uses fine-tuned language prompts to guide flexible multi-class segmentation; (2) we compare segmentation performance with fixed prompts against free-text prompts; and (3) we design a multi-task kidney pathology segmentation dataset and corresponding free-text prompts.
- Score: 12.895542069443438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Vision Foundation Model has recently gained attention in medical image analysis. Its zero-shot learning capabilities accelerate AI deployment and enhance the generalizability of clinical applications. However, segmenting pathological images places special emphasis on the flexibility of segmentation targets. For instance, a single click on a Whole Slide Image (WSI) could signify a cell, a functional unit, or layers, adding layers of complexity to the segmentation tasks. Current models primarily predict potential outcomes but lack the flexibility needed for physician input. In this paper, we explore the potential of enhancing segmentation model flexibility by introducing various task prompts through a Large Language Model (LLM) alongside traditional task tokens. Our contribution is four-fold: (1) we construct a computationally efficient pipeline that uses fine-tuned language prompts to guide flexible multi-class segmentation; (2) we compare segmentation performance with fixed prompts against free-text prompts; (3) we design a multi-task kidney pathology segmentation dataset and corresponding free-text prompts; and (4) we evaluate our approach on the kidney pathology dataset, assessing its capacity to handle new cases during inference.
Related papers
- ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z)
- HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis [19.04633470168871]
Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy.
In this paper, we propose a novel Hierarchical Adaptive Taxonomy (HATs) method, which is designed to thoroughly segment panoramic views of kidney structures by leveraging detailed anatomical insights.
Our approach entails (1) the innovative HATs technique which translates spatial relationships among 15 distinct object classes into a versatile "plug-and-play" loss function that spans across regions, functional units, and cells, (2) the incorporation of anatomical hierarchies and scale considerations into a unified simple matrix representation for all panoramic entities, and (3) the
arXiv Detail & Related papers (2024-06-30T05:35:26Z)
- A Classifier-Free Incremental Learning Framework for Scalable Medical Image Segmentation [6.591403935303867]
We introduce a novel segmentation paradigm enabling the segmentation of a variable number of classes within a single classifier-free network.
This network is trained using contrastive learning and produces discriminative feature representations that facilitate straightforward interpretation.
We demonstrate the flexibility of our method in handling varying class numbers within a unified network and its capacity for incremental learning.
arXiv Detail & Related papers (2024-05-25T19:05:07Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- VISION-MAE: A Foundation Model for Medical Image Segmentation and Classification [36.8105960525233]
We present a novel foundation model, VISION-MAE, specifically designed for medical imaging.
VISION-MAE is trained on a dataset of 2.5 million unlabeled images from various modalities.
It is then adapted to classification and segmentation tasks using explicit labels.
arXiv Detail & Related papers (2024-02-01T21:45:12Z)
- Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models [0.8878802873945023]
This study is the first systematic investigation of transferring vision-language segmentation models (VLSMs) to 2D medical images.
Although VLSMs show competitive performance compared to image-only models for segmentation, not all VLSMs utilize the additional information from language prompts.
arXiv Detail & Related papers (2023-08-15T11:28:21Z)
- Towards a Visual-Language Foundation Model for Computational Pathology [5.72536252929528]
We introduce CONtrastive learning from Captions for Histopathology (CONCH).
CONCH is a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and task-agnostic pretraining.
It is evaluated on a suite of 13 diverse benchmarks, achieving state-of-the-art performance on histology image classification, segmentation, captioning, text-to-image and image-to-text retrieval.
arXiv Detail & Related papers (2023-07-24T16:13:43Z)
- Diffusion Models for Open-Vocabulary Segmentation [79.02153797465324]
OVDiff is a novel method that leverages generative text-to-image diffusion models for unsupervised open-vocabulary segmentation.
It relies solely on pre-trained components and outputs the synthesised segmenter directly, without training.
arXiv Detail & Related papers (2023-06-15T17:51:28Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- Generalized Organ Segmentation by Imitating One-shot Reasoning using Anatomical Correlation [55.1248480381153]
We propose OrganNet, which learns a generalized organ concept from a set of annotated organ classes and then transfers this concept to unseen classes.
We show that OrganNet can effectively resist the wide variations in organ morphology and produce state-of-the-art results on the one-shot segmentation task.
arXiv Detail & Related papers (2021-03-30T13:41:12Z)
- Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.