Towards a Visual-Language Foundation Model for Computational Pathology
- URL: http://arxiv.org/abs/2307.12914v2
- Date: Tue, 25 Jul 2023 17:56:38 GMT
- Title: Towards a Visual-Language Foundation Model for Computational Pathology
- Authors: Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Ivy
Liang, Tong Ding, Guillaume Jaume, Igor Odintsov, Andrew Zhang, Long Phi Le,
Georg Gerber, Anil V Parwani, Faisal Mahmood
- Abstract summary: We introduce CONtrastive learning from Captions for Histopathology (CONCH).
CONCH is a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and task-agnostic pretraining.
It is evaluated on a suite of 13 diverse benchmarks, achieving state-of-the-art performance on histology image classification, segmentation, captioning, text-to-image and image-to-text retrieval.
- Score: 5.72536252929528
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The accelerated adoption of digital pathology and advances in deep learning
have enabled the development of powerful models for various pathology tasks
across a diverse array of diseases and patient cohorts. However, model training
is often difficult due to label scarcity in the medical domain and the model's
usage is limited by the specific task and disease for which it is trained.
Additionally, most models in histopathology leverage only image data, a stark
contrast to how humans teach each other and reason about histopathologic
entities. We introduce CONtrastive learning from Captions for Histopathology
(CONCH), a visual-language foundation model developed using diverse sources of
histopathology images, biomedical text, and notably over 1.17 million
image-caption pairs via task-agnostic pretraining. Evaluated on a suite of 13
diverse benchmarks, CONCH can be transferred to a wide range of downstream
tasks involving either or both histopathology images and text, achieving
state-of-the-art performance on histology image classification, segmentation,
captioning, text-to-image and image-to-text retrieval. CONCH represents a
substantial leap over concurrent visual-language pretrained systems for
histopathology, with the potential to directly facilitate a wide array of
machine learning-based workflows requiring minimal or no further supervised
fine-tuning.
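To make the image-caption pretraining recipe concrete, here is a minimal sketch of the CLIP-style symmetric contrastive objective that visual-language models of this kind build on; the encoders, embedding width, and temperature value are illustrative assumptions, not CONCH's exact configuration.

```python
# Minimal sketch of a symmetric image-caption contrastive (InfoNCE) loss,
# the core objective behind CLIP-style visual-language pretraining.
# Batch size, embedding width, and the temperature are illustrative only.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """image_emb, text_emb: (B, D) embeddings of B matched image-caption pairs."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (B, B) pairwise similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> matching caption
    loss_t2i = F.cross_entropy(logits.t(), targets)  # caption -> matching image
    return 0.5 * (loss_i2t + loss_t2i)
```

Models in this family often pair such a contrastive term with a captioning objective, which is one route to supporting both cross-modal retrieval and caption generation.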
Related papers
- Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach sequences various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence; a rough sketch of this next-token objective follows this entry.
arXiv Detail & Related papers (2024-09-13T10:19:10Z)
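As a rough illustration of that next-token idea, the sketch below trains a decoder-style Transformer to predict the next visual token of a flattened 3D volume; the tokenizer, sequence ordering, and model sizes are assumptions for illustration, not the paper's actual design.

```python
# Hedged sketch: autoregressive next-token prediction over discrete visual
# tokens from 3D medical volumes (tokenizer and ordering are assumed).
import torch
import torch.nn as nn

class VisualTokenAR(nn.Module):
    def __init__(self, vocab_size: int = 8192, dim: int = 512, layers: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(block, num_layers=layers)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        """tokens: (B, T) visual tokens from a flattened, ordered 3D volume."""
        causal = nn.Transformer.generate_square_subsequent_mask(
            tokens.size(1)).to(tokens.device)
        hidden = self.backbone(self.embed(tokens), mask=causal)
        return self.head(hidden)  # next-token logits at every position

def ar_loss(model: VisualTokenAR, tokens: torch.Tensor) -> torch.Tensor:
    """Standard shifted cross-entropy for autoregressive training."""
    logits = model(tokens[:, :-1])
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
```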
- GPC: Generative and General Pathology Image Classifier [2.6954348706500766]
We propose GPC, a task-agnostic, generative, and general pathology image classifier.
GPC maps pathology images into a high-dimensional feature space and generates pertinent class labels as text, as sketched after this entry.
We evaluate GPC on six datasets for four different pathology image classification tasks.
arXiv Detail & Related papers (2024-07-12T06:54:31Z)
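One plausible shape for such a generative classifier is sketched below: an image encoder supplies context to a small text decoder that emits the class label as tokens. The module names, sizes, and single-vector image context are assumptions, not GPC's actual architecture.

```python
# Hedged sketch: an image-conditioned text decoder that generates the
# class label as tokens, in the spirit of a generative classifier.
import torch
import torch.nn as nn

class GenerativeLabeler(nn.Module):
    def __init__(self, image_encoder: nn.Module, vocab_size: int, dim: int = 512):
        super().__init__()
        self.image_encoder = image_encoder  # assumed: images -> (B, dim) features
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.token_emb = nn.Embedding(vocab_size, dim)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, images: torch.Tensor, label_tokens: torch.Tensor):
        """label_tokens: (B, T) tokenized class-name text fed to the decoder."""
        memory = self.image_encoder(images).unsqueeze(1)  # (B, 1, dim) context
        causal = nn.Transformer.generate_square_subsequent_mask(
            label_tokens.size(1)).to(label_tokens.device)
        hidden = self.decoder(self.token_emb(label_tokens), memory, tgt_mask=causal)
        return self.lm_head(hidden)  # next-token logits over the label vocabulary
```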
- Towards a text-based quantitative and explainable histopathology image analysis [4.064178811354613]
We propose TQx, a Text-based Quantitative and Explainable histopathology image analysis framework.
Words retrieved from a pre-built vocabulary via image-to-text retrieval are used to quantify the histopathology images and generate understandable feature embeddings (a minimal sketch follows this entry).
The results demonstrate that TQx is able to quantify and analyze histopathology images comparably to the prevalent visual models in computational pathology.
arXiv Detail & Related papers (2024-07-10T04:33:43Z)
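A minimal sketch of that quantification step, under the assumption of a shared image-text embedding space and a fixed word vocabulary, might look like the following; the top-k truncation is also an illustrative choice.

```python
# Hedged sketch: score each image against a word vocabulary via
# image-to-text retrieval; every feature dimension maps to one word,
# so the resulting embedding can be read off directly.
import torch
import torch.nn.functional as F

def word_profile(image_emb: torch.Tensor,
                 vocab_emb: torch.Tensor,
                 top_k: int = 10) -> torch.Tensor:
    """image_emb: (B, D) image embeddings; vocab_emb: (V, D) word embeddings.
    Returns (B, V) sparse similarity profiles keeping only the top-k words."""
    sims = F.normalize(image_emb, dim=-1) @ F.normalize(vocab_emb, dim=-1).t()
    values, indices = sims.topk(top_k, dim=-1)
    profile = torch.zeros_like(sims)
    profile.scatter_(-1, indices, values)  # zero out all but the top-k words
    return profile
```

Because each dimension is tied to a concrete vocabulary word, the resulting feature vector can be inspected directly, which is what makes such embeddings interpretable.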
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Knowledge-enhanced Visual-Language Pretraining for Computational Pathology [68.6831438330526]
We consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources.
We curate a pathology knowledge tree that consists of 50,470 informative attributes for 4,718 diseases requiring pathology diagnosis from 32 human tissues.
arXiv Detail & Related papers (2024-04-15T17:11:25Z)
- Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z)
- In-context learning enables multimodal large language models to classify cancer pathology images [0.7085801706650957]
In language processing, in-context learning provides an alternative to fine-tuning, where models learn from examples within prompts, bypassing the need for parameter updates (see the prompt-construction sketch after this entry).
Here, we systematically evaluate the model Generative Pretrained Transformer 4 with Vision capabilities (GPT-4V) on cancer image processing with in-context learning.
Our results show that in-context learning is sufficient to match or even outperform specialized neural networks trained for particular tasks, while only requiring a minimal number of samples.
arXiv Detail & Related papers (2024-03-12T08:34:34Z)
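To show what such in-context classification can look like in practice, the sketch below assembles a few-shot multimodal prompt in the OpenAI chat-completions message format; the instruction text, labels, and any model it is sent to are assumptions, not the study's exact protocol.

```python
# Hedged sketch: build a few-shot (in-context) prompt for a vision-capable
# chat model; demonstrations live in the prompt, so no weights are updated.
import base64

def image_part(path: str) -> dict:
    """Encode a local image as a data-URL content part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"}}

def few_shot_messages(examples: list[tuple[str, str]],
                      query_path: str) -> list[dict]:
    """examples: (image_path, label) demonstration pairs shown in-context."""
    content = [{"type": "text",
                "text": "Classify each pathology image as 'tumor' or 'normal'."}]
    for path, label in examples:            # in-context demonstrations
        content.append(image_part(path))
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append(image_part(query_path))  # the image to classify
    content.append({"type": "text", "text": "Label:"})
    return [{"role": "user", "content": content}]
```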
- Text-guided Foundation Model Adaptation for Pathological Image Classification [40.45252665455015]
We propose to Connect Image and Text Embeddings (CITE) to enhance pathological image classification.
CITE injects text insights gained from language models pre-trained on a broad range of biomedical texts, adapting foundation models toward pathological image understanding.
arXiv Detail & Related papers (2023-07-27T14:44:56Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology [5.164102666113966]
We conduct a search for good representations in pathology by training a variety of self-supervised models and validating them on a range of weakly-supervised and patch-level tasks.
Our key finding is in discovering that Vision Transformers using DINO-based knowledge distillation are able to learn data-efficient and interpretable features in histology images.
arXiv Detail & Related papers (2022-03-01T16:14:41Z)
- Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance the discriminability of the deep embedding to encourage clustering of features of the same class; a generic prototype-based sketch of this episodic setting follows this entry.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
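For reference, a generic episodic baseline for this setting (prototype pooling plus cosine similarity, not the paper's specific global-correlation design) can be sketched as follows.

```python
# Hedged sketch: prototype-based few-shot segmentation via masked average
# pooling and cosine similarity; a common episodic baseline shown only
# to illustrate the setting, not the paper's method.
import torch
import torch.nn.functional as F

def segment_query(support_feat: torch.Tensor,
                  support_mask: torch.Tensor,
                  query_feat: torch.Tensor,
                  tau: float = 20.0) -> torch.Tensor:
    """support_feat, query_feat: (C, H, W) backbone features;
    support_mask: (H, W) binary foreground mask on the support image."""
    # Masked average pooling: pool support features over the foreground.
    fg = support_feat * support_mask.unsqueeze(0)
    prototype = fg.sum(dim=(1, 2)) / support_mask.sum().clamp(min=1)  # (C,)
    # Cosine similarity of every query location to the class prototype.
    sim = F.cosine_similarity(query_feat, prototype[:, None, None], dim=0)
    return torch.sigmoid(tau * (sim - 0.5))  # soft foreground probability map
```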