PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology
- URL: http://arxiv.org/abs/2405.10254v2
- Date: Wed, 22 May 2024 17:22:32 GMT
- Title: PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology
- Authors: George Shaikovski, Adam Casson, Kristen Severson, Eric Zimmermann, Yi Kan Wang, Jeremy D. Kunz, Juan A. Retamero, Gerard Oakley, David Klimstra, Christopher Kanan, Matthew Hanna, Michal Zelechowski, Julian Viret, Neil Tenenholtz, James Hall, Nicolo Fusi, Razik Yousfi, Peter Hamilton, William A. Moye, Eugene Vorontsov, Siqi Liu, Thomas J. Fuchs
- Abstract summary: We present a slide-level foundation model for H&E-stained histopathology, PRISM, that builds on Virchow tile embeddings.
PRISM produces slide-level embeddings with the ability to generate clinical reports, resulting in several modes of use.
Using text prompts, PRISM achieves zero-shot cancer detection and sub-typing performance approaching that of a supervised aggregator model.
- Score: 9.556246087301883
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Foundation models in computational pathology promise to unlock the development of new clinical decision support systems and models for precision medicine. However, there is a mismatch between most clinical analysis, which is defined at the level of one or more whole slide images, and foundation models to date, which process the thousands of image tiles contained in a whole slide image separately. The requirement to train a network to aggregate information across a large number of tiles in multiple whole slide images limits these models' impact. In this work, we present a slide-level foundation model for H&E-stained histopathology, PRISM, that builds on Virchow tile embeddings and leverages clinical report text for pre-training. Using the tile embeddings, PRISM produces slide-level embeddings with the ability to generate clinical reports, resulting in several modes of use. Using text prompts, PRISM achieves zero-shot cancer detection and sub-typing performance approaching and surpassing that of a supervised aggregator model. Using the slide embeddings with linear classifiers, PRISM surpasses supervised aggregator models. Furthermore, we demonstrate that fine-tuning of the PRISM slide encoder yields label-efficient training for biomarker prediction, a task that typically suffers from low availability of training data; an aggregator initialized with PRISM and trained on as little as 10% of the training data can outperform a supervised baseline that uses all of the data.
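The abstract does not include implementation details, but a minimal sketch of the zero-shot mode it describes could look like the following: tile embeddings (e.g. from Virchow) are pooled into a slide embedding and compared against text-prompt embeddings in a shared space. Mean pooling below is only a placeholder for PRISM's learned slide encoder, and all function names, dimensions, and prompts are hypothetical.
```python
# Illustrative zero-shot slide classification from tile embeddings.
# Mean pooling stands in for PRISM's learned slide encoder; names and
# dimensions are placeholders, not the paper's implementation.
import torch
import torch.nn.functional as F

def slide_embedding(tile_embeddings: torch.Tensor) -> torch.Tensor:
    """Aggregate (num_tiles, dim) tile embeddings into one slide vector."""
    return tile_embeddings.mean(dim=0)  # placeholder for a learned aggregator

def zero_shot_classify(tile_embeddings: torch.Tensor,
                       prompt_embeddings: torch.Tensor) -> int:
    """Pick the class whose text-prompt embedding is closest to the slide embedding."""
    slide = F.normalize(slide_embedding(tile_embeddings), dim=-1)   # (dim,)
    prompts = F.normalize(prompt_embeddings, dim=-1)                # (num_classes, dim)
    scores = prompts @ slide                                        # cosine similarities
    return int(scores.argmax())

# Random stand-ins for real tile and prompt embeddings.
tiles = torch.randn(5000, 1280)
prompts = torch.randn(2, 1280)   # e.g. prompts for "benign" vs. "carcinoma"
print(zero_shot_classify(tiles, prompts))
```
The slide embedding returned by such an aggregator is also what a linear classifier would be trained on in the supervised modes the abstract mentions.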
Related papers
- Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective [32.93871326428446]
Recent advances in artificial intelligence (AI) are revolutionizing medical imaging and computational pathology.
A persistent challenge in the analysis of digital Whole Slide Images (WSIs) is aggregating tens of thousands of tile-level image embeddings into a slide-level representation.
This study conducts a benchmarking analysis of ten slide-level aggregation techniques across nine clinically relevant tasks.
arXiv Detail & Related papers (2024-07-10T17:00:57Z)
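As a rough illustration of the aggregation problem studied here, the sketch below implements simple attention pooling over tile embeddings, one common family of slide-level aggregators; it is not taken from the paper, and the class name and dimensions are arbitrary.
```python
# Illustrative attention-pooling aggregator over tile embeddings
# (one common family of slide-level aggregation methods; not the
# specific implementations benchmarked in the paper).
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Weights each tile embedding by a learned attention score, then sums."""
    def __init__(self, dim: int = 1280, hidden: int = 256):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, tiles: torch.Tensor) -> torch.Tensor:
        # tiles: (num_tiles, dim) -> slide embedding: (dim,)
        weights = torch.softmax(self.score(tiles), dim=0)   # (num_tiles, 1)
        return (weights * tiles).sum(dim=0)

slide_vec = AttentionPooling()(torch.randn(10000, 1280))
print(slide_vec.shape)   # torch.Size([1280])
```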
- RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports [19.915033191502328]
Vision-language foundation models are increasingly investigated in computer vision and natural language processing, but few exist for retinal imaging.
To address this gap, a CLIP-style retinal image foundation model is developed in this paper.
Our foundation model, RET-CLIP, is specifically trained on a dataset of 193,865 patients to extract general features of color fundus photographs.
arXiv Detail & Related papers (2024-05-23T03:20:51Z)
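"CLIP-style" refers to contrastive image-text pre-training. The generic symmetric contrastive loss used by such models is sketched below; this illustrates the general objective, not RET-CLIP's exact formulation, and the batch size, embedding dimension, and temperature are arbitrary.
```python
# Generic symmetric contrastive (CLIP-style) loss for paired image and
# text embeddings; illustrative only, not RET-CLIP's exact objective.
import torch
import torch.nn.functional as F

def clip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss over a batch of paired image/text embeddings."""
    img = F.normalize(img_emb, dim=-1)       # (batch, dim)
    txt = F.normalize(txt_emb, dim=-1)       # (batch, dim)
    logits = img @ txt.t() / temperature     # pairwise cosine similarities
    targets = torch.arange(img.size(0))      # matched pairs sit on the diagonal
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

print(clip_loss(torch.randn(8, 512), torch.randn(8, 512)))
```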
- A self-supervised framework for learning whole slide representations [52.774822784847565]
We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of whole slide images.
We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets.
arXiv Detail & Related papers (2024-02-09T05:05:28Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
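As a rough sketch of the disrupt-and-reconstruct idea described above, the toy example below masks random patches, adds Gaussian noise, and trains a small autoencoder to recover the clean input; the architecture, masking scheme, and dimensions are placeholders, not the paper's.
```python
# Toy "disrupt-and-reconstruct" objective on flattened patches: mask a random
# subset of patches, add Gaussian noise, and reconstruct the clean input.
# Illustrative only; not the paper's architecture or perturbation set.
import torch
import torch.nn as nn

def disrupt(patches: torch.Tensor, mask_ratio: float = 0.5, noise_std: float = 0.1) -> torch.Tensor:
    # patches: (num_patches, patch_dim)
    keep = (torch.rand(patches.size(0), 1) > mask_ratio).float()   # local masking
    noisy = patches + noise_std * torch.randn_like(patches)        # low-level perturbation
    return noisy * keep

model = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 256))  # toy autoencoder
clean = torch.randn(32, 256)
loss = nn.functional.mse_loss(model(disrupt(clean)), clean)  # reconstruct the original
loss.backward()
print(loss.item())
```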
- MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification [41.16626194300303]
Foundation models, often pre-trained with large-scale data, have achieved remarkable success in jump-starting various vision and language applications.
Recent advances further enable adapting foundation models in downstream tasks efficiently using only a few training samples.
Yet, the application of such learning paradigms in medical image analysis remains scarce due to the shortage of publicly accessible data and benchmarks.
arXiv Detail & Related papers (2023-06-16T01:46:07Z)
- DINOv2: Learning Robust Visual Features without Supervision [75.42921276202522]
This work shows that existing pretraining methods, especially self-supervised methods, can produce general-purpose visual features if trained on enough curated data from diverse sources.
Most of the technical contributions aim at accelerating and stabilizing the training at scale.
In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature.
arXiv Detail & Related papers (2023-04-14T15:12:19Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with training datasets of varying size consisting of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
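For context, text-to-image retrieval in a shared embedding space is typically scored by ranking all images against each text query; a minimal recall@k sketch under that assumption is shown below (not the paper's evaluation code, and all sizes are arbitrary).
```python
# Minimal recall@k for text-to-image retrieval in a shared embedding space;
# illustrative only, not the paper's evaluation code.
import torch
import torch.nn.functional as F

def recall_at_k(txt_emb: torch.Tensor, img_emb: torch.Tensor, k: int = 5) -> float:
    """Fraction of text queries whose paired image appears in the top-k retrieved images."""
    txt = F.normalize(txt_emb, dim=-1)
    img = F.normalize(img_emb, dim=-1)
    sims = txt @ img.t()                               # (num_texts, num_images)
    topk = sims.topk(k, dim=-1).indices                # top-k image indices per text query
    targets = torch.arange(txt.size(0)).unsqueeze(-1)  # text i is paired with image i
    return (topk == targets).any(dim=-1).float().mean().item()

print(recall_at_k(torch.randn(100, 512), torch.randn(100, 512), k=5))
```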
- Domain-Specific Pre-training Improves Confidence in Whole Slide Image Classification [15.354256205808273]
Whole Slide Images (WSIs), the gigapixel histopathology images used in digital pathology, pose significant challenges to deep learning models for clinical diagnosis.
arXiv Detail & Related papers (2023-02-20T08:42:06Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- Unsupervised pre-training of graph transformers on patient population graphs [48.02011627390706]
We propose a graph-transformer-based network to handle heterogeneous clinical data.
We show the benefit of our pre-training method in a self-supervised and a transfer learning setting.
arXiv Detail & Related papers (2022-07-21T16:59:09Z)
- Personalizing Pre-trained Models [23.145974171912414]
We consider how upstream pretrained models can be leveraged for downstream few-shot, multilabel, and continual learning tasks.
Our model CLIPPER (CLIP PERsonalized) uses image representations from CLIP, a large-scale image representation learning model trained using weak natural language supervision.
arXiv Detail & Related papers (2021-06-02T22:58:47Z)