Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT
- URL: http://arxiv.org/abs/2312.01435v2
- Date: Fri, 15 Mar 2024 12:39:24 GMT
- Title: Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT
- Authors: Saurav Sengupta, Donald E. Brown,
- Abstract summary: We show that using an existing pre-trained Vision Transformer (ViT) to encode 4096x4096 sized patches of the Whole Slide Image (WSI) and a pre-trained Bidirectional Representations from Transformers (BERT) model for report generation, we can build a performant and portable report generation mechanism.
Our method allows us to not only generate and evaluate captions that describe the image, but also helps us classify the image into tissue types and the gender of the patient as well.
- Score: 1.0819408603463427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning for histopathology has been successfully used for disease classification, image segmentation and more. However, combining image and text modalities using current state-of-the-art (SOTA) methods has been a challenge due to the high resolution of histopathology images. Automatic report generation for histopathology images is one such challenge. In this work, we show that using an existing pre-trained Vision Transformer (ViT) to encode 4096x4096 sized patches of the Whole Slide Image (WSI) and a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model for language modeling-based decoder for report generation, we can build a performant and portable report generation mechanism that takes into account the whole high resolution image. Our method allows us to not only generate and evaluate captions that describe the image, but also helps us classify the image into tissue types and the gender of the patient as well. Our best performing model achieves a 89.52% accuracy in Tissue Type classification with a BLEU-4 score of 0.12 in our caption generation task.
Related papers
- Automated Report Generation for Lung Cytological Images Using a CNN Vision Classifier and Multiple-Transformer Text Decoders: Preliminary Study [0.0]
The sensitivity and specificity were 100% and 96.4%, respectively, for automated benign and malignant case classification.
The grammar and style of the generated texts were confirmed as correct and in better agreement with gold standard.
arXiv Detail & Related papers (2024-03-26T23:32:29Z) - WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images [5.960501267687475]
We investigate how to generate pathology reports given whole slide images.
We curated the largest WSI-text dataset (PathText)
On the model end, we propose the multiple instance generative model (MI-Gen)
arXiv Detail & Related papers (2023-11-27T05:05:41Z) - Automatic Report Generation for Histopathology images using pre-trained
Vision Transformers [1.2781698000674653]
We show that using an existing pre-trained Vision Transformer in a two-step process of first using it to encode 4096x4096 sized patches of the Whole Slide Image (WSI) and then using it as the encoder and an LSTM decoder for report generation.
We are also able to use representations from an existing powerful pre-trained hierarchical vision transformer and show its usefulness in not just zero shot classification but also for report generation.
arXiv Detail & Related papers (2023-11-10T16:48:24Z) - PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z) - Customizing General-Purpose Foundation Models for Medical Report
Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z) - Cross-modulated Few-shot Image Generation for Colorectal Tissue
Classification [58.147396879490124]
Our few-shot generation method, named XM-GAN, takes one base and a pair of reference tissue images as input and generates high-quality yet diverse images.
To the best of our knowledge, we are the first to investigate few-shot generation in colorectal tissue images.
arXiv Detail & Related papers (2023-04-04T17:50:30Z) - AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context
Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the celluar graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z) - StraIT: Non-autoregressive Generation with Stratified Image Transformer [63.158996766036736]
Stratified Image Transformer(StraIT) is a pure non-autoregressive(NAR) generative model.
Our experiments demonstrate that StraIT significantly improves NAR generation and out-performs existing DMs and AR methods.
arXiv Detail & Related papers (2023-03-01T18:59:33Z) - DEPAS: De-novo Pathology Semantic Masks using a Generative Model [0.0]
We introduce a scalable generative model, coined as DEPAS, that captures tissue structure and generates high-resolution semantic masks with state-of-the-art quality.
We demonstrate the ability of DEPAS to generate realistic semantic maps of tissue for three types of organs: skin, prostate, and lung.
arXiv Detail & Related papers (2023-02-13T16:48:33Z) - Learning to Exploit Temporal Structure for Biomedical Vision-Language
Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z) - Self-Supervised Vision Transformers Learn Visual Concepts in
Histopathology [5.164102666113966]
We conduct a search for good representations in pathology by training a variety of self-supervised models with validation on a variety of weakly-supervised and patch-level tasks.
Our key finding is in discovering that Vision Transformers using DINO-based knowledge distillation are able to learn data-efficient and interpretable features in histology images.
arXiv Detail & Related papers (2022-03-01T16:14:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.