Automatic Report Generation for Histopathology images using pre-trained
Vision Transformers
- URL: http://arxiv.org/abs/2311.06176v2
- Date: Mon, 13 Nov 2023 23:49:35 GMT
- Title: Automatic Report Generation for Histopathology images using pre-trained
Vision Transformers
- Authors: Saurav Sengupta, Donald E. Brown
- Abstract summary: We show that using an existing pre-trained Vision Transformer in a two-step process of first using it to encode 4096x4096 sized patches of the Whole Slide Image (WSI) and then using it as the encoder and an LSTM decoder for report generation.
We are also able to use representations from an existing powerful pre-trained hierarchical vision transformer and show its usefulness in not just zero shot classification but also for report generation.
- Score: 1.2781698000674653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning for histopathology has been successfully used for disease
classification, image segmentation and more. However, combining image and text
modalities using current state-of-the-art methods has been a challenge due to
the high resolution of histopathology images. Automatic report generation for
histopathology images is one such challenge. In this work, we show that an
existing pre-trained Vision Transformer can be used in a two-step process:
first to encode 4096x4096 sized patches of the Whole Slide Image (WSI), and
then as the encoder, paired with an LSTM decoder, for report generation. This
lets us build a fairly performant and portable report generation mechanism
that takes into account the whole of the high resolution image, instead of
just the patches. We are also able to use representations from an existing
powerful pre-trained hierarchical vision transformer and show their usefulness
not just for zero-shot classification but also for report generation.
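As a concrete illustration of the two-step pipeline, here is a minimal PyTorch sketch, not the authors' released code: the frozen pre-trained ViT is assumed to have already produced one embedding per 4096x4096 region (step 1), and an LSTM decoder generates the report conditioned on a pooled WSI representation (step 2). All layer sizes, the vocabulary size, and the mean-pooling choice are illustrative assumptions.

```python
# Minimal sketch of the two-step pipeline described above; dimensions,
# vocabulary size, and the pooling strategy are illustrative assumptions.
import torch
import torch.nn as nn

class ReportGenerator(nn.Module):
    def __init__(self, feat_dim=192, hidden=512, vocab=10_000):
        super().__init__()
        # Step 2a: pool the per-region ViT embeddings into one WSI vector.
        self.proj = nn.Linear(feat_dim, hidden)
        # Step 2b: an LSTM decoder conditioned on the WSI representation.
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, region_feats, tokens):
        # region_feats: (batch, n_regions, feat_dim), one embedding per
        # 4096x4096 region produced by the frozen pre-trained ViT (step 1).
        wsi_vec = self.proj(region_feats.mean(dim=1))   # (batch, hidden)
        h0 = wsi_vec.unsqueeze(0)                       # initial hidden state
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.head(out)                           # next-token logits

# Step 1 happens offline: run the frozen ViT over each region and cache
# the embeddings; here we fake them with random tensors.
feats = torch.randn(2, 16, 192)            # 2 slides, 16 regions each
tokens = torch.randint(0, 10_000, (2, 32)) # report tokens so far
logits = ReportGenerator()(feats, tokens)  # (2, 32, 10000)
```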
Related papers
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
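A hedged sketch of the conditioning idea: the denoiser receives a cached SSL embedding of the clean image alongside the noisy input, here via a FiLM-style bias. The network, embedding dimension, and noise level are illustrative assumptions, not the paper's architecture.

```python
# Sketch of conditioning a denoiser on an SSL embedding; the real model,
# embedding dimensions, and noise schedule are assumptions.
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    def __init__(self, channels=3, cond_dim=384, hidden=64):
        super().__init__()
        self.inp = nn.Conv2d(channels, hidden, 3, padding=1)
        # Project the SSL embedding to a per-channel bias (FiLM-style).
        self.cond = nn.Linear(cond_dim, hidden)
        self.out = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, noisy, ssl_emb):
        h = self.inp(noisy) + self.cond(ssl_emb)[:, :, None, None]
        return self.out(torch.relu(h))   # predicted noise

x0 = torch.randn(4, 3, 64, 64)           # clean patches
emb = torch.randn(4, 384)                # cached SSL embeddings of x0
noise = torch.randn_like(x0)
x_t = 0.7 * x0 + 0.7 * noise             # one illustrative noise level
loss = ((ConditionedDenoiser()(x_t, emb) - noise) ** 2).mean()
```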
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT [1.0819408603463427]
We show that using an existing pre-trained Vision Transformer (ViT) to encode 4096x4096 sized patches of the Whole Slide Image (WSI) and a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model for report generation, we can build a performant and portable report generation mechanism.
Our method allows us not only to generate and evaluate captions that describe the image, but also to classify the image by tissue type and patient gender.
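A rough sketch of how a Transformer decoder could stand in for the LSTM in this setup: it cross-attends over the cached ViT region embeddings while decoding report tokens. The use of nn.TransformerDecoder and all sizes are assumptions; the paper's actual BERT integration may differ.

```python
# Sketch: same cached ViT region embeddings, but a Transformer decoder
# cross-attends over them instead of an LSTM. Sizes are illustrative.
import torch
import torch.nn as nn

d = 512
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d, nhead=8, batch_first=True),
    num_layers=2,
)
embed = nn.Embedding(10_000, d)
head = nn.Linear(d, 10_000)

region_feats = torch.randn(2, 16, d)        # memory: ViT region embeddings
tokens = torch.randint(0, 10_000, (2, 32))  # report tokens so far
causal = nn.Transformer.generate_square_subsequent_mask(32)
hidden = decoder(embed(tokens), region_feats, tgt_mask=causal)
logits = head(hidden)                        # (2, 32, 10000)
```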
arXiv Detail & Related papers (2023-12-03T15:56:09Z)
- Representation Learning for Non-Melanoma Skin Cancer using a Latent Autoencoder [0.0]
Generative learning is a powerful tool for representation learning, and shows particular promise for problems in biomedical imaging.
It remains difficult to faithfully reconstruct images from generative models, particularly those as complex as histological images.
In this work, two existing methods (autoencoders and latent autoencoders) are combined in an attempt to improve our ability to encode and decode real images of non-melanoma skin cancer.
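A toy autoencoder sketch of the encode/decode setup; the paper's latent-autoencoder component is not reproduced here, and all shapes are illustrative.

```python
# Minimal autoencoder: 3x64x64 image -> 128-d latent -> reconstruction.
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(                 # image -> latent
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(64 * 16 * 16, 128),
        )
        self.dec = nn.Sequential(                 # latent -> image
            nn.Linear(128, 64 * 16 * 16), nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

x = torch.rand(8, 3, 64, 64)                      # stand-in image patches
recon_loss = ((AE()(x) - x) ** 2).mean()
```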
arXiv Detail & Related papers (2022-09-05T06:24:58Z)
- Conviformers: Convolutionally guided Vision Transformer [5.964436882344729]
We present an in-depth analysis and describe the critical components for developing a system for the fine-grained categorization of plants from herbarium sheets.
We introduce a convolutional transformer architecture called Conviformer which, unlike the popular Vision Transformer (ViT), can handle higher resolution images without exploding memory and computational cost.
With our simple yet effective approach, we achieved SoTA on the Herbarium 202x and iNaturalist 2019 datasets.
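A sketch of the general "convolutionally guided" idea: a convolutional stem downsamples the high-resolution input before any attention layers, keeping the token count (and memory) manageable. Sizes are illustrative assumptions, not Conviformer's actual configuration.

```python
# Conv stem shrinks a 1024x1024 image to a 16x16 grid of 384-d tokens,
# so attention runs over 256 tokens instead of thousands of raw patches.
import torch
import torch.nn as nn

stem = nn.Sequential(
    nn.Conv2d(3, 64, 7, stride=4, padding=3), nn.ReLU(),
    nn.Conv2d(64, 192, 3, stride=4, padding=1), nn.ReLU(),
    nn.Conv2d(192, 384, 3, stride=4, padding=1),
)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=384, nhead=6, batch_first=True),
    num_layers=2,
)

img = torch.randn(1, 3, 1024, 1024)
feat = stem(img)                            # (1, 384, 16, 16)
tokens = feat.flatten(2).transpose(1, 2)    # (1, 256, 384)
out = encoder(tokens)                       # attention over 256 tokens only
```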
arXiv Detail & Related papers (2022-08-17T13:09:24Z)
- Understanding Transfer Learning for Chest Radiograph Clinical Report Generation with Modified Transformer Architectures [0.0]
We train a series of modified transformers to generate clinical reports from chest radiograph image input.
We use BLEU(1-4), ROUGE-L, CIDEr, and the clinical CheXbert F1 scores to validate our models and demonstrate competitive scores with state-of-the-art models.
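To make the evaluation side concrete, here is a toy version of the clipped n-gram precision underlying BLEU; real evaluations use full BLEU-4, ROUGE-L, and CIDEr implementations with brevity penalties and consensus weighting.

```python
# Clipped unigram precision (the core of BLEU-1), illustrated by hand.
from collections import Counter

def bleu1(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    hyp_counts, ref_counts = Counter(hyp), Counter(ref)
    # Clip each hypothesis word's count by its count in the reference.
    clipped = sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())
    return clipped / max(len(hyp), 1)

print(bleu1("no acute cardiopulmonary abnormality",
            "no acute cardiopulmonary process"))   # 0.75
```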
arXiv Detail & Related papers (2022-05-05T03:08:05Z)
- Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology [5.164102666113966]
We conduct a search for good representations in pathology by training a variety of self-supervised models with validation on a variety of weakly-supervised and patch-level tasks.
Our key finding is that Vision Transformers trained with DINO-based knowledge distillation learn data-efficient and interpretable features in histology images.
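A schematic of DINO-style self-distillation, with a stand-in MLP instead of a ViT backbone: a student network matches a momentum (EMA) teacher across differently augmented views. Temperatures and the EMA rate loosely follow the DINO recipe; the centering term is omitted.

```python
# Student matches a sharper EMA teacher on another view; teacher gets
# no gradients and is updated only by exponential moving average.
import copy
import torch
import torch.nn.functional as F

student = torch.nn.Sequential(torch.nn.Linear(384, 256), torch.nn.ReLU(),
                              torch.nn.Linear(256, 64))
teacher = copy.deepcopy(student)             # same init, EMA updates only
for p in teacher.parameters():
    p.requires_grad_(False)

view1, view2 = torch.randn(8, 384), torch.randn(8, 384)  # two augmentations
s_out = F.log_softmax(student(view1) / 0.1, dim=-1)
with torch.no_grad():
    t_out = F.softmax(teacher(view2) / 0.04, dim=-1)      # sharper teacher
loss = -(t_out * s_out).sum(dim=-1).mean()                # cross-entropy
loss.backward()

with torch.no_grad():                        # EMA update of the teacher
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(0.996).add_(ps, alpha=0.004)
```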
arXiv Detail & Related papers (2022-03-01T16:14:41Z)
- MaskGIT: Masked Generative Image Transformer [49.074967597485475]
MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions.
Experiments demonstrate that MaskGIT significantly outperforms the state-of-the-art transformer model on the ImageNet dataset.
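A sketch of the masked-token training objective: tokens from a (hypothetical) VQ tokenizer are randomly masked and predicted by a bidirectional transformer with no causal mask, so every position attends in all directions. Codebook size and shapes are illustrative; MaskGIT's iterative decoding schedule is omitted.

```python
# Mask a random subset of discrete image tokens and predict them all at
# once with a non-causal (bidirectional) transformer.
import torch
import torch.nn as nn

vocab, mask_id, d = 1024, 1024, 256          # reserve one id for [MASK]
embed = nn.Embedding(vocab + 1, d)
encoder = nn.TransformerEncoder(             # no causal mask: bidirectional
    nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True),
    num_layers=2,
)
head = nn.Linear(d, vocab)

tokens = torch.randint(0, vocab, (2, 64))    # tokens from a VQ tokenizer
mask = torch.rand(2, 64) < 0.5               # mask a random subset
inp = tokens.masked_fill(mask, mask_id)
logits = head(encoder(embed(inp)))           # predict all positions at once
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
```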
arXiv Detail & Related papers (2022-02-08T23:54:06Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
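The tokenization step, sketched with the standard strided-convolution patch embedding; the U-shaped encoder-decoder itself is omitted and sizes are illustrative.

```python
# Non-overlapping 4x4 patches become 96-d tokens via a strided conv.
import torch
import torch.nn as nn

patch_embed = nn.Conv2d(3, 96, kernel_size=4, stride=4)

img = torch.randn(1, 3, 224, 224)
tokens = patch_embed(img).flatten(2).transpose(1, 2)  # (1, 3136, 96)
# 3136 = (224/4)^2 tokens, ready for the Transformer encoder stages.
```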
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
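A sketch of the ensembling recipe: classify the real image plus several synthesized views and average the logits. The view generator below is a stand-in for StyleGAN2 projection and re-synthesis, not the paper's pipeline.

```python
# Ensemble a classifier's logits over the real image and generated views.
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 10))

def fake_views(img, n=4):
    # Stand-in for projecting img into StyleGAN2's latent space and
    # re-synthesizing perturbed variants (color, pose, ...).
    return [img + 0.05 * torch.randn_like(img) for _ in range(n)]

img = torch.rand(1, 3, 64, 64)
logits = torch.stack([classifier(v) for v in [img] + fake_views(img)])
prediction = logits.mean(dim=0).argmax(dim=-1)   # ensemble decision
```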
arXiv Detail & Related papers (2021-04-29T17:58:35Z)
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
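A much-simplified sketch of gated axial attention: attention runs along one spatial axis at a time (rows, then columns), and learned scalar gates, initialized near zero, control how much each axial attention output contributes, which helps when training data is scarce. The paper's gated positional attention is more elaborate.

```python
# Axial attention over width then height, each scaled by a learned gate.
import torch
import torch.nn as nn

class GatedAxialBlock(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate_r = nn.Parameter(torch.zeros(1))   # start "closed"
        self.gate_c = nn.Parameter(torch.zeros(1))

    def forward(self, x):                 # x: (B, H, W, C), channels-last
        b, h, w, c = x.shape
        rows = x.reshape(b * h, w, c)     # attend along the width axis
        x = x + self.gate_r * self.row_attn(rows, rows, rows)[0].reshape(b, h, w, c)
        cols = x.permute(0, 2, 1, 3).reshape(b * w, h, c)  # along height
        out = self.col_attn(cols, cols, cols)[0].reshape(b, w, h, c)
        return x + self.gate_c * out.permute(0, 2, 1, 3)

feat = torch.randn(2, 16, 16, 64)          # a small feature map
out = GatedAxialBlock()(feat)              # same shape, axial attention applied
```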
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which combines the merits of Transformers and U-Net, as a strong alternative for medical image segmentation.
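A schematic of the hybrid design: a CNN encoder produces a feature map, a Transformer refines it as a token sequence, and a convolutional decoder upsamples to segmentation logits; skip connections are omitted and all sizes are illustrative assumptions.

```python
# CNN encoder -> Transformer over flattened tokens -> upsampling decoder.
import torch
import torch.nn as nn

cnn = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 256, 3, stride=2, padding=1))
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=2,
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(256, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 2, 4, stride=2, padding=1),   # 2 classes
)

img = torch.randn(1, 3, 128, 128)
fmap = cnn(img)                                   # (1, 256, 32, 32)
seq = fmap.flatten(2).transpose(1, 2)             # (1, 1024, 256) tokens
seq = transformer(seq)
fmap = seq.transpose(1, 2).reshape(1, 256, 32, 32)
masks = decoder(fmap)                             # (1, 2, 128, 128) logits
```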
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.