Masked Pre-Training of Transformers for Histology Image Analysis
- URL: http://arxiv.org/abs/2304.07434v1
- Date: Fri, 14 Apr 2023 23:56:49 GMT
- Title: Masked Pre-Training of Transformers for Histology Image Analysis
- Authors: Shuai Jiang, Liesbeth Hondelink, Arief A. Suriawinata, Saeed Hassanpour
- Abstract summary: In digital pathology, whole slide images (WSIs) are widely used for applications such as cancer diagnosis and prognosis prediction.
Visual transformer models have emerged as a promising method for encoding large regions of WSIs while preserving spatial relationships among patches.
To address the difficulty of training large transformer models on WSIs with limited labeled data, we propose a pretext task for training the transformer model without labeled data.
Our model, MaskHIT, uses the transformer output to reconstruct masked patches and learn representative histological features based on their positions and visual features.
- Score: 4.710921988115685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In digital pathology, whole slide images (WSIs) are widely used for
applications such as cancer diagnosis and prognosis prediction. Visual
transformer models have recently emerged as a promising method for encoding
large regions of WSIs while preserving spatial relationships among patches.
However, due to the large number of model parameters and limited labeled data,
applying transformer models to WSIs remains challenging. Inspired by masked
language models, we propose a pretext task for training the transformer model
without labeled data to address this problem. Our model, MaskHIT, uses the
transformer output to reconstruct masked patches and learn representative
histological features based on their positions and visual features. The
experimental results demonstrate that MaskHIT surpasses various multiple
instance learning approaches by 3% and 2% on survival prediction and cancer
subtype classification tasks, respectively. Furthermore, MaskHIT also
outperforms two of the most recent state-of-the-art transformer-based methods.
Finally, a comparison between the attention maps generated by the MaskHIT model
and pathologists' annotations indicates that the model can accurately identify
clinically relevant histological structures in each task.
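The masked pretext task summarized above can be illustrated with a toy sketch. It assumes a WSI region has already been tiled into a rows x cols grid of patch feature vectors. The 2D sine-cosine positional embedding, the zero mask token, the single random-weight attention layer standing in for the transformer, the 50% mask ratio, and the mean-squared-error objective are all illustrative assumptions, not MaskHIT's published configuration, and every function name below is hypothetical.

```python
import math
import random


def positional_embedding(row, col, dim):
    """Assumed 2D sine-cosine embedding of a patch's grid position (dim % 4 == 0)."""
    quarter = dim // 4
    emb = []
    for coord in (row, col):
        for i in range(quarter):
            freq = 1.0 / (10000 ** (i / quarter))
            emb.append(math.sin(coord * freq))
            emb.append(math.cos(coord * freq))
    return emb


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, dim):
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]


def masked_reconstruction_loss(grid, mask_ratio=0.5, seed=0):
    """Hypothetical helper: mask patches, encode them, and score their reconstruction.

    grid is a rows x cols nested list of patch feature vectors of length dim.
    """
    rng = random.Random(seed)
    rows, cols, dim = len(grid), len(grid[0]), len(grid[0][0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    features = [grid[r][c] for r, c in cells]

    # 1. Hide a random subset of patches (at least one).
    n_mask = max(1, round(mask_ratio * len(cells)))
    masked = set(rng.sample(range(len(cells)), n_mask))

    # 2. A masked patch keeps only its position: its content becomes a zero
    #    "mask token" (a learnable vector in a real model).
    tokens = []
    for i, (r, c) in enumerate(cells):
        content = [0.0] * dim if i in masked else features[i]
        pos = positional_embedding(r, c, dim)
        tokens.append([x + p for x, p in zip(content, pos)])

    # 3. One single-head self-attention layer with random weights stands in
    #    for the transformer encoder.
    wq, wk, wv, w_out = (random_matrix(rng, dim) for _ in range(4))
    queries = [matvec(wq, t) for t in tokens]
    keys = [matvec(wk, t) for t in tokens]
    values = [matvec(wv, t) for t in tokens]

    # 4. Only masked positions contribute to the loss, so only their outputs
    #    are computed: attend over all patches, then predict the hidden features.
    loss = 0.0
    for i in masked:
        scores = [sum(a * b for a, b in zip(queries[i], k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        hidden = [t + x for t, x in zip(tokens[i], context)]
        prediction = matvec(w_out, hidden)
        loss += sum((p - f) ** 2 for p, f in zip(prediction, features[i])) / dim
    return loss / len(masked), sorted(masked)
```

In real pre-training the encoder and output weights would be learned by minimizing such a loss; here they are random, so the function only shows how masking, positional information, and the reconstruction target fit together. The sketch makes the key point of the pretext task concrete: a masked patch can only be predicted from its grid position and the visible patches around it.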
Related papers
- Hibou: A Family of Foundational Vision Transformers for Pathology [0.0]
Digital pathology, which converts glass slides into high-resolution digital images for analysis by computer algorithms, revolutionizes the field by enhancing diagnostic accuracy, consistency, and efficiency through automated image analysis and large-scale data processing.
This paper introduces the Hibou family of foundational vision transformers for pathology, leveraging the DINOv2 framework to pretrain two model variants, Hibou-B and Hibou-L, on a proprietary dataset of over 1 million whole slide images (WSIs) representing diverse tissue types and staining techniques.
Our pretrained models demonstrate superior performance on both patch-level and slide-level benchmarks, surpassing existing
Date: 2024-06-07T16:45:53Z
- A Comparative Study of CNN, ResNet, and Vision Transformers for Multi-Classification of Chest Diseases [0.0]
Vision Transformers (ViT) are powerful tools due to their scalability and ability to process large amounts of data.
We fine-tuned two variants of ViT models, one pre-trained on ImageNet and another trained from scratch, using the NIH Chest X-ray dataset.
Our study evaluates the performance of these models in the multi-label classification of 14 distinct diseases.
Date: 2024-05-31T23:56:42Z
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
Date: 2024-03-19T09:28:19Z
- A self-supervised framework for learning whole slide representations [52.774822784847565]
We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of whole slide images.
We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets.
Date: 2024-02-09T05:05:28Z
- GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [50.7702397913573]
The rapid advancement of photorealistic generators has reached a critical juncture where authentic and manipulated images are becoming increasingly indistinguishable.
Although a number of face forgery datasets are publicly available, their forged faces are mostly generated with GAN-based synthesis technology.
We propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection.
Date: 2024-02-03T03:13:50Z
- What a Whole Slide Image Can Tell? Subtype-guided Masked Transformer for Pathological Image Captioning [6.496515352848627]
We propose a new Transformer-based paradigm, the Subtype-guided Masked Transformer (SGMT), for pathological image captioning.
An accompanying subtype prediction is introduced into SGMT to guide the training process and enhance the captioning accuracy.
Experiments on the PatchGastricADC22 dataset demonstrate that our approach effectively adapts to the task with a transformer-based model.
Date: 2023-10-31T16:43:03Z
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
Date: 2023-10-22T02:27:02Z
- On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input.
DL models are sensitive to varying artifacts because such artifacts change the input data distribution between the training and testing phases.
We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts.
Date: 2023-06-23T03:09:03Z
- Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representations of gigapixel whole slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources than benchmark methods while maintaining better WSI representation ability.
Date: 2022-11-29T23:47:56Z
- Class-Aware Generative Adversarial Transformers for Medical Image Segmentation [39.14169989603906]
We present CA-GANformer, a novel type of generative adversarial transformers, for medical image segmentation.
First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations.
We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures.
Date: 2022-01-26T03:50:02Z
- Evaluating Transformer based Semantic Segmentation Networks for Pathological Image Segmentation [2.7029872968576947]
Histopathology has played an essential role in cancer diagnosis.
Various CNN-based automated pathological image segmentation approaches have been developed in computer-assisted pathological image analysis.
Transformer neural networks (Transformers) have shown the unique merit of capturing global, long-distance dependencies across the entire image as a new deep learning paradigm.
Date: 2021-08-26T18:46:43Z