Related papers: Masked Pre-Training of Transformers for Histology Image Analysis

Masked Pre-Training of Transformers for Histology Image Analysis

URL: http://arxiv.org/abs/2304.07434v1
Date: Fri, 14 Apr 2023 23:56:49 GMT
Title: Masked Pre-Training of Transformers for Histology Image Analysis
Authors: Shuai Jiang, Liesbeth Hondelink, Arief A. Suriawinata, Saeed Hassanpour
Abstract summary: In digital pathology, whole slide images (WSIs) are widely used for applications such as cancer diagnosis and prognosis prediction. Visual transformer models have emerged as a promising method for encoding large regions of WSIs while preserving spatial relationships among patches. We propose a pretext task for training the transformer model without labeled data to address this problem. Our model, MaskHIT, uses the transformer output to reconstruct masked patches and learn representative histological features based on their positions and visual features.
Score: 4.710921988115685
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In digital pathology, whole slide images (WSIs) are widely used for applications such as cancer diagnosis and prognosis prediction. Visual transformer models have recently emerged as a promising method for encoding large regions of WSIs while preserving spatial relationships among patches. However, due to the large number of model parameters and limited labeled data, applying transformer models to WSIs remains challenging. Inspired by masked language models, we propose a pretext task for training the transformer model without labeled data to address this problem. Our model, MaskHIT, uses the transformer output to reconstruct masked patches and learn representative histological features based on their positions and visual features. The experimental results demonstrate that MaskHIT surpasses various multiple instance learning approaches by 3% and 2% on survival prediction and cancer subtype classification tasks, respectively. Furthermore, MaskHIT also outperforms two of the most recent state-of-the-art transformer-based methods. Finally, a comparison between the attention maps generated by the MaskHIT model with pathologist's annotations indicates that the model can accurately identify clinically relevant histological structures in each task.

Related papers

PathSegDiff: Pathology Segmentation using Diffusion model representations [63.20694440934692]
We propose PathSegDiff, a novel approach for histopathology image segmentation that leverages Latent Diffusion Models (LDMs) as pre-trained featured extractors. Our method utilizes a pathology-specific LDM, guided by a self-supervised encoder, to extract rich semantic information from H&E stained histopathology images. Our experiments demonstrate significant improvements over traditional methods on the BCSS and GlaS datasets.
arXiv Detail & Related papers (2025-04-09T14:58:21Z)
M2OST: Many-to-one Regression for Predicting Spatial Transcriptomics from Digital Pathology Images [16.19308597273405]
We propose M2OST, a many-to-one regression Transformer that can accommodate the hierarchical structure of pathology images. Unlike traditional models that are trained with one-to-one image-label pairs, M2OST uses multiple images from different levels of the digital pathology image to jointly predict the gene expressions in their common corresponding spot. M2OST can achieve state-of-the-art performance with fewer parameters and floating-point operations (FLOPs)
arXiv Detail & Related papers (2024-09-23T15:06:37Z)
Hibou: A Family of Foundational Vision Transformers for Pathology [0.0]
Digital pathology, which converts glass slides into high-resolution digital images for analysis by computer algorithms, revolutionizes the field by enhancing diagnostic accuracy, consistency, and efficiency through automated image analysis and large-scale data processing. This paper introduces the Hibou family of foundational vision transformers for pathology, leveraging the DINOv2 framework to pretrain two model variants, Hibou-B and Hibou-L, on a proprietary dataset of over 1 million whole slide images (WSIs) representing diverse tissue types and staining techniques. Our pretrained models demonstrate superior performance on both patch-level and slide-level benchmarks, surpassing existing
arXiv Detail & Related papers (2024-06-07T16:45:53Z)
A Comparative Study of CNN, ResNet, and Vision Transformers for Multi-Classification of Chest Diseases [0.0]
Vision Transformers (ViT) are powerful tools due to their scalability and ability to process large amounts of data. We fine-tuned two variants of ViT models, one pre-trained on ImageNet and another trained from scratch, using the NIH Chest X-ray dataset. Our study evaluates the performance of these models in the multi-label classification of 14 distinct diseases.
arXiv Detail & Related papers (2024-05-31T23:56:42Z)
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
A self-supervised framework for learning whole slide representations [52.774822784847565]
We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of whole slide images. We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets.
arXiv Detail & Related papers (2024-02-09T05:05:28Z)
GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [50.7702397913573]
The rapid advancement of photorealistic generators has reached a critical juncture where the discrepancy between authentic and manipulated images is increasingly indistinguishable. Although there have been a number of publicly available face forgery datasets, the forgery faces are mostly generated using GAN-based synthesis technology. We propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection.
arXiv Detail & Related papers (2024-02-03T03:13:50Z)
What a Whole Slide Image Can Tell? Subtype-guided Masked Transformer for Pathological Image Captioning [6.496515352848627]
We propose a new paradigm Subtype-guided Masked Transformer (SGMT) for pathological captioning based on Transformers. An accompanying subtype prediction is introduced into SGMT to guide the training process and enhance the captioning accuracy. Experiments on the PatchGastricADC22 dataset demonstrate that our approach effectively adapts to the task with a transformer-based model.
arXiv Detail & Related papers (2023-10-31T16:43:03Z)
Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions. We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training. Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input. DL models are sensitive to varying artifacts as it leads to changes in the input data distribution between the training and testing phases. We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts.
arXiv Detail & Related papers (2023-06-23T03:09:03Z)
Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representation of giga-pixel level whole slide pathology images (WSI) for downstream tasks is critical. This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes. Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
Class-Aware Generative Adversarial Transformers for Medical Image Segmentation [39.14169989603906]
We present CA-GANformer, a novel type of generative adversarial transformers, for medical image segmentation. First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations. We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures.
arXiv Detail & Related papers (2022-01-26T03:50:02Z)
Evaluating Transformer based Semantic Segmentation Networks for Pathological Image Segmentation [2.7029872968576947]
Histopathology has played an essential role in cancer diagnosis. Various CNN-based automated pathological image segmentation approaches have been developed in computer-assisted pathological image analysis. Transformer neural networks (Transformer) have shown the unique merit of capturing the global long distance dependencies across the entire image as a new deep learning paradigm.
arXiv Detail & Related papers (2021-08-26T18:46:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.