Multi-Scale Prototypical Transformer for Whole Slide Image
Classification
- URL: http://arxiv.org/abs/2307.02308v1
- Date: Wed, 5 Jul 2023 14:10:29 GMT
- Title: Multi-Scale Prototypical Transformer for Whole Slide Image
Classification
- Authors: Saisai Ding, Jun Wang, Juncheng Li, and Jun Shi
- Abstract summary: Whole slide image (WSI) classification is an essential task in computational pathology.
We propose a novel multi-scale prototypical Transformer (MSPT) for WSI classification.
- Score: 12.584411225450989
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Whole slide image (WSI) classification is an essential task in computational
pathology. Despite the recent advances in multiple instance learning (MIL) for
WSI classification, accurate classification of WSIs remains challenging due to
the extreme imbalance between the positive and negative instances in bags, and
the complicated pre-processing to fuse multi-scale information of WSI. To this
end, we propose a novel multi-scale prototypical Transformer (MSPT) for WSI
classification, which includes a prototypical Transformer (PT) module and a
multi-scale feature fusion module (MFFM). The PT is developed to reduce
redundant instances in bags by integrating prototypical learning into the
Transformer architecture. It substitutes all instances with cluster prototypes,
which are then re-calibrated through the self-attention mechanism of the
Transformer. Thereafter, an MFFM is proposed to fuse the clustered prototypes
of different scales, which employs MLP-Mixer to enhance the information
communication between prototypes. The experimental results on two public WSI
datasets demonstrate that the proposed MSPT outperforms all the compared
algorithms, suggesting its potential applications.
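The pipeline outlined in the abstract (cluster the instances of a bag into prototypes, re-calibrate the prototypes with self-attention, then fuse the prototypes of several scales with a Mixer-style block) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: plain k-means stands in for the clustering step, the attention uses no learned projections, the mixer weights are random and untrained, ReLU replaces GELU, layer normalization is omitted, and all function names, bag sizes, and dimensions are hypothetical.

```python
import numpy as np


def kmeans_prototypes(instances, k, n_iter=10, seed=0):
    """Reduce a bag of instance features (n, d) to k cluster prototypes (k, d)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers from k distinct instances (assumed initialisation).
    centers = instances[rng.choice(len(instances), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every instance to every center: (n, k).
        d2 = ((instances[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = instances[assign == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Scaled dot-product self-attention with a residual, without learned Q/K/V."""
    d = tokens.shape[-1]
    weights = softmax(tokens @ tokens.T / np.sqrt(d), axis=-1)
    return tokens + weights @ tokens, weights


def mixer_block(tokens, rng, hidden=16):
    """One simplified Mixer-style block: token mixing, then channel mixing."""
    n, d = tokens.shape
    w1 = rng.normal(scale=0.1, size=(hidden, n))
    w2 = rng.normal(scale=0.1, size=(n, hidden))
    # Token mixing exchanges information across prototypes.
    tokens = tokens + w2 @ np.maximum(w1 @ tokens, 0.0)
    v1 = rng.normal(scale=0.1, size=(d, hidden))
    v2 = rng.normal(scale=0.1, size=(hidden, d))
    # Channel mixing transforms each prototype's feature vector.
    return tokens + np.maximum(tokens @ v1, 0.0) @ v2


rng = np.random.default_rng(0)
bag_low = rng.normal(size=(200, 8))   # hypothetical low-magnification patch features
bag_high = rng.normal(size=(800, 8))  # hypothetical high-magnification patch features

# Per scale: cluster into 4 prototypes, then re-calibrate them with self-attention.
protos = [self_attention(kmeans_prototypes(bag, k=4))[0] for bag in (bag_low, bag_high)]

# Fuse the prototypes of both scales and pool them into one slide-level feature.
fused = mixer_block(np.concatenate(protos, axis=0), rng)
slide_feature = fused.mean(axis=0)
print(fused.shape, slide_feature.shape)  # (8, 8) (8,)
```

In this sketch the 200 and 800 instances of the two bags are each reduced to four prototypes, so the later attention and mixing steps operate on 8 tokens instead of 1000, which is the redundancy reduction the abstract refers to.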
Related papers
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z) - Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference [67.36605226797887]
We introduce a Multi-class Implicit Neural representation Transformer for unified Anomaly Detection (MINT-AD).
By learning the multi-class distributions, the model generates class-aware query embeddings for the transformer decoder.
MINT-AD can project category and position information into a feature embedding space, further supervised by classification and prior probability loss functions.
arXiv Detail & Related papers (2024-03-21T08:08:31Z) - A self-supervised framework for learning whole slide representations [52.774822784847565]
We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of whole slide images.
We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets.
arXiv Detail & Related papers (2024-02-09T05:05:28Z) - Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, our method increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z) - TPMIL: Trainable Prototype Enhanced Multiple Instance Learning for Whole
Slide Image Classification [13.195971707693365]
We develop a Trainable Prototype enhanced deep MIL framework for weakly supervised WSI classification.
Our method is able to reveal the correlations between different tumor subtypes through distances between corresponding trained prototypes.
We test our method on two WSI datasets and it achieves a new SOTA.
arXiv Detail & Related papers (2023-05-01T07:39:19Z) - Diagnose Like a Pathologist: Transformer-Enabled Hierarchical
Attention-Guided Multiple Instance Learning for Whole Slide Image
Classification [39.41442041007595]
Multiple Instance Learning and transformers are increasingly popular in histopathology Whole Slide Image (WSI) classification.
We propose a Hierarchical Attention-Guided Multiple Instance Learning framework to fully exploit the WSIs.
Within this framework, an Integrated Attention Transformer is proposed to further enhance the performance of the transformer.
arXiv Detail & Related papers (2023-01-19T15:38:43Z) - Language models are good pathologists: using attention-based sequence
reduction and text-pretrained transformers for efficient WSI classification [0.21756081703275998]
Whole Slide Image (WSI) analysis is usually formulated as a Multiple Instance Learning (MIL) problem.
We introduce SeqShort, a sequence shortening layer to summarize each WSI in a fixed- and short-sized sequence of instances.
We show that WSI classification performance can be improved when the downstream transformer architecture has been pre-trained on a large corpus of text data.
arXiv Detail & Related papers (2022-11-14T14:11:31Z) - SIM-Trans: Structure Information Modeling Transformer for Fine-grained
Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into transformer for enhancing discriminative representation learning.
The proposed two modules are light-weighted and can be plugged into any transformer network and trained end-to-end easily.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z) - Multimodal Fusion Transformer for Remote Sensing Image Classification [35.57881383390397]
Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance when compared to convolutional neural networks (CNNs).
To achieve satisfactory performance, close to that of CNNs, transformers need fewer parameters.
We introduce a new multimodal fusion transformer (MFT) network which comprises a multihead cross patch attention (mCrossPA) for HSI land-cover classification.
arXiv Detail & Related papers (2022-03-31T11:18:41Z) - Point Cloud Learning with Transformer [2.3204178451683264]
We introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT).
Specifically, a point pyramid transformer is investigated to model features with diverse resolutions or scales.
A multi-level transformer module is designed to aggregate contextual information from different levels of each scale and enhance their interactions.
arXiv Detail & Related papers (2021-04-28T08:39:21Z) - Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.