Multi-level Multiple Instance Learning with Transformer for Whole Slide
Image Classification
- URL: http://arxiv.org/abs/2306.05029v2
- Date: Tue, 5 Sep 2023 09:43:02 GMT
- Title: Multi-level Multiple Instance Learning with Transformer for Whole Slide
Image Classification
- Authors: Ruijie Zhang, Qiaozhe Zhang, Yingzhuang Liu, Hao Xin, Yan Liu,
Xinggang Wang
- Abstract summary: Whole slide image (WSI) refers to a type of high-resolution scanned tissue image, which is extensively employed in computer-assisted diagnosis (CAD).
We propose a Multi-level MIL (MMIL) scheme by introducing a hierarchical structure to MIL, which enables efficient handling of MIL tasks involving a large number of instances.
Based on MMIL, we instantiate MMIL-Transformer, an efficient Transformer model with windowed exact self-attention for large-scale MIL tasks.
- Score: 32.43847786719133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whole slide image (WSI) refers to a type of high-resolution scanned tissue
image, which is extensively employed in computer-assisted diagnosis (CAD). The
extremely high resolution and limited availability of region-level annotations
make employing deep learning methods for WSI-based digital diagnosis
challenging. Recently integrating multiple instance learning (MIL) and
Transformer for WSI analysis shows very promising results. However, designing
effective Transformers for this weakly-supervised high-resolution image
analysis is an underexplored yet important problem. In this paper, we propose a
Multi-level MIL (MMIL) scheme by introducing a hierarchical structure to MIL,
which enables efficient handling of MIL tasks involving a large number of
instances. Based on MMIL, we instantiate MMIL-Transformer, an efficient
Transformer model with windowed exact self-attention for large-scale MIL tasks.
To validate its effectiveness, we conducted a set of experiments on WSI
classification tasks, where MMIL-Transformer demonstrates superior performance
compared to existing state-of-the-art methods, i.e., 96.80% test AUC and 97.67%
test accuracy on the CAMELYON16 dataset, 99.04% test AUC and 94.37% test
accuracy on the TCGA-NSCLC dataset, respectively. All code and pre-trained
models are available at: https://github.com/hustvl/MMIL-Transformer
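To make the windowed exact self-attention concrete, a minimal PyTorch sketch follows: instance tokens are partitioned into fixed-size windows and full self-attention runs inside each window. The embedding size, window length, and zero-padding scheme below are illustrative assumptions, not the released model's configuration.

    # Hedged sketch of windowed exact self-attention over MIL instance tokens.
    import torch
    import torch.nn as nn

    class WindowedSelfAttention(nn.Module):
        def __init__(self, dim=512, heads=8, window=256):
            super().__init__()
            self.window = window
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):                        # x: (1, N, dim) bag of instance tokens
            b, n, d = x.shape
            pad = (-n) % self.window                 # zero-pad so N splits into whole windows
            x = nn.functional.pad(x, (0, 0, 0, pad))
            w = x.view(-1, self.window, d)           # (num_windows, window, dim)
            out, _ = self.attn(w, w, w)              # exact attention inside each window
            return out.reshape(b, n + pad, d)[:, :n]

    bag = torch.randn(1, 1000, 512)                  # e.g., 1000 patch embeddings from one WSI
    print(WindowedSelfAttention()(bag).shape)        # torch.Size([1, 1000, 512])

Attention cost here scales with the window size rather than the full bag length, which is what makes exact attention affordable for bags with tens of thousands of instances.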
Related papers
- Multi-head Attention-based Deep Multiple Instance Learning [1.0389304366020162]
MAD-MIL is a Multi-head Attention-based Deep Multiple Instance Learning model.
It is designed for weakly supervised Whole Slide Image (WSI) classification in digital pathology.
It is evaluated on MNIST-BAGS and on public datasets including TUPAC16, TCGA BRCA, TCGA LUNG, and TCGA KIDNEY.
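As a rough illustration of the multi-head idea, the sketch below pools a bag with several independent attention heads and concatenates their bag embeddings before classification; the sizes and single-layer scorers are assumptions made for brevity.

    # Illustrative multi-head attention MIL pooling (not the authors' exact model).
    import torch
    import torch.nn as nn

    class MultiHeadAttnMIL(nn.Module):
        def __init__(self, dim=512, heads=4, n_classes=2):
            super().__init__()
            self.scorers = nn.ModuleList(nn.Linear(dim, 1) for _ in range(heads))
            self.cls = nn.Linear(dim * heads, n_classes)

        def forward(self, x):                               # x: (N, dim) instance embeddings
            pooled = [(torch.softmax(s(x), dim=0) * x).sum(0) for s in self.scorers]
            return self.cls(torch.cat(pooled))              # slide-level logits

    logits = MultiHeadAttnMIL()(torch.randn(300, 512))      # one bag of 300 instances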
arXiv Detail & Related papers (2024-04-08T09:54:28Z)
- MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis.
We represent each WSI as an undirected graph.
To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
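The summary does not spell out the scanning mechanism; one simple way to serialize a patch graph while respecting its topology is a breadth-first traversal, sketched below purely as an assumption-labeled stand-in (the paper's actual scheme may differ).

    # Hypothetical stand-in: BFS over the patch graph keeps spatial neighbors
    # close together in the 1D sequence fed to a state space model.
    from collections import deque

    def bfs_scan(adj, start=0):
        """adj: {node: [neighbors]} undirected patch graph -> 1D visit order."""
        order, seen, queue = [], {start}, deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return order

    # 2x2 patch grid: 0-1 on the top row, 2-3 on the bottom row
    print(bfs_scan({0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}))  # [0, 1, 2, 3]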
arXiv Detail & Related papers (2024-03-08T09:02:13Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
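In the spirit of the AAT module, the sketch below learns a single affine matrix and warps images with it, spatial-transformer style; the real module predicts richer, input-conditioned transformations, so treat this as a minimal assumption-laden example.

    # Minimal learned affine warp (a much-simplified cousin of the AAT module).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LearnedAffineWarp(nn.Module):
        def __init__(self):
            super().__init__()
            # 2x3 affine matrix, initialized to the identity transform
            self.theta = nn.Parameter(torch.tensor([[1., 0., 0.], [0., 1., 0.]]))

        def forward(self, img):                                # img: (B, C, H, W)
            theta = self.theta.expand(img.size(0), 2, 3)
            grid = F.affine_grid(theta, img.size(), align_corners=False)
            return F.grid_sample(img, grid, align_corners=False)

    warped = LearnedAffineWarp()(torch.randn(2, 3, 64, 64))    # same shape, warped content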
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- TPMIL: Trainable Prototype Enhanced Multiple Instance Learning for Whole
Slide Image Classification [13.195971707693365]
We develop a Trainable Prototype enhanced deep MIL framework for weakly supervised WSI classification.
Our method is able to reveal the correlations between different tumor subtypes through distances between corresponding trained prototypes.
We test our method on two WSI datasets, where it achieves a new state of the art.
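A bare-bones version of the prototype idea: one trainable vector per tumor subtype, classification by distance to each prototype, and subtype relations read off the prototype-to-prototype distance matrix. Sizes below are illustrative.

    # Sketch: trainable per-subtype prototypes and distance-based readout.
    import torch
    import torch.nn as nn

    n_subtypes, dim = 4, 512
    prototypes = nn.Parameter(torch.randn(n_subtypes, dim))    # learned jointly with the MIL model

    slide_emb = torch.randn(dim)                               # aggregated slide embedding
    logits = -torch.cdist(slide_emb[None], prototypes)[0]      # closer prototype => higher score
    subtype_relations = torch.cdist(prototypes, prototypes)    # (4, 4) inter-subtype distances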
arXiv Detail & Related papers (2023-05-01T07:39:19Z)
- AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context
Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information, to the extent that it achieves the same performance with as little as 20% of the data.
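The summary does not say how that robustness is obtained; one standard training-time trick with the same flavor is random instance dropout, sketched below purely as an assumption.

    # Assumption only: randomly dropping instances during training is one common
    # way to make a model tolerate missing data; AMIGO's own mechanism may differ.
    import torch

    def drop_instances(x, keep=0.2):
        """Randomly keep a fraction of instance embeddings. x: (N, dim)."""
        n_keep = max(1, int(keep * x.size(0)))
        return x[torch.randperm(x.size(0))[:n_keep]]

    subset = drop_instances(torch.randn(500, 256), keep=0.2)   # 100 of 500 instances survive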
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
- Hierarchical Transformer for Survival Prediction Using Multimodality
Whole Slide Images and Genomics [63.76637479503006]
Learning good representations of giga-pixel whole slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
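One plausible building block for such a pathology-genomics mapping is cross-attention, in which gene tokens query the WSI patch tokens; the wiring and sizes below are illustrative guesses, not the paper's architecture.

    # Hypothetical fusion step: genomic tokens cross-attend to WSI patch tokens.
    import torch
    import torch.nn as nn

    cross = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
    patches = torch.randn(1, 2048, 256)                        # WSI patch tokens
    genes = torch.randn(1, 32, 256)                            # embedded genomic features
    fused, _ = cross(query=genes, key=patches, value=patches)  # (1, 32, 256)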
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
- Differentiable Zooming for Multiple Instance Learning on Whole-Slide
Images [4.928363812223965]
We propose ZoomMIL, a method that learns to perform multi-level zooming in an end-to-end manner.
The proposed method outperforms the state-of-the-art MIL methods in WSI classification on two large datasets.
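The zooming idea, stripped to its core: score patches at low magnification and revisit only the top-ranked regions at higher magnification. The plain top-k below is a non-differentiable stand-in; the paper's contribution is making this selection differentiable end to end.

    # Simplified, non-differentiable stand-in for learned zooming.
    import torch
    import torch.nn as nn

    scorer = nn.Linear(256, 1)
    low_mag = torch.randn(400, 256)                  # patch embeddings at low magnification
    scores = scorer(low_mag).squeeze(-1)
    top = torch.topk(scores, k=32).indices           # regions to re-read at high magnification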
arXiv Detail & Related papers (2022-04-26T17:20:50Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for
Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic IB from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain state-of-the-art classification performance, i.e., 88.5% Top-1 accuracy on the ImageNet validation set and the best 91.2% Top-1 accuracy on the ImageNet real validation set.
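A rough sketch of a spatial pyramid reduction cell follows, assuming parallel dilated convolutions capture multi-scale context while downsampling the image into tokens; the channel counts and dilation rates are invented for illustration.

    # Illustrative reduction cell: multi-scale context + downsampling into tokens.
    import torch
    import torch.nn as nn

    class PyramidReduction(nn.Module):
        def __init__(self, c_in=3, c_out=64):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv2d(c_in, c_out, 3, stride=4, padding=d, dilation=d)
                for d in (1, 2, 3)                           # parallel dilation rates
            )
            self.proj = nn.Conv2d(3 * c_out, c_out, 1)

        def forward(self, x):                                # x: (B, 3, H, W)
            y = self.proj(torch.cat([b(x) for b in self.branches], dim=1))
            return y.flatten(2).transpose(1, 2)              # (B, H/4 * W/4, c_out) tokens

    tokens = PyramidReduction()(torch.randn(1, 3, 224, 224)) # (1, 3136, 64)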
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- TransMIL: Transformer based Correlated Multiple Instance Learning for
Whole Slide Image Classification [38.58585442160062]
Multiple instance learning (MIL) is a powerful tool to solve the weakly supervised classification in whole slide image (WSI) based pathology diagnosis.
We propose a new framework, called correlated MIL, and provide a proof of convergence.
We conducted various experiments for three different computational pathology problems and achieved better performance and faster convergence compared with state-of-the-art methods.
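A minimal Transformer-MIL skeleton captures the "correlated" part: a class token attends to every instance token, so inter-instance correlations influence the slide decision. TransMIL itself adds approximate (Nystrom) attention and positional encoding on top of this idea; the sketch below is illustrative only.

    # Skeleton of Transformer-style correlated MIL with a class token.
    import torch
    import torch.nn as nn

    dim = 512
    attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
    cls_token = nn.Parameter(torch.zeros(1, 1, dim))

    instances = torch.randn(1, 600, dim)                 # one bag of patch embeddings
    x = torch.cat([cls_token, instances], dim=1)
    x, _ = attn(x, x, x)                                 # instances interact with each other
    slide_logits = nn.Linear(dim, 2)(x[:, 0])            # classify from the class token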
arXiv Detail & Related papers (2021-06-02T02:57:54Z)
- Dual-stream Multiple Instance Learning Network for Whole Slide Image
Classification with Self-supervised Contrastive Learning [16.84711797934138]
We address the challenging problem of whole slide image (WSI) classification.
WSI classification can be cast as a multiple instance learning (MIL) problem when only slide-level labels are available.
We propose a MIL-based method for WSI classification and tumor detection that does not require localized annotations.
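A compressed sketch of the dual-stream idea as it is usually described: one stream max-pools instance scores to find a critical instance, the other weights all instances by their similarity to it. Projection layers and scaling are omitted, so read this as an approximation.

    # Two streams, heavily simplified: critical instance + similarity weighting.
    import torch
    import torch.nn as nn

    inst_cls = nn.Linear(256, 1)
    bag = torch.randn(200, 256)                          # instance embeddings
    scores = inst_cls(bag).squeeze(-1)
    critical = bag[scores.argmax()]                      # stream 1: max-pooled critical instance
    weights = torch.softmax(bag @ critical, dim=0)       # stream 2: similarity to it
    bag_emb = (weights[:, None] * bag).sum(0)            # aggregated bag representation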
arXiv Detail & Related papers (2020-11-17T20:51:15Z)
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is a recent data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
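Mixup itself is two lines, shown below on sentence embeddings and one-hot labels; the batch shapes are illustrative, and mixing hidden representations is the usual choice for text.

    # Mixup: convex combinations of paired inputs and their labels.
    import torch

    lam = 0.7                                            # mixing coefficient, e.g. drawn from Beta(a, a)
    x1, x2 = torch.randn(8, 768), torch.randn(8, 768)    # two batches of sentence embeddings
    y1 = torch.eye(2)[torch.randint(0, 2, (8,))]         # one-hot labels
    y2 = torch.eye(2)[torch.randint(0, 2, (8,))]
    x_mix = lam * x1 + (1 - lam) * x2
    y_mix = lam * y1 + (1 - lam) * y2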
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.