Multi-level Multiple Instance Learning with Transformer for Whole Slide
Image Classification
- URL: http://arxiv.org/abs/2306.05029v2
- Date: Tue, 5 Sep 2023 09:43:02 GMT
- Title: Multi-level Multiple Instance Learning with Transformer for Whole Slide
Image Classification
- Authors: Ruijie Zhang, Qiaozhe Zhang, Yingzhuang Liu, Hao Xin, Yan Liu,
Xinggang Wang
- Abstract summary: Whole slide image (WSI) refers to a type of high-resolution scanned tissue image, which is extensively employed in computer-assisted diagnosis (CAD).
We propose a Multi-level MIL (MMIL) scheme by introducing a hierarchical structure to MIL, which enables efficient handling of MIL tasks involving a large number of instances.
Based on MMIL, we instantiate MMIL-Transformer, an efficient Transformer model with windowed exact self-attention for large-scale MIL tasks.
- Score: 32.43847786719133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whole slide image (WSI) refers to a type of high-resolution scanned tissue
image, which is extensively employed in computer-assisted diagnosis (CAD). The
extremely high resolution and limited availability of region-level annotations
make employing deep learning methods for WSI-based digital diagnosis
challenging. Recently integrating multiple instance learning (MIL) and
Transformer for WSI analysis shows very promising results. However, designing
effective Transformers for this weakly-supervised high-resolution image
analysis is an underexplored yet important problem. In this paper, we propose a
Multi-level MIL (MMIL) scheme by introducing a hierarchical structure to MIL,
which enables efficient handling of MIL tasks involving a large number of
instances. Based on MMIL, we instantiate MMIL-Transformer, an efficient
Transformer model with windowed exact self-attention for large-scale MIL tasks.
To validate its effectiveness, we conducted a set of experiments on WSI
classification tasks, where MMIL-Transformer demonstrates superior performance
compared to existing state-of-the-art methods, i.e., 96.80% test AUC and 97.67%
test accuracy on the CAMELYON16 dataset, 99.04% test AUC and 94.37% test
accuracy on the TCGA-NSCLC dataset, respectively. All code and pre-trained
models are available at: https://github.com/hustvl/MMIL-Transformer
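To make the windowed exact self-attention concrete, a minimal PyTorch sketch follows: instance tokens are partitioned into fixed-size windows and full self-attention runs inside each window. The embedding size, window length, and zero-padding scheme below are illustrative assumptions, not the released model's configuration.

    # Hedged sketch of windowed exact self-attention over MIL instance tokens.
    import torch
    import torch.nn as nn

    class WindowedSelfAttention(nn.Module):
        def __init__(self, dim=512, heads=8, window=256):
            super().__init__()
            self.window = window
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):                        # x: (1, N, dim) bag of instance tokens
            b, n, d = x.shape
            pad = (-n) % self.window                 # zero-pad so N splits into whole windows
            x = nn.functional.pad(x, (0, 0, 0, pad))
            w = x.view(-1, self.window, d)           # (num_windows, window, dim)
            out, _ = self.attn(w, w, w)              # exact attention inside each window
            return out.reshape(b, n + pad, d)[:, :n]

    bag = torch.randn(1, 1000, 512)                  # e.g., 1000 patch embeddings from one WSI
    print(WindowedSelfAttention()(bag).shape)        # torch.Size([1, 1000, 512])

Attention cost here scales with the window size rather than the full bag length, which is what makes exact attention affordable for bags with tens of thousands of instances.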
Related papers
- Multi-head Attention-based Deep Multiple Instance Learning [1.0389304366020162]
MAD-MIL is a Multi-head Attention-based Deep Multiple Instance Learning model.
It is designed for weakly supervised Whole Slide Image (WSI) classification in digital pathology.
It is evaluated on MNIST-BAGS and on public datasets including TUPAC16, TCGA BRCA, TCGA LUNG, and TCGA KIDNEY.
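As a rough illustration of the multi-head idea, the sketch below pools a bag with several independent attention heads and concatenates their bag embeddings before classification; the sizes and single-layer scorers are assumptions made for brevity.

    # Illustrative multi-head attention MIL pooling (not the authors' exact model).
    import torch
    import torch.nn as nn

    class MultiHeadAttnMIL(nn.Module):
        def __init__(self, dim=512, heads=4, n_classes=2):
            super().__init__()
            self.scorers = nn.ModuleList(nn.Linear(dim, 1) for _ in range(heads))
            self.cls = nn.Linear(dim * heads, n_classes)

        def forward(self, x):                               # x: (N, dim) instance embeddings
            pooled = [(torch.softmax(s(x), dim=0) * x).sum(0) for s in self.scorers]
            return self.cls(torch.cat(pooled))              # slide-level logits

    logits = MultiHeadAttnMIL()(torch.randn(300, 512))      # one bag of 300 instances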
arXiv Detail & Related papers (2024-04-08T09:54:28Z)
- MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis.
We represent each WSI as an undirected graph.
To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
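The summary does not spell out the scanning mechanism; one simple way to serialize a patch graph while respecting its topology is a breadth-first traversal, sketched below purely as an assumption-labeled stand-in (the paper's actual scheme may differ).

    # Hypothetical stand-in: BFS over the patch graph keeps spatial neighbors
    # close together in the 1D sequence fed to a state space model.
    from collections import deque

    def bfs_scan(adj, start=0):
        """adj: {node: [neighbors]} undirected patch graph -> 1D visit order."""
        order, seen, queue = [], {start}, deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return order

    # 2x2 patch grid: 0-1 on the top row, 2-3 on the bottom row
    print(bfs_scan({0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}))  # [0, 1, 2, 3]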
arXiv Detail & Related papers (2024-03-08T09:02:13Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
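In the spirit of the AAT module, the sketch below learns a single affine matrix and warps images with it, spatial-transformer style; the real module predicts richer, input-conditioned transformations, so treat this as a minimal assumption-laden example.

    # Minimal learned affine warp (a much-simplified cousin of the AAT module).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LearnedAffineWarp(nn.Module):
        def __init__(self):
            super().__init__()
            # 2x3 affine matrix, initialized to the identity transform
            self.theta = nn.Parameter(torch.tensor([[1., 0., 0.], [0., 1., 0.]]))

        def forward(self, img):                                # img: (B, C, H, W)
            theta = self.theta.expand(img.size(0), 2, 3)
            grid = F.affine_grid(theta, img.size(), align_corners=False)
            return F.grid_sample(img, grid, align_corners=False)

    warped = LearnedAffineWarp()(torch.randn(2, 3, 64, 64))    # same shape, warped content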
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- TPMIL: Trainable Prototype Enhanced Multiple Instance Learning for Whole
Slide Image Classification [13.195971707693365]
We develop a Trainable Prototype enhanced deep MIL framework for weakly supervised WSI classification.
Our method is able to reveal the correlations between different tumor subtypes through distances between corresponding trained prototypes.
We test our method on two WSI datasets, where it achieves a new state of the art.
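A bare-bones version of the prototype idea: one trainable vector per tumor subtype, classification by distance to each prototype, and subtype relations read off the prototype-to-prototype distance matrix. Sizes below are illustrative.

    # Sketch: trainable per-subtype prototypes and distance-based readout.
    import torch
    import torch.nn as nn

    n_subtypes, dim = 4, 512
    prototypes = nn.Parameter(torch.randn(n_subtypes, dim))    # learned jointly with the MIL model

    slide_emb = torch.randn(dim)                               # aggregated slide embedding
    logits = -torch.cdist(slide_emb[None], prototypes)[0]      # closer prototype => higher score
    subtype_relations = torch.cdist(prototypes, prototypes)    # (4, 4) inter-subtype distances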
arXiv Detail & Related papers (2023-05-01T07:39:19Z)
- AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context
Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information, to the extent that it achieves the same performance with as little as 20% of the data.
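The summary does not say how that robustness is obtained; one standard training-time trick with the same flavor is random instance dropout, sketched below purely as an assumption.

    # Assumption only: randomly dropping instances during training is one common
    # way to make a model tolerate missing data; AMIGO's own mechanism may differ.
    import torch

    def drop_instances(x, keep=0.2):
        """Randomly keep a fraction of instance embeddings. x: (N, dim)."""
        n_keep = max(1, int(keep * x.size(0)))
        return x[torch.randperm(x.size(0))[:n_keep]]

    subset = drop_instances(torch.randn(500, 256), keep=0.2)   # 100 of 500 instances survive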
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
- Hierarchical Transformer for Survival Prediction Using Multimodality
Whole Slide Images and Genomics [63.76637479503006]
Learning good representations of giga-pixel whole slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
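One plausible building block for such a pathology-genomics mapping is cross-attention, in which gene tokens query the WSI patch tokens; the wiring and sizes below are illustrative guesses, not the paper's architecture.

    # Hypothetical fusion step: genomic tokens cross-attend to WSI patch tokens.
    import torch
    import torch.nn as nn

    cross = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
    patches = torch.randn(1, 2048, 256)                        # WSI patch tokens
    genes = torch.randn(1, 32, 256)                            # embedded genomic features
    fused, _ = cross(query=genes, key=patches, value=patches)  # (1, 32, 256)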
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
- Differentiable Zooming for Multiple Instance Learning on Whole-Slide
Images [4.928363812223965]
We propose ZoomMIL, a method that learns to perform multi-level zooming in an end-to-end manner.
The proposed method outperforms the state-of-the-art MIL methods in WSI classification on two large datasets.
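The zooming idea, stripped to its core: score patches at low magnification and revisit only the top-ranked regions at higher magnification. The plain top-k below is a non-differentiable stand-in; the paper's contribution is making this selection differentiable end to end.

    # Simplified, non-differentiable stand-in for learned zooming.
    import torch
    import torch.nn as nn

    scorer = nn.Linear(256, 1)
    low_mag = torch.randn(400, 256)                  # patch embeddings at low magnification
    scores = scorer(low_mag).squeeze(-1)
    top = torch.topk(scores, k=32).indices           # regions to re-read at high magnification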
arXiv Detail & Related papers (2022-04-26T17:20:50Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for
Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic IB from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain state-of-the-art classification performance, i.e., 88.5% Top-1 accuracy on the ImageNet validation set and the best 91.2% Top-1 accuracy on the ImageNet real validation set.
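A rough sketch of a spatial pyramid reduction cell follows, assuming parallel dilated convolutions capture multi-scale context while downsampling the image into tokens; the channel counts and dilation rates are invented for illustration.

    # Illustrative reduction cell: multi-scale context + downsampling into tokens.
    import torch
    import torch.nn as nn

    class PyramidReduction(nn.Module):
        def __init__(self, c_in=3, c_out=64):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv2d(c_in, c_out, 3, stride=4, padding=d, dilation=d)
                for d in (1, 2, 3)                           # parallel dilation rates
            )
            self.proj = nn.Conv2d(3 * c_out, c_out, 1)

        def forward(self, x):                                # x: (B, 3, H, W)
            y = self.proj(torch.cat([b(x) for b in self.branches], dim=1))
            return y.flatten(2).transpose(1, 2)              # (B, H/4 * W/4, c_out) tokens

    tokens = PyramidReduction()(torch.randn(1, 3, 224, 224)) # (1, 3136, 64)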
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- TransMIL: Transformer based Correlated Multiple Instance Learning for
Whole Slide Image Classification [38.58585442160062]
Multiple instance learning (MIL) is a powerful tool to solve the weakly supervised classification in whole slide image (WSI) based pathology diagnosis.
We propose a new framework, called correlated MIL, and provide a proof of convergence.
We conducted various experiments for three different computational pathology problems and achieved better performance and faster convergence compared with state-of-the-art methods.
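A minimal Transformer-MIL skeleton captures the "correlated" part: a class token attends to every instance token, so inter-instance correlations influence the slide decision. TransMIL itself adds approximate (Nystrom) attention and positional encoding on top of this idea; the sketch below is illustrative only.

    # Skeleton of Transformer-style correlated MIL with a class token.
    import torch
    import torch.nn as nn

    dim = 512
    attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
    cls_token = nn.Parameter(torch.zeros(1, 1, dim))

    instances = torch.randn(1, 600, dim)                 # one bag of patch embeddings
    x = torch.cat([cls_token, instances], dim=1)
    x, _ = attn(x, x, x)                                 # instances interact with each other
    slide_logits = nn.Linear(dim, 2)(x[:, 0])            # classify from the class token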
arXiv Detail & Related papers (2021-06-02T02:57:54Z)
- Dual-stream Multiple Instance Learning Network for Whole Slide Image
Classification with Self-supervised Contrastive Learning [16.84711797934138]
We address the challenging problem of whole slide image (WSI) classification.
WSI classification can be cast as a multiple instance learning (MIL) problem when only slide-level labels are available.
We propose a MIL-based method for WSI classification and tumor detection that does not require localized annotations.
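A compressed sketch of the dual-stream idea as it is usually described: one stream max-pools instance scores to find a critical instance, the other weights all instances by their similarity to it. Projection layers and scaling are omitted, so read this as an approximation.

    # Two streams, heavily simplified: critical instance + similarity weighting.
    import torch
    import torch.nn as nn

    inst_cls = nn.Linear(256, 1)
    bag = torch.randn(200, 256)                          # instance embeddings
    scores = inst_cls(bag).squeeze(-1)
    critical = bag[scores.argmax()]                      # stream 1: max-pooled critical instance
    weights = torch.softmax(bag @ critical, dim=0)       # stream 2: similarity to it
    bag_emb = (weights[:, None] * bag).sum(0)            # aggregated bag representation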
arXiv Detail & Related papers (2020-11-17T20:51:15Z)
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is a recent data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
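Mixup itself is two lines, shown below on sentence embeddings and one-hot labels; the batch shapes are illustrative, and mixing hidden representations is the usual choice for text.

    # Mixup: convex combinations of paired inputs and their labels.
    import torch

    lam = 0.7                                            # mixing coefficient, e.g. drawn from Beta(a, a)
    x1, x2 = torch.randn(8, 768), torch.randn(8, 768)    # two batches of sentence embeddings
    y1 = torch.eye(2)[torch.randint(0, 2, (8,))]         # one-hot labels
    y2 = torch.eye(2)[torch.randint(0, 2, (8,))]
    x_mix = lam * x1 + (1 - lam) * x2
    y_mix = lam * y1 + (1 - lam) * y2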
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.