RetMIL: Retentive Multiple Instance Learning for Histopathological Whole Slide Image Classification
- URL: http://arxiv.org/abs/2403.10858v1
- Date: Sat, 16 Mar 2024 08:50:47 GMT
- Title: RetMIL: Retentive Multiple Instance Learning for Histopathological Whole Slide Image Classification
- Authors: Hongbo Chu, Qiehe Sun, Jiawen Li, Yuxuan Chen, Lizhong Zhang, Tian Guan, Anjia Han, Yonghong He
- Abstract summary: We propose a retentive MIL method called RetMIL, which processes WSI sequences through a hierarchical feature propagation structure.
At the local level, the WSI sequence is divided into multiple subsequences, and the tokens of each subsequence are updated through a parallel linear retention mechanism.
At the global level, the subsequences are fused into a global sequence, updated through a serial retention mechanism, and finally aggregated into the slide-level representation through global attention pooling.
- Score: 10.365234803533982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Histopathological whole slide image (WSI) analysis with deep learning has become a research focus in computational pathology. The current paradigm is mainly based on multiple instance learning (MIL), in which approaches with Transformer as the backbone are well discussed. These methods convert WSI tasks into sequence tasks by representing patches as tokens in the WSI sequence. However, the feature complexity brought by high heterogeneity and the ultra-long sequences brought by gigapixel size make Transformer-based MIL suffer from high memory consumption, slow inference speed, and limited performance. To this end, we propose a retentive MIL method called RetMIL, which processes WSI sequences through a hierarchical feature propagation structure. At the local level, the WSI sequence is divided into multiple subsequences. Tokens of each subsequence are updated through a parallel linear retention mechanism and aggregated using an attention layer. At the global level, the subsequences are fused into a global sequence, updated through a serial retention mechanism, and finally the slide-level representation is obtained through global attention pooling. We conduct experiments on the two public CAMELYON and BRACS datasets and a public-internal LUNG dataset, confirming that RetMIL not only achieves state-of-the-art performance but also significantly reduces computational overhead. Our code will be released shortly.
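Below is a minimal, self-contained sketch of the hierarchical retention idea described in the abstract: tokens of each subsequence are updated with a parallel retention, each subsequence is aggregated by an attention layer, a serial (recurrent) retention updates the fused global sequence, and attention pooling yields the slide-level representation. The module names, the single-head retention without xPos-style rotation, the fixed decay gamma, and all dimensions are simplifying assumptions for illustration, not the authors' released implementation.

```python
# Hedged sketch of hierarchical retention MIL (assumptions noted in comments).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleRetention(nn.Module):
    """Single-head retention with exponential decay gamma (parallel and serial forms).
    Simplified: no multi-head split, no xPos rotation (assumption)."""

    def __init__(self, dim: int, gamma: float = 0.9):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.gamma = gamma

    def forward_parallel(self, x: torch.Tensor) -> torch.Tensor:
        # x: (L, dim). Decay mask D[n, m] = gamma^(n-m) for n >= m, else 0.
        L = x.size(0)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        idx = torch.arange(L, device=x.device, dtype=x.dtype)
        diff = idx[:, None] - idx[None, :]
        decay = torch.where(diff >= 0, self.gamma ** diff.clamp(min=0), torch.zeros_like(diff))
        return (q @ k.t() * decay) @ v

    def forward_serial(self, x: torch.Tensor) -> torch.Tensor:
        # Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n ; y_n = q_n S_n
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        dim = x.size(-1)
        state = torch.zeros(dim, dim, device=x.device, dtype=x.dtype)
        outs = []
        for n in range(x.size(0)):
            state = self.gamma * state + k[n].unsqueeze(1) @ v[n].unsqueeze(0)
            outs.append(q[n].unsqueeze(0) @ state)
        return torch.cat(outs, dim=0)


class AttentionPool(nn.Module):
    """Attention pooling: learn a score per token, return the weighted sum."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = F.softmax(self.score(x), dim=0)  # (L, 1)
        return (w * x).sum(dim=0)            # (dim,)


class HierarchicalRetentionMIL(nn.Module):
    """Local parallel retention per subsequence -> attention aggregation ->
    global serial retention -> global attention pooling -> classifier."""

    def __init__(self, dim: int = 512, subseq_len: int = 256, n_classes: int = 2):
        super().__init__()
        self.subseq_len = subseq_len
        self.local_ret = SimpleRetention(dim)
        self.local_pool = AttentionPool(dim)
        self.global_ret = SimpleRetention(dim)
        self.global_pool = AttentionPool(dim)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (N, dim) pre-extracted patch features of one WSI
        sub_reprs = []
        for start in range(0, tokens.size(0), self.subseq_len):
            sub = tokens[start:start + self.subseq_len]
            sub = self.local_ret.forward_parallel(sub)   # local-level token update
            sub_reprs.append(self.local_pool(sub))        # aggregate the subsequence
        global_seq = torch.stack(sub_reprs, dim=0)        # fused global sequence
        global_seq = self.global_ret.forward_serial(global_seq)
        slide_repr = self.global_pool(global_seq)         # slide-level representation
        return self.classifier(slide_repr)


if __name__ == "__main__":
    feats = torch.randn(1000, 512)                        # e.g. 1000 patch embeddings
    logits = HierarchicalRetentionMIL()(feats)
    print(logits.shape)                                   # torch.Size([2])
```

In this sketch the parallel form is a single masked matrix product over each short subsequence, while the serial form carries a small recurrent state across the fused global sequence, which is the property that lets retention avoid the quadratic cost of full self-attention over ultra-long WSI sequences.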
Related papers
- Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis [9.090504201460817]
Histopathology Whole Slide Image (WSI) analysis serves as the gold standard for clinical cancer diagnosis in the daily routines of doctors.
Previous methods typically employ Multiple Instance Learning to enable slide-level prediction given only slide-level labels.
To alleviate the computational complexity of long sequences in large WSIs, methods like HIPT use region-slicing, and TransMIL employs approximation of full self-attention.
arXiv Detail & Related papers (2024-10-18T06:12:36Z) - MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis.
We represent each WSI as an undirected graph.
To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
arXiv Detail & Related papers (2024-03-08T09:02:13Z) - Quick Adaptive Ternary Segmentation: An Efficient Decoding Procedure For Hidden Markov Models [70.26374282390401]
Decoding the original signal (i.e., hidden chain) from the noisy observations is one of the main goals in nearly all HMM based data analyses.
We present Quick Adaptive Ternary (QATS), a divide-and-conquer procedure which decodes the hidden sequence in polylogarithmic computational complexity.
arXiv Detail & Related papers (2023-05-29T19:37:48Z) - FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z) - Diagnose Like a Pathologist: Transformer-Enabled Hierarchical Attention-Guided Multiple Instance Learning for Whole Slide Image Classification [39.41442041007595]
Multiple Instance Learning and transformers are increasingly popular in histopathology Whole Slide Image (WSI) classification.
We propose a Hierarchical Attention-Guided Multiple Instance Learning framework to fully exploit the WSIs.
Within this framework, an Integrated Attention Transformer is proposed to further enhance the performance of the transformer.
arXiv Detail & Related papers (2023-01-19T15:38:43Z) - Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representation of giga-pixel level whole slide pathology images (WSI) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z) - Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification [0.21756081703275998]
Whole Slide Image (WSI) analysis is usually formulated as a Multiple Instance Learning (MIL) problem.
We introduce SeqShort, a sequence-shortening layer that summarizes each WSI in a fixed- and short-sized sequence of instances.
We show that WSI classification performance can be improved when the downstream transformer architecture has been pre-trained on a large corpus of text data.
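A plausible minimal sketch of such an attention-based sequence-reduction layer is shown below: a small set of learned query tokens cross-attends to the full patch sequence and produces a fixed, short sequence for a downstream transformer. The class name, the use of nn.MultiheadAttention, and all sizes are illustrative assumptions rather than the paper's SeqShort implementation.

```python
# Hedged sketch of attention-based sequence shortening (assumed mechanism).
import torch
import torch.nn as nn


class SequenceShortener(nn.Module):
    def __init__(self, dim: int = 768, n_queries: int = 128, n_heads: int = 8):
        super().__init__()
        # Learned query tokens that summarize the long input sequence.
        self.queries = nn.Parameter(torch.randn(n_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (N, dim) for one WSI, N possibly in the tens of thousands
        q = self.queries.unsqueeze(0)            # (1, n_queries, dim)
        kv = patch_feats.unsqueeze(0)            # (1, N, dim)
        short, _ = self.cross_attn(q, kv, kv)    # (1, n_queries, dim)
        return short.squeeze(0)                  # fixed-length summary sequence


short_seq = SequenceShortener()(torch.randn(20000, 768))
print(short_seq.shape)  # torch.Size([128, 768])
```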
arXiv Detail & Related papers (2022-11-14T14:11:31Z) - Differentiable Zooming for Multiple Instance Learning on Whole-Slide Images [4.928363812223965]
We propose ZoomMIL, a method that learns to perform multi-level zooming in an end-to-end manner.
The proposed method outperforms the state-of-the-art MIL methods in WSI classification on two large datasets.
arXiv Detail & Related papers (2022-04-26T17:20:50Z) - Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction [138.04956118993934]
We propose a novel Transformer-based method, coarse-to-fine sparse Transformer (CST).
CST embeds HSI sparsity into deep learning for HSI reconstruction.
In particular, CST uses our proposed spectra-aware screening mechanism (SASM) for coarse patch selection. The selected patches are then fed into our customized spectra-aggregation hashing multi-head self-attention (SAH-MSA) for fine pixel clustering and self-similarity capturing.
arXiv Detail & Related papers (2022-03-09T16:17:47Z) - Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding [90.77031668988661]
Cluster-Former is a novel clustering-based sparse Transformer to perform attention across chunked sequences.
The proposed framework is pivoted on two unique types of Transformer layer: Sliding-Window Layer and Cluster-Former Layer.
Experiments show that Cluster-Former achieves state-of-the-art performance on several major QA benchmarks.
arXiv Detail & Related papers (2020-09-13T22:09:30Z)