Diagnose Like a Pathologist: Transformer-Enabled Hierarchical
Attention-Guided Multiple Instance Learning for Whole Slide Image
Classification
- URL: http://arxiv.org/abs/2301.08125v2
- Date: Mon, 17 Jul 2023 03:00:47 GMT
- Title: Diagnose Like a Pathologist: Transformer-Enabled Hierarchical
Attention-Guided Multiple Instance Learning for Whole Slide Image
Classification
- Authors: Conghao Xiong, Hao Chen, Joseph J.Y. Sung, Irwin King
- Abstract summary: Multiple Instance Learning and transformers are increasingly popular in histopathology Whole Slide Image (WSI) classification.
We propose a Hierarchical Attention-Guided Multiple Instance Learning framework to fully exploit the WSIs.
Within this framework, an Integrated Attention Transformer is proposed to further enhance the performance of the transformer.
- Score: 39.41442041007595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multiple Instance Learning (MIL) and transformers are increasingly popular in
histopathology Whole Slide Image (WSI) classification. However, unlike human
pathologists, who selectively examine specific regions of histopathology tissue
under different magnifications, most methods do not incorporate the multiple
resolutions of a WSI hierarchically and attentively, and thus lose both focus
within a resolution and information from the other resolutions. To resolve
this issue, we propose a Hierarchical Attention-Guided Multiple Instance
Learning framework to fully exploit the WSIs. This framework can dynamically
and attentively discover the discriminative regions across multiple resolutions
of the WSIs. Within this framework, an Integrated Attention Transformer is
proposed to further enhance the performance of the transformer and obtain a
more holistic WSI (bag) representation. This transformer consists of multiple
Integrated Attention Modules, each of which combines a transformer layer with
an aggregation module that produces a bag representation from every instance
representation in that bag. Experimental results show that our method achieves
state-of-the-art performance on multiple datasets, including
Camelyon16, TCGA-RCC, TCGA-NSCLC, and an in-house IMGC dataset. The code is
available at https://github.com/BearCleverProud/HAG-MIL.
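The structure just described (a stack of Integrated Attention Modules, each pairing a transformer layer with an aggregation module) is concrete enough to sketch. Below is a minimal PyTorch rendering; the attention-pooling aggregator, feature dimension, and head count are illustrative assumptions rather than the authors' exact design, which lives in the linked repository.

```python
import torch
import torch.nn as nn


class IntegratedAttentionModule(nn.Module):
    # One Integrated Attention Module: a transformer layer over instance
    # (patch) representations, plus an aggregation module that pools them
    # into a bag representation. The attention-pooling aggregator is an
    # illustrative stand-in, not necessarily the authors' choice.
    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        self.transformer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, batch_first=True
        )
        self.score = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.Tanh(), nn.Linear(dim // 4, 1)
        )

    def forward(self, instances: torch.Tensor):
        # instances: (B, N, dim) patch features of one WSI (bag)
        instances = self.transformer(instances)
        weights = torch.softmax(self.score(instances), dim=1)  # (B, N, 1)
        bag = (weights * instances).sum(dim=1)                 # (B, dim)
        return instances, bag, weights.squeeze(-1)


# Stacking modules yields the Integrated Attention Transformer; the per-patch
# weights can guide which regions to revisit at a finer resolution.
layers = nn.ModuleList([IntegratedAttentionModule() for _ in range(2)])
x = torch.randn(1, 100, 512)  # 100 patch embeddings from one slide
for layer in layers:
    x, bag, attn = layer(x)
```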
Related papers
- RetMIL: Retentive Multiple Instance Learning for Histopathological Whole Slide Image Classification [10.365234803533982]
We propose a retentive MIL method called RetMIL, which processes WSI sequences through a hierarchical feature propagation structure.
At the local level, the WSI sequence is divided into multiple subsequences. Tokens of each subsequence are updated through a parallel linear retention mechanism.
At the global level, subsequences are fused into a global sequence and updated through a serial retention mechanism; the slide-level representation is finally obtained through global attention pooling (a toy sketch of this two-level pipeline follows this list).
arXiv Detail & Related papers (2024-03-16T08:50:47Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations needed to warp the original images for local network training (a minimal warping sketch follows this list).
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Multi-Scale Prototypical Transformer for Whole Slide Image Classification [12.584411225450989]
Whole slide image (WSI) classification is an essential task in computational pathology.
We propose a novel multi-scale prototypical Transformer (MSPT) for WSI classification.
arXiv Detail & Related papers (2023-07-05T14:10:29Z)
- DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification [109.09061514799413]
Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.
We propose a tri-spectral image generation pipeline that transforms HSI into high-quality tri-spectral images.
Our proposed method outperforms state-of-the-art methods for HSI classification.
arXiv Detail & Related papers (2023-04-19T18:32:52Z)
- BEL: A Bag Embedding Loss for Transformer enhances Multiple Instance Whole Slide Image Classification [39.53132774980783]
Bag Embedding Loss (BEL) forces the model to learn a discriminative bag-level representation by minimizing the distance between bag embeddings of the same class and maximizing the distance between those of different classes (a toy implementation follows this list).
We show that with BEL, TransMIL outperforms the baseline models on both datasets.
arXiv Detail & Related papers (2023-03-02T16:02:55Z)
- Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning a good representation of gigapixel whole slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
- UniNet: Unified Architecture Search with Convolution, Transformer, and MLP [62.401161377258234]
In this paper, we propose to jointly search the optimal combination of convolution, transformer, and MLP for building a series of all-operator network architectures.
We identify that the widely used strided-convolution and pooling based down-sampling modules become performance bottlenecks when operators are combined to form a network.
To better handle the global context captured by the transformer and MLP operators, we propose two novel context-aware down-sampling modules.
arXiv Detail & Related papers (2021-10-08T11:09:40Z)
- Semantic Attention and Scale Complementary Network for Instance Segmentation in Remote Sensing Images [54.08240004593062]
We propose an end-to-end multi-category instance segmentation model, which consists of a Semantic Attention (SEA) module and a Scale Complementary Mask Branch (SCMB).
The SEA module contains a simple fully convolutional semantic segmentation branch with extra supervision that strengthens the activations of instances of interest on the feature map.
The SCMB extends the original single mask branch to trident mask branches and introduces complementary mask supervision at different scales.
arXiv Detail & Related papers (2021-07-25T08:53:59Z)
- A Universal Representation Transformer Layer for Few-Shot Image Classification [43.31379752656756]
Few-shot classification aims to recognize unseen classes when presented with only a small number of samples.
We consider the problem of multi-domain few-shot image classification, where unseen classes and examples come from diverse data sources.
Here, we propose a Universal Representation Transformer layer that meta-learns to leverage universal features for few-shot classification.
arXiv Detail & Related papers (2020-06-21T03:08:00Z)
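As a concrete reference for the RetMIL entry above, the sketch below walks through its two-level pipeline: subsequences updated locally, fused into a global sequence, then attention-pooled into a slide-level representation. The single-head `simple_retention` is a heavily simplified, RetNet-style stand-in (attention with a causal exponential decay); it is an assumption for illustration, not RetMIL's actual retention mechanism.

```python
import torch
import torch.nn as nn


def simple_retention(x: torch.Tensor, gamma: float = 0.9) -> torch.Tensor:
    # x: (n, d). Retention approximated as attention with a causal,
    # exponentially decaying mask D[n, m] = gamma**(n - m) for n >= m.
    n, d = x.shape
    idx = torch.arange(n)
    decay = gamma ** (idx[:, None] - idx[None, :]).clamp(min=0)
    causal = torch.tril(torch.ones(n, n))
    scores = (x @ x.T) / d ** 0.5
    return (scores * decay * causal) @ x


class ToyRetMIL(nn.Module):
    def __init__(self, dim: int = 256, n_classes: int = 2):
        super().__init__()
        self.pool_score = nn.Linear(dim, 1)  # global attention pooling
        self.head = nn.Linear(dim, n_classes)

    def forward(self, tokens: torch.Tensor, sub_len: int = 64):
        # Local level: update each subsequence independently ("parallel").
        subs = [simple_retention(s) for s in tokens.split(sub_len)]
        # Global level: fuse subsequences, update once more ("serial").
        fused = simple_retention(torch.cat(subs))
        w = torch.softmax(self.pool_score(fused), dim=0)  # (N, 1)
        slide_repr = (w * fused).sum(dim=0)               # slide embedding
        return self.head(slide_repr)


logits = ToyRetMIL()(torch.randn(200, 256))  # 200 patch tokens, one slide
```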
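Likewise, for the Affine-Consistent Transformer entry, the Adaptive Affine Transformer amounts to predicting affine parameters and warping the input before local training, in the spirit of a spatial transformer. The toy module below makes that concrete; the feature extractor size and identity initialization are assumptions, not the AC-Former design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyAAT(nn.Module):
    def __init__(self):
        super().__init__()
        # Small network that regresses the six affine parameters.
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.theta = nn.Linear(8 * 4 * 4, 6)
        # Start from the identity transform so early training is stable.
        nn.init.zeros_(self.theta.weight)
        self.theta.bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img: (B, 3, H, W) -> warped image of the same size
        theta = self.theta(self.features(img)).view(-1, 2, 3)
        grid = F.affine_grid(theta, img.size(), align_corners=False)
        return F.grid_sample(img, grid, align_corners=False)


warped = ToyAAT()(torch.randn(2, 3, 128, 128))
```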
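Finally, for the BEL entry, the description maps naturally onto a pairwise contrastive objective over bag embeddings. The margin-based form below is one plausible reading of that summary, not the paper's exact formulation; in training it would be added to the usual bag-classification loss.

```python
import torch
import torch.nn.functional as F


def bag_embedding_loss(embeds: torch.Tensor, labels: torch.Tensor,
                       margin: float = 1.0) -> torch.Tensor:
    # embeds: (B, d), one embedding per bag (slide); labels: (B,)
    dists = torch.cdist(embeds, embeds)                      # pairwise L2
    same = (labels[:, None] == labels[None, :]).float()
    eye = torch.eye(len(labels))
    # Pull same-class bags together (excluding self-pairs)...
    pos = (dists * (same - eye)).sum() / ((same - eye).sum() + 1e-8)
    # ...and push different-class bags at least `margin` apart.
    diff = 1.0 - same
    neg = (F.relu(margin - dists) * diff).sum() / (diff.sum() + 1e-8)
    return pos + neg


loss = bag_embedding_loss(torch.randn(8, 256), torch.randint(0, 2, (8,)))
```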