Language models are good pathologists: using attention-based sequence
reduction and text-pretrained transformers for efficient WSI classification
- URL: http://arxiv.org/abs/2211.07384v2
- Date: Sat, 30 Sep 2023 21:26:44 GMT
- Title: Language models are good pathologists: using attention-based sequence
reduction and text-pretrained transformers for efficient WSI classification
- Authors: Juan I. Pisula and Katarzyna Bozek
- Abstract summary: Whole Slide Image (WSI) analysis is usually formulated as a Multiple Instance Learning (MIL) problem.
We introduce SeqShort, a sequence shortening layer that summarizes each WSI in a short, fixed-size sequence of instances.
We show that WSI classification performance can be improved when the downstream transformer architecture has been pre-trained on a large corpus of text data.
- Score: 0.21756081703275998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In digital pathology, Whole Slide Image (WSI) analysis is usually formulated
as a Multiple Instance Learning (MIL) problem. Although transformer-based
architectures have been used for WSI classification, these methods require
modifications to adapt them to specific challenges of this type of image data.
Among these challenges is the amount of memory and compute required by deep
transformer models to process long inputs, such as the thousands of image
patches that can compose a WSI at $\times 10$ or $\times 20$ magnification. We
introduce \textit{SeqShort}, a multi-head attention-based sequence shortening
layer that summarizes each WSI in a short, fixed-size sequence of instances,
which allows us to reduce the computational cost of self-attention on long
sequences and to include positional information that is unavailable in other
MIL approaches. Furthermore, we show that WSI classification performance can be
improved when the downstream transformer architecture has been pre-trained on a
large corpus of text data, while fine-tuning less than 0.1\% of its parameters.
We demonstrate the effectiveness of our method on lymph node metastases
classification and cancer subtype classification tasks, without the need to
design a WSI-specific transformer or perform in-domain pre-training, while
keeping a reduced compute budget and a low number of trainable parameters.
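To make the sequence-shortening idea concrete, here is a minimal PyTorch sketch of an attention-based shortening layer in the spirit of SeqShort: a small set of learned query tokens cross-attends to the thousands of patch embeddings and returns a short, fixed-size sequence. The class name, dimensions, and number of queries are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a multi-head attention layer that
# compresses a variable-length bag of patch embeddings into a fixed, short
# sequence by cross-attending from a set of learned query tokens.
import torch
import torch.nn as nn

class SequenceShortening(nn.Module):
    def __init__(self, dim: int = 768, num_queries: int = 128, num_heads: int = 8):
        super().__init__()
        # Learned queries define the fixed output length (num_queries tokens).
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, n_patches, dim), n_patches possibly in the thousands.
        b = patch_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)        # (b, num_queries, dim)
        shortened, _ = self.attn(q, patch_feats, patch_feats)  # (b, num_queries, dim)
        return shortened  # fixed-length sequence fed to a (frozen) text-pretrained transformer

# Example: 4,000 patch embeddings reduced to 128 tokens.
feats = torch.randn(1, 4000, 768)
print(SequenceShortening()(feats).shape)  # torch.Size([1, 128, 768])
```

Because the output length is fixed, the cost of downstream self-attention no longer grows with the number of patches in the slide.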
Related papers
- RetMIL: Retentive Multiple Instance Learning for Histopathological Whole Slide Image Classification [10.365234803533982]
We propose a retentive MIL method called RetMIL, which processes WSI sequences through hierarchical feature propagation structure.
At the local level, the WSI sequence is divided into multiple subsequences. Tokens of each subsequence are updated through a parallel linear retention mechanism.
At the global level, subsequences are fused into a global sequence, then updated through a serial retention mechanism, and finally the slide-level representation is obtained through global attention pooling.
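As a rough illustration of the hierarchical flow described above, the sketch below splits a WSI token sequence into subsequences, mixes tokens with a simplified decay-weighted stand-in for retention, and pools subsequence summaries with global attention pooling. The stand-in retention, dimensions, and names are assumptions and do not reproduce RetMIL's parallel/serial retention exactly.

```python
# Rough sketch (assumptions throughout): hierarchical processing of a WSI sequence,
# local subsequences -> simplified retention-style mixing -> global attention pooling.
import torch
import torch.nn as nn

class ToyRetention(nn.Module):
    """Simplified stand-in for retention: decay-weighted causal mixing of value projections."""
    def __init__(self, dim: int, decay: float = 0.9):
        super().__init__()
        self.v = nn.Linear(dim, dim)
        self.decay = decay

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (n, dim)
        n = x.size(0)
        idx = torch.arange(n)
        # D[i, j] = decay^(i-j) for j <= i, else 0 (causal exponential decay).
        D = (self.decay ** (idx[:, None] - idx[None, :]).clamp(min=0)).tril()
        return D @ self.v(x)

class ToyRetMIL(nn.Module):
    def __init__(self, dim: int = 512, subseq_len: int = 256, n_classes: int = 2):
        super().__init__()
        self.local = ToyRetention(dim)
        self.global_ = ToyRetention(dim)
        self.pool_score = nn.Linear(dim, 1)   # global attention pooling
        self.head = nn.Linear(dim, n_classes)
        self.subseq_len = subseq_len

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # tokens: (n_patches, dim)
        # Local level: update tokens within each subsequence, summarize each by its mean.
        chunks = tokens.split(self.subseq_len)
        summaries = torch.stack([self.local(c).mean(dim=0) for c in chunks])  # (n_sub, dim)
        # Global level: mix subsequence summaries, then attention-pool to a slide vector.
        g = self.global_(summaries)
        w = torch.softmax(self.pool_score(g), dim=0)          # (n_sub, 1)
        slide = (w * g).sum(dim=0)
        return self.head(slide)

print(ToyRetMIL()(torch.randn(1000, 512)).shape)  # torch.Size([2])
```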
arXiv Detail & Related papers (2024-03-16T08:50:47Z)
- What a Whole Slide Image Can Tell? Subtype-guided Masked Transformer for Pathological Image Captioning [6.496515352848627]
We propose a new Transformer-based paradigm, the Subtype-guided Masked Transformer (SGMT), for pathological image captioning.
An accompanying subtype prediction is introduced into SGMT to guide the training process and enhance the captioning accuracy.
Experiments on the PatchGastricADC22 dataset demonstrate that our approach effectively adapts to the task with a transformer-based model.
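The subtype guidance can be read as an auxiliary classification objective added to the captioning loss. Below is a minimal sketch of such a multi-task loss; the function name, tensor shapes, and the weighting factor alpha are hypothetical, not taken from SGMT.

```python
# Minimal sketch (assumed formulation): captioning loss plus an auxiliary
# subtype-classification loss that guides training, as the summary describes.
import torch
import torch.nn.functional as F

def sgmt_style_loss(caption_logits, caption_targets, subtype_logits, subtype_target, alpha=0.5):
    # caption_logits: (seq_len, vocab_size), caption_targets: (seq_len,)
    # subtype_logits: (n_subtypes,),         subtype_target: scalar class index
    caption_loss = F.cross_entropy(caption_logits, caption_targets)
    subtype_loss = F.cross_entropy(subtype_logits.unsqueeze(0), subtype_target.unsqueeze(0))
    return caption_loss + alpha * subtype_loss  # alpha is a hypothetical weighting term

loss = sgmt_style_loss(torch.randn(20, 1000), torch.randint(0, 1000, (20,)),
                       torch.randn(4), torch.tensor(2))
print(loss.item())
```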
arXiv Detail & Related papers (2023-10-31T16:43:03Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
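The core operation of learning affine transformations and warping images can be sketched with PyTorch's affine_grid and grid_sample. The localization network below and its initialization to the identity transform are illustrative assumptions, not the AAT module's actual architecture.

```python
# Sketch (assumptions): a small network predicts a 2x3 affine matrix per image and
# warps the input accordingly, the basic operation behind learned affine warping.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedAffineWarp(nn.Module):
    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 6),
        )
        # Bias initialized to the identity transform so training starts from "no warp".
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (b, 3, H, W)
        theta = self.loc(x).view(-1, 2, 3)                     # per-image affine parameters
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)     # warped images

print(LearnedAffineWarp()(torch.randn(2, 3, 256, 256)).shape)  # torch.Size([2, 3, 256, 256])
```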
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification [10.243293283318415]
Multiple Instance Learning (MIL) has shown promising results in digital Pathology Whole Slide Image (WSI) classification.
We propose an efficient WSI fine-tuning framework motivated by the Information Bottleneck theory.
Our framework is evaluated on five pathology WSI datasets with various WSI heads.
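The Information Bottleneck idea behind such fine-tuning can be written as a task loss plus a KL term that compresses the slide representation. The following is a generic variational IB sketch with a Gaussian posterior; layer sizes and the trade-off weight beta are assumptions, not the paper's exact objective.

```python
# Generic VIB sketch (assumptions, not the paper's code): a bottleneck head maps the
# slide feature to a Gaussian posterior; the loss trades task accuracy against a KL
# "compression" term, with beta as a hypothetical trade-off weight.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBHead(nn.Module):
    def __init__(self, in_dim=768, z_dim=128, n_classes=2):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)
        self.clf = nn.Linear(z_dim, n_classes)

    def forward(self, feat, target, beta=1e-3):
        mu, logvar = self.mu(feat), self.logvar(feat)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        task_loss = F.cross_entropy(self.clf(z), target)
        # KL divergence of N(mu, sigma^2) from the standard normal prior.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1).mean()
        return task_loss + beta * kl

head = VIBHead()
print(head(torch.randn(4, 768), torch.randint(0, 2, (4,))).item())
```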
arXiv Detail & Related papers (2023-03-15T08:41:57Z)
- Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representations of gigapixel whole slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
- Kernel Attention Transformer (KAT) for Histopathology Whole Slide Image Classification [15.49319477737895]
We propose a kernel attention Transformer (KAT) for histopathology WSI classification.
The proposed KAT can better describe the hierarchical context information of the local regions of the WSI.
The experimental results demonstrate that the proposed KAT is effective and efficient in the task of histopathology WSI classification.
arXiv Detail & Related papers (2022-06-27T10:00:12Z)
- Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction [138.04956118993934]
We propose a novel Transformer-based method, the coarse-to-fine sparse Transformer (CST), which embeds HSI sparsity into deep learning for HSI reconstruction.
In particular, CST uses our proposed spectra-aware screening mechanism (SASM) for coarse patch selection. The selected patches are then fed into our customized spectra-aggregation hashing multi-head self-attention (SAH-MSA) for fine-grained pixel clustering and self-similarity capturing.
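The coarse stage, keeping only the most informative patches before applying attention, can be approximated by a top-k screening step as sketched below. The scoring network, k, and dimensions are assumptions rather than CST's actual SASM and SAH-MSA modules.

```python
# Sketch (assumed details): score patch tokens, keep the top-k "informative" ones,
# and run self-attention only on that sparse subset, the coarse-to-fine idea.
import torch
import torch.nn as nn

class CoarseScreenAttention(nn.Module):
    def __init__(self, dim=64, keep=32, heads=4):
        super().__init__()
        self.score = nn.Linear(dim, 1)                         # stand-in screening network
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.keep = keep

    def forward(self, tokens):                                 # tokens: (b, n, dim)
        s = self.score(tokens).squeeze(-1)                     # (b, n) patch scores
        idx = s.topk(self.keep, dim=1).indices                 # indices of selected patches
        sel = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        out, _ = self.attn(sel, sel, sel)                      # attention on the sparse subset only
        return out, idx

out, idx = CoarseScreenAttention()(torch.randn(2, 256, 64))
print(out.shape)  # torch.Size([2, 32, 64])
```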
arXiv Detail & Related papers (2022-03-09T16:17:47Z)
- Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation [51.916106055115755]
We propose a new CNN architecture that is pose and scale invariant thanks to the use of a Spatial Transformer Network (STN).
Our architecture is composed of three sequential modules that are estimated together during training.
We test the proposed method in kidney and renal tumor segmentation on abdominal pediatric CT scanners.
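A standard STN-based pipeline of this kind predicts an affine transform, normalizes the input, segments it, and maps the prediction back with the inverse transform. The sketch below illustrates that flow with toy modules; the localization and segmentation networks are placeholders, not the paper's three-module architecture.

```python
# Sketch (assumptions): normalize pose/scale with an STN, segment, then map the
# prediction back with the inverse affine transform; module sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class STNSegmenter(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.loc = nn.Sequential(nn.Conv2d(1, 8, 7, stride=4), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 6))
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))
        self.seg = nn.Conv2d(1, n_classes, kernel_size=3, padding=1)  # stand-in for a real segmentation CNN

    def forward(self, x):                       # x: (b, 1, H, W) CT slice
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        x_norm = F.grid_sample(x, grid, align_corners=False)   # size/pose-normalized input
        logits = self.seg(x_norm)
        # Map the segmentation back to the original frame with the inverse affine.
        row = torch.tensor([0., 0., 1.]).expand(theta.size(0), 1, 3)
        inv = torch.linalg.inv(torch.cat([theta, row], dim=1))[:, :2, :]
        back_grid = F.affine_grid(inv, logits.size(), align_corners=False)
        return F.grid_sample(logits, back_grid, align_corners=False)

print(STNSegmenter()(torch.randn(2, 1, 128, 128)).shape)  # torch.Size([2, 2, 128, 128])
```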
arXiv Detail & Related papers (2021-07-06T14:50:03Z)
- An Efficient Cervical Whole Slide Image Analysis Framework Based on Multi-scale Semantic and Spatial Features using Deep Learning [2.7218168309244652]
This study designs a novel inline connection network (InCNet) that enriches multi-scale connectivity to build a lightweight model named You Only Look Cytopathology Once (YOLCO).
The proposed model allows the input size to be enlarged to the megapixel level, so that the WSI can be stitched without any overlap by averaging repeats.
With a Transformer classifying the integrated multi-scale, multi-task features, the experimental results show a $0.872$ AUC score, better and $2.51\times$ faster than the best conventional method in WSI classification.
arXiv Detail & Related papers (2021-06-29T06:24:55Z)
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds upon the fact that convolutions, fully-connected layers, and self-attention have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
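A tiny numerical check of the kind of equivalence the summary alludes to: a 1x1 convolution over a patch sequence computes the same per-token linear map as a fully-connected layer. The names and sizes below are arbitrary.

```python
# Demonstration: a 1x1 convolution along the token axis equals a per-token
# fully-connected layer when they share weights.
import torch
import torch.nn as nn

dim, n_patches = 16, 8
fc = nn.Linear(dim, dim, bias=False)
conv = nn.Conv1d(dim, dim, kernel_size=1, bias=False)
conv.weight.data = fc.weight.data.unsqueeze(-1)      # share the same weights

x = torch.randn(1, n_patches, dim)                   # (batch, tokens, dim)
out_fc = fc(x)                                       # per-token fully-connected layer
out_conv = conv(x.transpose(1, 2)).transpose(1, 2)   # 1x1 conv over the token axis
print(torch.allclose(out_fc, out_conv, atol=1e-6))   # True
```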
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
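One standard way to make Transformer LM parameters Bayesian is to place a mean-field Gaussian variational posterior over weights and sample them at each forward pass, adding a KL term to the training loss. The sketch below shows such a Bayesian linear layer; it is a generic Bayes-by-Backprop-style example, not the paper's specific framework.

```python
# Minimal mean-field sketch (not the paper's exact framework): a linear layer with a
# Gaussian variational posterior over its weights, sampled at every forward pass;
# the KL term against a standard-normal prior would be added to the training loss.
import torch
import torch.nn as nn

class BayesianLinear(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(out_dim, in_dim))
        self.rho = nn.Parameter(torch.full((out_dim, in_dim), -5.0))  # softplus(rho) = stddev

    def forward(self, x):
        std = torch.nn.functional.softplus(self.rho)
        w = self.mu + std * torch.randn_like(std)          # sample weights (reparameterization)
        return x @ w.t()

    def kl(self):
        # KL divergence of N(mu, std^2) from the standard-normal prior, summed over weights.
        std = torch.nn.functional.softplus(self.rho)
        return 0.5 * (self.mu.pow(2) + std.pow(2) - 2 * std.log() - 1).sum()

layer = BayesianLinear(32, 8)
print(layer(torch.randn(4, 32)).shape, layer.kl().item())
```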
arXiv Detail & Related papers (2021-02-09T10:55:27Z)