Long-MIL: Scaling Long Contextual Multiple Instance Learning for
Histopathology Whole Slide Image Analysis
- URL: http://arxiv.org/abs/2311.12885v1
- Date: Tue, 21 Nov 2023 03:08:47 GMT
- Title: Long-MIL: Scaling Long Contextual Multiple Instance Learning for
Histopathology Whole Slide Image Analysis
- Authors: Honglin Li, Yunlong Zhang, Chenglu Zhu, Jiatong Cai, Sunyi Zheng, Lin
Yang
- Abstract summary: Whole Slide Image (WSI) of histopathology tissue is used for analysis.
Previous methods generally divide the WSI into a large number of patches, then aggregate all patches within a WSI to make the slide-level prediction.
We propose to amend position embedding for shape-varying long-contextual WSIs by introducing Linear Bias into Attention.
- Score: 9.912061800841267
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Histopathology image analysis is the gold standard of clinical diagnosis
for cancers. In doctors' daily routine and in computer-aided diagnosis, the Whole
Slide Image (WSI) of histopathology tissue is used for analysis. Because of the
extremely high resolution, previous methods generally divide the WSI
into a large number of patches, then aggregate all patches within a WSI by
Multi-Instance Learning (MIL) to make the slide-level prediction when
developing computer-aided diagnosis tools. However, most previous WSI-MIL
models, which use either global attention without pairwise interaction or positional
information, or self-attention with absolute position embedding, cannot
handle shape-varying large WSIs well. For example, test WSIs seen after deployment may
be larger than training WSIs, since the model development set is always limited
by the difficulty of collecting histopathology WSIs. To deal with this
problem, in this paper, we propose to amend position embedding for
shape-varying long-contextual WSIs by introducing Linear Bias into Attention, and
adapt it from 1-d long sequences to 2-d long-contextual WSIs, which helps the model
extrapolate position embedding to unseen or under-fitted positions. We further
utilize the Flash-Attention module to tackle the computational complexity of the
Transformer, which also keeps full self-attention performance compared to
previous attention approximation work. Our method, Long-contextual MIL
(Long-MIL), is evaluated in extensive experiments on 4 datasets covering
WSI classification and survival prediction tasks to validate its superiority on
shape-varying WSIs. The source code will be made openly available soon.
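As a rough illustration of the idea in the abstract (a linear bias added to attention scores, extended from 1-d sequence offsets to 2-d patch positions), the sketch below penalizes each query-key pair by a slope times the Euclidean distance between their patch grid coordinates. This is not the authors' code: the Euclidean distance, the geometric per-head slopes borrowed from the 1-d ALiBi recipe, and all function names are assumptions made for the example.

```python
import math

def alibi_slopes(num_heads):
    # Assumed: geometric per-head slopes 2^(-8/n), 2^(-16/n), ... as in 1-d ALiBi.
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]

def linear_bias_2d(coords, slope):
    # Assumed 2-d extension: bias[i][j] = -slope * Euclidean distance between
    # the (row, col) grid positions of patches i and j.
    return [[-slope * math.dist(p, q) for q in coords] for p in coords]

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def biased_attention(queries, keys, values, coords, slope):
    # Single-head softmax(Q K^T / sqrt(d) + bias) V on plain Python lists.
    d = len(queries[0])
    bias = linear_bias_2d(coords, slope)
    out = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + bias[i][j]
            for j, k in enumerate(keys)
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out

# Three patches on a slide grid; the third is far from the first two.
coords = [(0, 0), (0, 1), (0, 10)]
feats = [[0.0], [0.0], [0.0]]        # identical features, so only the bias matters
values = [[1.0], [0.0], [0.0]]       # marks how much weight goes to patch 0
attended = biased_attention(feats, feats, values, coords, slope=0.5)
```

Because the bias depends only on relative distance and has no learned per-position parameters, the same function applies unchanged to a slide with a larger grid than any seen in training, which is the extrapolation property the abstract refers to.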
Related papers
- Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis [9.090504201460817]
Histopathology Whole Slide Image (WSI) analysis serves as the gold standard for clinical cancer diagnosis in the daily routines of doctors.
Previous methods typically employ Multi-Instance Learning to enable slide-level prediction given only slide-level labels.
To alleviate the computational complexity of long sequences in large WSIs, methods like HIPT use region-slicing, and TransMIL employs approximation of full self-attention.
arXiv Detail & Related papers (2024-10-18T06:12:36Z)
- WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering [6.315841446240698]
We propose a novel framework to interpret whole slide images (WSIs) by generative visual question answering.
WSI-VQA shows universality by reframing various kinds of slide-level tasks in a question-answering pattern.
We establish a WSI-VQA dataset which contains 8672 slide-level question-answering pairs with 977 WSIs.
arXiv Detail & Related papers (2024-07-08T04:37:32Z)
- PathAlign: A vision-language model for whole slide images in histopathology [13.567674461880905]
We develop a vision-language model based on the BLIP-2 framework using WSIs and curated text from pathology reports.
This enables applications utilizing a shared image-text embedding space, such as text or image retrieval for finding cases of interest.
We present pathologist evaluation of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization.
arXiv Detail & Related papers (2024-06-27T23:43:36Z)
- TSI-Bench: Benchmarking Time Series Imputation [52.27004336123575]
TSI-Bench is a comprehensive benchmark suite for time series imputation utilizing deep learning techniques.
The TSI-Bench pipeline standardizes experimental settings to enable fair evaluation of imputation algorithms.
TSI-Bench innovatively provides a systematic paradigm to tailor time series forecasting algorithms for imputation purposes.
arXiv Detail & Related papers (2024-06-18T16:07:33Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis.
We represent each WSI as an undirected graph.
To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
arXiv Detail & Related papers (2024-03-08T09:02:13Z)
- A self-supervised framework for learning whole slide representations [52.774822784847565]
We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of whole slide images.
We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets.
arXiv Detail & Related papers (2024-02-09T05:05:28Z)
- Active Learning Enhances Classification of Histopathology Whole Slide Images with Attention-based Multiple Instance Learning [48.02011627390706]
We train an attention-based MIL and calculate a confidence metric for every image in the dataset to select the most uncertain WSIs for expert annotation.
With a novel attention guiding loss, this leads to an accuracy boost of the trained models with few regions annotated for each class.
It may in the future serve as an important contribution to train MIL models in the clinically relevant context of cancer classification in histopathology.
arXiv Detail & Related papers (2023-03-02T15:18:58Z)
- Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representation of giga-pixel level whole slide pathology images (WSI) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.