MamMIL: Multiple Instance Learning for Whole Slide Images with State
Space Models
- URL: http://arxiv.org/abs/2403.05160v1
- Date: Fri, 8 Mar 2024 09:02:13 GMT
- Title: MamMIL: Multiple Instance Learning for Whole Slide Images with State
Space Models
- Authors: Zijie Fang, Yifeng Wang, Zhi Wang, Jian Zhang, Xiangyang Ji, Yongbing
Zhang
- Abstract summary: Pathological diagnosis, the gold standard for cancer diagnosis, has achieved superior performance by combining the Transformer with the multiple instance learning (MIL) framework on whole slide images (WSIs).
We propose the MamMIL framework for WSI classification, integrating the selective structured state space model (i.e., Mamba) with MIL for the first time.
Specifically, to address Mamba's limitation to unidirectional one-dimensional (1D) sequence modeling, we introduce a bidirectional state space model and a 2D context-aware block.
- Score: 58.39336492765728
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently, pathological diagnosis, the gold standard for cancer diagnosis, has
achieved superior performance by combining the Transformer with the multiple
instance learning (MIL) framework using whole slide images (WSIs). However, the
giga-pixel nature of WSIs makes the quadratic-complexity self-attention
mechanism of the Transformer difficult to apply in MIL. Existing studies
usually adopt linear attention to improve computing efficiency but inevitably
introduce performance bottlenecks. To tackle this challenge, we propose the
MamMIL framework for WSI classification, which for the first time integrates
the selective structured state space model (i.e., Mamba) with MIL, enabling the
modeling of instance dependencies while maintaining linear complexity.
Specifically, to address Mamba's limitation to unidirectional one-dimensional
(1D) sequence modeling, we introduce a bidirectional state space model and a 2D
context-aware block that enable MamMIL to learn bidirectional instance
dependencies with 2D spatial relationships. Experiments on two
datasets show that MamMIL can achieve advanced classification performance with
smaller memory footprints than the state-of-the-art MIL frameworks based on the
Transformer. The code will be open-sourced if accepted.
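The bidirectional scanning described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scalar decay `a`, input gain `b`, and the mean pooling are placeholder choices standing in for Mamba's learned, input-dependent parameters and the paper's aggregation scheme.

```python
import numpy as np

def ssm_scan(x, a=0.9, b=1.0):
    """Minimal linear state-space recurrence h_t = a*h_{t-1} + b*x_t,
    applied over a sequence of instance features, one state per step."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = h
    return out

def bidirectional_bag_embedding(instances):
    """Scan the instance sequence forward and backward, concatenate the
    two state sequences, and average them into one bag-level embedding."""
    fwd = ssm_scan(instances)
    bwd = ssm_scan(instances[::-1])[::-1]
    return np.concatenate([fwd, bwd], axis=1).mean(axis=0)

rng = np.random.default_rng(0)
bag = rng.standard_normal((6, 4))        # 6 instances, 4-dim features
emb = bidirectional_bag_embedding(bag)   # 8-dim bag embedding
print(emb.shape)                          # → (8,)
```

Each scan costs O(N) in the number of instances, which is the linear-complexity property the abstract contrasts with quadratic self-attention.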
Related papers
- GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model [66.35608254724566]
State-space models (SSMs) have showcased effective performance in modeling long-range dependencies with subquadratic complexity.
However, pure SSM-based models still face challenges related to stability and achieving optimal performance on computer vision tasks.
Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes.
arXiv Detail & Related papers (2024-07-18T17:59:58Z) - Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification [4.389334324926174]
This study introduces the innovative Mamba-in-Mamba (MiM) architecture for HSI classification, the first attempt of deploying State Space Model (SSM) in this task.
The MiM model includes 1) a novel centralized Mamba-Cross-Scan (MCS) mechanism for transforming images into sequence data, 2) a Tokenized Mamba (T-Mamba) encoder, and 3) a Weighted MCS Fusion (WMF) module.
Experimental results from three public HSI datasets demonstrate that our method outperforms existing baselines and state-of-the-art approaches.
arXiv Detail & Related papers (2024-05-20T13:19:02Z) - Multi-head Attention-based Deep Multiple Instance Learning [1.0389304366020162]
MAD-MIL is a Multi-head Attention-based Deep Multiple Instance Learning model.
It is designed for weakly supervised Whole Slide Image (WSI) classification in digital pathology.
It is evaluated on MNIST-BAGS and on public datasets including TUPAC16, TCGA BRCA, TCGA LUNG, and TCGA KIDNEY.
arXiv Detail & Related papers (2024-04-08T09:54:28Z) - MambaMIL: Enhancing Long Sequence Modeling with Sequence Reordering in
Computational Pathology [10.933433327636918]
Multiple Instance Learning (MIL) has emerged as a dominant paradigm to extract discriminative feature representations within Whole Slide Images (WSIs) in computational pathology.
In this paper, we incorporate the Selective Scan State Space Sequential Model (Mamba) into Multiple Instance Learning (MIL) for long sequence modeling with linear complexity.
Our proposed framework performs favorably against state-of-the-art MIL methods.
arXiv Detail & Related papers (2024-03-11T15:17:25Z) - The Hidden Attention of Mamba Models [54.50526986788175]
The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains.
We show that such models can be viewed as attention-driven models.
This new perspective enables us to empirically and theoretically compare the underlying mechanisms to that of the self-attention layers in transformers.
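For a linear SSM with fixed coefficients, the attention view claimed above is easy to verify: unrolling the recurrence h_t = a*h_{t-1} + b*x_t gives y_t = Σ_{s≤t} a^(t-s)·b·x_s, i.e., a lower-triangular matrix of decayed weights applied to the inputs. This sketch uses constant scalars for clarity; the selective (Mamba) case makes a and b input-dependent.

```python
import numpy as np

T = 5
a, b = 0.8, 1.0
rng = np.random.default_rng(1)
x = rng.standard_normal(T)

# Recurrent form: h_t = a*h_{t-1} + b*x_t
h, rec = 0.0, []
for t in range(T):
    h = a * h + b * x[t]
    rec.append(h)
rec = np.array(rec)

# Equivalent "attention" form: lower-triangular matrix of decayed weights
A = np.array([[a ** (t - s) * b if s <= t else 0.0 for s in range(T)]
              for t in range(T)])
att = A @ x

print(np.allclose(rec, att))  # → True: the two views coincide
```

The triangular matrix A plays the role of a (causal, non-normalized) attention map, which is the comparison to transformer self-attention the paper draws.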
arXiv Detail & Related papers (2024-03-03T18:58:21Z) - MatFormer: Nested Transformer for Elastic Inference [94.1789252941718]
MatFormer is a nested Transformer architecture designed to offer elasticity in a variety of deployment constraints.
We show that a 2.6B decoder-only MatFormer language model (MatLM) allows us to extract smaller models spanning from 1.5B to 2.6B.
We also observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.
arXiv Detail & Related papers (2023-10-11T17:57:14Z) - STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action
Recognition [66.96931254510544]
We study the problem of human action recognition using motion capture (MoCap) sequences.
We propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences.
The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models.
arXiv Detail & Related papers (2023-03-31T16:19:27Z) - Hierarchical Transformer for Survival Prediction Using Multimodality
Whole Slide Images and Genomics [63.76637479503006]
Learning good representations of giga-pixel whole slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z) - DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning
for Histopathology Whole Slide Image Classification [18.11776334311096]
Multiple instance learning (MIL) has been increasingly used in the classification of histopathology whole slide images (WSIs).
We propose to virtually enlarge the number of bags by introducing the concept of pseudo-bags.
We also contribute to deriving the instance probability under the framework of attention-based MIL, and utilize the derivation to help construct and analyze the proposed framework.
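The pseudo-bag idea can be sketched as a random partition of one slide's instances, with each pseudo-bag inheriting the parent label. The function name and the fixed split count below are illustrative, not the paper's implementation.

```python
import numpy as np

def make_pseudo_bags(instances, n_pseudo, rng):
    """Randomly partition one bag's instances into n_pseudo pseudo-bags,
    each of which inherits the parent bag's slide-level label."""
    idx = rng.permutation(len(instances))
    return [instances[part] for part in np.array_split(idx, n_pseudo)]

rng = np.random.default_rng(0)
bag = rng.standard_normal((10, 4))       # 10 instances from one slide
pseudo = make_pseudo_bags(bag, 3, rng)
print([p.shape[0] for p in pseudo])      # → [4, 3, 3]
```

Splitting each bag this way multiplies the number of training bags by n_pseudo, which is the "virtual enlargement" the summary describes.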
arXiv Detail & Related papers (2022-03-22T22:33:42Z) - Dual-stream Multiple Instance Learning Network for Whole Slide Image
Classification with Self-supervised Contrastive Learning [16.84711797934138]
We address the challenging problem of whole slide image (WSI) classification.
WSI classification can be cast as a multiple instance learning (MIL) problem when only slide-level labels are available.
We propose a MIL-based method for WSI classification and tumor detection that does not require localized annotations.
arXiv Detail & Related papers (2020-11-17T20:51:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.