MamMIL: Multiple Instance Learning for Whole Slide Images with State
Space Models
- URL: http://arxiv.org/abs/2403.05160v1
- Date: Fri, 8 Mar 2024 09:02:13 GMT
- Title: MamMIL: Multiple Instance Learning for Whole Slide Images with State
Space Models
- Authors: Zijie Fang, Yifeng Wang, Zhi Wang, Jian Zhang, Xiangyang Ji, Yongbing
Zhang
- Abstract summary: Pathological diagnosis, the gold standard for cancer diagnosis, has achieved superior performance by combining the Transformer with the multiple instance learning (MIL) framework using whole slide images (WSIs).
We propose a MamMIL framework for WSI classification by combining the selective structured state space model (i.e., Mamba) with MIL for the first time.
Specifically, to solve the problem that Mamba can only conduct unidirectional one-dimensional (1D) sequence modeling, we innovatively introduce a bidirectional state space model and a 2D context-aware block.
- Score: 58.39336492765728
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently, pathological diagnosis, the gold standard for cancer diagnosis, has
achieved superior performance by combining the Transformer with the multiple
instance learning (MIL) framework using whole slide images (WSIs). However, the
giga-pixel nature of WSIs poses a great challenge for the quadratic-complexity
self-attention mechanism in Transformer to be applied in MIL. Existing studies
usually use linear attention to improve computing efficiency but inevitably
bring performance bottlenecks. To tackle this challenge, we propose a MamMIL
framework for WSI classification by combining the selective structured state
space model (i.e., Mamba) with MIL for the first time, enabling the modeling of
instance dependencies while maintaining linear complexity. Specifically, to
solve the problem that Mamba can only conduct unidirectional one-dimensional
(1D) sequence modeling, we innovatively introduce a bidirectional state space
model and a 2D context-aware block to enable MamMIL to learn the bidirectional
instance dependencies with 2D spatial relationships. Experiments on two
datasets show that MamMIL can achieve advanced classification performance with
smaller memory footprints than the state-of-the-art MIL frameworks based on the
Transformer. The code will be open-sourced if accepted.
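The bidirectional scanning idea above can be sketched in a few lines: run a linear-complexity recurrence forward and backward over the bag of patch embeddings and fuse the two passes, so each instance's output depends on context from both directions. This is a minimal toy sketch assuming scalar states and fixed transition coefficients; the function names and parameters are illustrative assumptions, not MamMIL's actual implementation.

```python
def ssm_scan(xs, a=0.9, b=0.1):
    """Unidirectional linear recurrence h_t = a*h_{t-1} + b*x_t (scalar state)."""
    h, out = 0.0, []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out

def bidirectional_scan(xs, a=0.9, b=0.1):
    """Fuse a forward and a backward scan so every instance sees both
    preceding and following instances, still in O(n) time overall."""
    fwd = ssm_scan(xs, a, b)
    bwd = list(reversed(ssm_scan(list(reversed(xs)), a, b)))
    return [f + g for f, g in zip(fwd, bwd)]

# Toy bag of four scalar "instance features"
features = [1.0, 0.0, 0.0, 0.0]
fused = bidirectional_scan(features)
```

The forward pass alone would give the last instance no influence on the first; summing the reversed backward pass is one simple way to close that gap while keeping linear complexity.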
Related papers
- Hypergraph Mamba for Efficient Whole Slide Image Understanding [10.285000840656808]
Whole Slide Images (WSIs) in histopathology pose a significant challenge for medical image analysis due to their ultra-high resolution, massive scale, and intricate spatial relationships.
We introduce WSI-HGMamba, a novel framework that unifies the high-order relational modeling capabilities of Hypergraph Neural Networks (HGNNs) with the linear-time sequential modeling efficiency of State Space Models.
arXiv Detail & Related papers (2025-05-23T04:33:54Z)
- DAMamba: Vision State Space Model with Dynamic Adaptive Scan [51.81060691414399]
State space models (SSMs) have recently garnered significant attention in computer vision.
We propose Dynamic Adaptive Scan (DAS), a data-driven method that adaptively allocates scanning orders and regions.
Based on DAS, we propose the vision backbone DAMamba, which significantly outperforms current state-of-the-art vision Mamba models in vision tasks.
arXiv Detail & Related papers (2025-02-18T08:12:47Z)
- The Role of Graph-based MIL and Interventional Training in the Generalization of WSI Classifiers [8.867734798489037]
Whole Slide Imaging (WSI), which involves high-resolution digital scans of pathology slides, has become the gold standard for cancer diagnosis.
Its gigapixel resolution and the scarcity of annotated datasets present challenges for deep learning models.
We introduce a new framework, Graph-based Multiple Instance Learning with Interventional Training (GMIL-IT) for WSI classification.
arXiv Detail & Related papers (2025-01-31T11:21:08Z)
- SAM-MIL: A Spatial Contextual Aware Multiple Instance Learning Approach for Whole Slide Image Classification [9.69491390062406]
We propose a novel MIL framework, named SAM-MIL, that emphasizes spatial contextual awareness and explicitly incorporates spatial context.
Our approach includes the design of group feature extraction based on spatial context and a SAM-Guided Group Masking strategy.
Experimental results on the CAMELYON-16 and TCGA Lung Cancer datasets demonstrate that our proposed SAM-MIL model outperforms existing mainstream methods in WSIs classification.
arXiv Detail & Related papers (2024-07-25T01:12:48Z)
- GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model [66.35608254724566]
State-space models (SSMs) have demonstrated strong performance in modeling long-range dependencies with subquadratic complexity.
However, pure SSM-based models still face challenges related to stability and achieving optimal performance on computer vision tasks.
Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes.
arXiv Detail & Related papers (2024-07-18T17:59:58Z)
- Combining Graph Neural Network and Mamba to Capture Local and Global Tissue Spatial Relationships in Whole Slide Images [1.1813933389519358]
In computational pathology, extracting spatial features from gigapixel whole slide images (WSIs) is a fundamental task.
We introduce a model that combines a message-passing graph neural network (GNN) with a state space model (Mamba) to capture both local and global spatial relationships.
The model's effectiveness was demonstrated in predicting progression-free survival among patients with early-stage lung adenocarcinomas.
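As a rough illustration of this local-plus-global idea, one message-passing step can aggregate neighboring patch features before a linear recurrence carries context across the whole sequence. The aggregation rule, decay constant, and graph layout below are toy assumptions for illustration, not the paper's actual model.

```python
def message_pass(feats, neighbors):
    """One GNN-style step: each node averages its own feature with its
    neighbors' features (captures local tissue relationships)."""
    out = []
    for i, f in enumerate(feats):
        vals = [f] + [feats[j] for j in neighbors[i]]
        out.append(sum(vals) / len(vals))
    return out

def global_scan(feats, decay=0.5):
    """Mamba-style linear recurrence h_t = decay*h_{t-1} + x_t, carrying
    context across the whole patch sequence (captures global structure)."""
    h, out = 0.0, []
    for f in feats:
        h = decay * h + f
        out.append(h)
    return out

# Three patches on a chain graph: 0 - 1 - 2
local = message_pass([1.0, 3.0, 5.0], {0: [1], 1: [0, 2], 2: [1]})
fused = global_scan(local)
```

Stacking the two steps gives each patch a representation that mixes nearby tissue context (from the graph) with long-range context (from the scan), each in linear time.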
arXiv Detail & Related papers (2024-06-05T22:06:57Z)
- Rethinking Attention-Based Multiple Instance Learning for Whole-Slide Pathological Image Classification: An Instance Attribute Viewpoint [11.09441191807822]
Multiple instance learning (MIL) is a robust paradigm for whole-slide pathological image (WSI) analysis.
This paper proposes an Attribute-Driven MIL (AttriMIL) framework to address these issues.
arXiv Detail & Related papers (2024-03-30T13:04:46Z)
- MambaMIL: Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology [10.933433327636918]
Multiple Instance Learning (MIL) has emerged as a dominant paradigm to extract discriminative feature representations within Whole Slide Images (WSIs) in computational pathology.
In this paper, we incorporate the Selective Scan State Space Sequential Model (Mamba) in Multiple Instance Learning (MIL) for long sequence modeling with linear complexity.
Our proposed framework performs favorably against state-of-the-art MIL methods.
arXiv Detail & Related papers (2024-03-11T15:17:25Z)
- Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning [78.49090351193269]
We propose a novel graph-based framework to leverage the inter-relationships among different types of nuclei for WSI analysis.
Specifically, we formulate the WSI as a heterogeneous graph with "nucleus-type" attribute to each node and a semantic attribute similarity to each edge.
Our framework outperforms the state-of-the-art methods with considerable margins on various tasks.
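The heterogeneous-graph formulation can be sketched with plain data structures: each node carries a "nucleus-type" attribute, and each edge is assigned a semantic-similarity weight. The node types and the similarity rule below are hypothetical placeholders for illustration, not the paper's actual construction.

```python
def build_hetero_graph(nuclei, edges):
    """Build a toy heterogeneous graph for a WSI.

    nuclei: list of (node_id, nucleus_type) pairs
    edges:  list of (u, v) pairs connecting nearby nuclei
    Returns a list of (u, v, similarity) weighted edges.
    """
    node_type = dict(nuclei)
    graph = []
    for u, v in edges:
        # Toy semantic-similarity rule (an assumption): same-type pairs
        # get weight 1.0, cross-type pairs get weight 0.5.
        sim = 1.0 if node_type[u] == node_type[v] else 0.5
        graph.append((u, v, sim))
    return graph

# Hypothetical nuclei: two tumor cells and one lymphocyte on a small chain
g = build_hetero_graph(
    [(0, "tumor"), (1, "tumor"), (2, "lymphocyte")],
    [(0, 1), (1, 2)],
)
```

A graph neural network operating on such a structure can then weight messages by edge similarity, letting inter-relationships between nucleus types inform the slide-level prediction.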
arXiv Detail & Related papers (2023-07-09T14:43:40Z)
- Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks.
Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs.
We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner.
arXiv Detail & Related papers (2023-06-19T23:10:02Z)
- Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification [10.243293283318415]
Multiple Instance Learning (MIL) has shown promising results in digital Pathology Whole Slide Image (WSI) classification.
We propose an efficient WSI fine-tuning framework motivated by the Information Bottleneck theory.
Our framework is evaluated on five pathology WSI datasets on various WSI heads.
arXiv Detail & Related papers (2023-03-15T08:41:57Z)
- Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representation of giga-pixel level whole slide pathology images (WSI) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
- Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network is proposed for skeleton-based action recognition.
MS-SGN achieves the state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.