Hypergraph Mamba for Efficient Whole Slide Image Understanding
- URL: http://arxiv.org/abs/2505.17457v2
- Date: Mon, 04 Aug 2025 08:35:27 GMT
- Title: Hypergraph Mamba for Efficient Whole Slide Image Understanding
- Authors: Jiaxuan Lu, Yuhui Lin, Junyan Shi, Fang Yan, Dongzhan Zhou, Yue Gao, Xiaosong Wang
- Abstract summary: Whole Slide Images (WSIs) in histopathology pose a significant challenge for medical image analysis due to their ultra-high resolution, massive scale, and intricate spatial relationships. We introduce WSI-HGMamba, a novel framework that unifies the high-order relational modeling capabilities of Hypergraph Neural Networks (HGNNs) with the linear-time sequential modeling efficiency of State Space Models.
- Score: 10.285000840656808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whole Slide Images (WSIs) in histopathology pose a significant challenge for medical image analysis due to their ultra-high resolution, massive scale, and intricate spatial relationships. Although existing Multiple Instance Learning (MIL) approaches like Graph Neural Networks (GNNs) and Transformers demonstrate strong instance-level modeling capabilities, they encounter constraints regarding scalability and computational expense. To overcome these limitations, we introduce WSI-HGMamba, a novel framework that unifies the high-order relational modeling capabilities of Hypergraph Neural Networks (HGNNs) with the linear-time sequential modeling efficiency of State Space Models. At the core of our design is the HGMamba block, which integrates message passing, hypergraph scanning & flattening, and bidirectional state space modeling (Bi-SSM), enabling the model to retain both relational and contextual cues while remaining computationally efficient. Compared to Transformer and Graph Transformer counterparts, WSI-HGMamba achieves superior performance with up to a 7× reduction in FLOPs. Extensive experiments on multiple public and private WSI benchmarks demonstrate that our method provides a scalable, accurate, and efficient solution for slide-level understanding, making it a promising backbone for next-generation pathology AI systems.
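The pipeline the abstract describes (hypergraph message passing over patch features, then a bidirectional state-space scan over the flattened token sequence) can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration, not the paper's actual implementation: the incidence-matrix normalization, the EMA-style recurrence standing in for a learned SSM, and all function and variable names are hypothetical.

```python
import numpy as np

def hypergraph_message_pass(X, H, W):
    """One round of node -> hyperedge -> node aggregation.

    X: (N, d) patch features; H: (N, E) incidence matrix where
    H[i, j] = 1 if patch i belongs to hyperedge j; W: (d, d) weights.
    """
    d_e = H.sum(axis=0, keepdims=True)   # hyperedge degrees, shape (1, E)
    d_v = H.sum(axis=1, keepdims=True)   # node degrees, shape (N, 1)
    edge_feats = (H / d_e).T @ X         # mean of member nodes per hyperedge
    X_new = (H / d_v) @ edge_feats       # mean of incident hyperedges per node
    return X_new @ W

def bidirectional_scan(X, decay=0.9):
    """Toy linear state-space scan: an exponential moving average run
    forward and backward over the flattened token sequence, summed."""
    fwd, bwd = np.zeros_like(X), np.zeros_like(X)
    h = np.zeros(X.shape[1])
    for t in range(X.shape[0]):
        h = decay * h + (1 - decay) * X[t]
        fwd[t] = h
    h = np.zeros(X.shape[1])
    for t in reversed(range(X.shape[0])):
        h = decay * h + (1 - decay) * X[t]
        bwd[t] = h
    return fwd + bwd

# Tiny example: 4 patch tokens, 2 hyperedges, 3-dim features.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
H = np.array([[1, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
tokens = hypergraph_message_pass(X, H, np.eye(3))
out = bidirectional_scan(tokens)
print(out.shape)  # (4, 3)
```

Because both aggregation steps are row-normalized means, a constant feature map passes through message passing unchanged (with identity weights), which is a quick sanity check on the normalization.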
Related papers
- MS-UMamba: An Improved Vision Mamba Unet for Fetal Abdominal Medical Image Segmentation [1.2721397985664153]
We propose MS-UMamba, a novel hybrid convolutional-Mamba model for fetal ultrasound image segmentation. Specifically, we design a visual state space block integrated with a CNN branch, which leverages Mamba's global modeling strengths. We also propose an efficient multi-scale feature fusion module, which integrates feature information from different layers.
arXiv Detail & Related papers (2025-06-14T10:34:10Z) - DAMamba: Vision State Space Model with Dynamic Adaptive Scan [51.81060691414399]
State space models (SSMs) have recently garnered significant attention in computer vision. We propose Dynamic Adaptive Scan (DAS), a data-driven method that adaptively allocates scanning orders and regions. Based on DAS, we propose the vision backbone DAMamba, which significantly outperforms current state-of-the-art vision Mamba models in vision tasks.
arXiv Detail & Related papers (2025-02-18T08:12:47Z) - Hybrid State-Space and GRU-based Graph Tokenization Mamba for Hyperspectral Image Classification [14.250184447492208]
Hyperspectral image (HSI) classification plays a pivotal role in domains such as environmental monitoring, agriculture, and urban planning. Traditional methods, including machine learning and convolutional neural networks (CNNs), often struggle to effectively capture these intricate spectral-spatial features. This work proposes GraphMamba, a hybrid model that combines spectral-spatial token generation, graph-based token prioritization, and cross-attention mechanisms.
arXiv Detail & Related papers (2025-02-10T13:02:19Z) - A Comparative Study on Dynamic Graph Embedding based on Mamba and Transformers [0.29687381456164]
This study presents a comparative analysis of dynamic graph embedding approaches using transformers and the recently proposed Mamba architecture. We introduce three novel models: TransformerG2G augmented with graph convolutional networks, DG-Mamba, and GDG-Mamba with graph isomorphism network edge convolutions. Our experiments on multiple benchmark datasets demonstrate that Mamba-based models achieve comparable or superior performance to transformer-based approaches in link prediction tasks.
arXiv Detail & Related papers (2024-12-15T19:56:56Z) - SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding [66.74446220401296]
We propose SynerGen-VL, a simple yet powerful encoder-free MLLM capable of both image understanding and generation. We introduce the token folding mechanism and the vision-expert-based progressive alignment pretraining strategy, which effectively support high-resolution image understanding. Our code and models shall be released.
arXiv Detail & Related papers (2024-12-12T18:59:26Z) - HRVMamba: High-Resolution Visual State Space Model for Dense Prediction [60.80423207808076]
State Space Models (SSMs) with efficient hardware-aware designs have demonstrated significant potential in computer vision tasks.
These models have been constrained by three key challenges: insufficient inductive bias, long-range forgetting, and low-resolution output representation.
We introduce the Dynamic Visual State Space (DVSS) block, which employs deformable convolution to mitigate the long-range forgetting problem.
We also introduce High-Resolution Visual State Space Model (HRVMamba) based on the DVSS block, which preserves high-resolution representations throughout the entire process.
arXiv Detail & Related papers (2024-10-04T06:19:29Z) - LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba [54.85262314960038]
Local Attentional Mamba blocks capture both global contexts and local details with linear complexity.
Our model exhibits exceptional scalability and surpasses the performance of DiT across various model scales on ImageNet at 256x256 resolution.
Compared to state-of-the-art diffusion models on ImageNet 256x256 and 512x512, our largest model presents notable advantages, such as a reduction of up to 62% GFLOPs.
arXiv Detail & Related papers (2024-08-05T16:39:39Z) - Cross-Scan Mamba with Masked Training for Robust Spectral Imaging [51.557804095896174]
We propose the Cross-Scanning Mamba, named CS-Mamba, which employs a Spatial-Spectral SSM for global-local balanced context encoding. Experimental results show that our CS-Mamba achieves state-of-the-art performance and that the masked training method can better reconstruct smooth features to improve visual quality.
arXiv Detail & Related papers (2024-08-01T15:14:10Z) - Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification [4.389334324926174]
This study introduces the innovative Mamba-in-Mamba (MiM) architecture for HSI classification, the first attempt to deploy a State Space Model (SSM) for this task.
The MiM model includes 1) a novel centralized Mamba-Cross-Scan (MCS) mechanism for transforming images into sequence data, 2) a Tokenized Mamba (T-Mamba) encoder, and 3) a Weighted MCS Fusion (WMF) module.
Experimental results from three public HSI datasets demonstrate that our method outperforms existing baselines and state-of-the-art approaches.
arXiv Detail & Related papers (2024-05-20T13:19:02Z) - EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba [19.062950348441426]
This work proposes to explore the potential of visual state space models in lightweight model design and introduces a novel efficient model variant dubbed EfficientVMamba.
Our EfficientVMamba integrates an atrous-based selective scan approach via efficient skip sampling, constituting building blocks designed to harness both global and local representational features.
Experimental results show that EfficientVMamba scales down computational complexity while yielding competitive results across a variety of vision tasks.
arXiv Detail & Related papers (2024-03-15T02:48:47Z) - MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis.
We represent each WSI as an undirected graph.
To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
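The topology-aware scanning idea (serializing a 2D graph of WSI patches so Mamba's 1D recurrence sees topological neighbors near each other in the sequence) can be approximated by a breadth-first flattening. The paper's actual mechanism may differ; treat this as a hypothetical sketch, with `adj` and `bfs_flatten` being illustrative names.

```python
from collections import deque

def bfs_flatten(adj, start=0):
    """Flatten an undirected graph into a 1D node order via BFS,
    keeping topological neighbors close in the output sequence."""
    order, seen, q = [], {start}, deque([start])
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return order

# 5-node path graph: 0-1-2-3-4, given as an adjacency list.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(bfs_flatten(adj))  # [0, 1, 2, 3, 4]
```

The resulting order would then index the patch-feature matrix before it is fed to the 1D state-space scan.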
arXiv Detail & Related papers (2024-03-08T09:02:13Z) - GRASP: GRAph-Structured Pyramidal Whole Slide Image Representation [4.477527192030786]
We present GRASP, a graph-structured multi-magnification framework for processing whole slide images (WSIs) in digital pathology. Our approach is designed to emulate the pathologist's behavior in handling WSIs and benefits from the hierarchical structure of WSIs. GRASP, which introduces a convergence-based node aggregation mechanism, outperforms state-of-the-art methods by a wide margin in terms of balanced accuracy.
arXiv Detail & Related papers (2024-02-06T00:03:44Z) - Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representations of giga-pixel whole slide pathology images (WSIs) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.