WSD-MIL: Window Scale Decay Multiple Instance Learning for Whole Slide Image Classification
- URL: http://arxiv.org/abs/2512.19982v1
- Date: Tue, 23 Dec 2025 02:10:24 GMT
- Title: WSD-MIL: Window Scale Decay Multiple Instance Learning for Whole Slide Image Classification
- Authors: Le Feng, Li Xiao
- Abstract summary: Window scale decay MIL (WSD-MIL) is designed to enhance the capacity to model tumor regions of varying scales. WSD-MIL achieves state-of-the-art performance on the CAMELYON16 and TCGA-BRCA datasets while reducing computational memory by 62%.
- Score: 2.760935655675299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the integration of pre-trained foundational models with multiple instance learning (MIL) has improved diagnostic accuracy in computational pathology. However, existing MIL methods focus on optimizing feature extractors and aggregation strategies while overlooking the complex semantic relationships among instances within a whole slide image (WSI). Although Transformer-based MIL approaches aim to model instance dependencies, their quadratic computational complexity limits their scalability to large-scale WSIs. Moreover, because tumor region scales vary markedly across WSIs, existing Transformer-based methods that employ fixed-scale attention mechanisms struggle to capture local instance correlations precisely and fail to account for the distance-based decay of patch relevance. To address these challenges, we propose window scale decay MIL (WSD-MIL), designed to enhance the capacity to model tumor regions of varying scales while improving computational efficiency. WSD-MIL comprises: 1) a window scale decay based attention module, which employs a cluster-based sampling strategy to reduce computational costs while progressively decaying the attention window scale to capture local instance relationships at varying scales; and 2) a squeeze-and-excitation based region gate module, which dynamically adjusts window weights to enhance global information modeling. Experimental results demonstrate that WSD-MIL achieves state-of-the-art performance on the CAMELYON16 and TCGA-BRCA datasets while reducing computational memory by 62%. The code will be publicly available.
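The two modules named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation, which had not been released at the time of this listing. Everything concrete here is an assumption made for illustration: the halving window schedule `(64, 32, 16)`, the residual connection, the random gate weights, the mean-pooling readout, and the function names `window_attention`, `region_gate`, `wsd_forward`, and `attention_score_entries`. The cluster-based sampling step is omitted entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(feats, window):
    """Self-attention restricted to consecutive windows of `window` instances."""
    n, d = feats.shape
    pad = (-n) % window  # zero-pad so the bag splits evenly into windows
    padded = np.concatenate([feats, np.zeros((pad, d))], axis=0)
    valid = np.concatenate([np.ones(n, bool), np.zeros(pad, bool)])
    x = padded.reshape(-1, window, d)           # (num_windows, window, d)
    key_mask = valid.reshape(-1, 1, window)     # padded keys get masked out
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)
    scores = np.where(key_mask, scores, -1e9)
    out = softmax(scores, axis=-1) @ x
    return out.reshape(-1, d)[:n]               # drop padded rows

def region_gate(feats, window, w1, w2):
    """SE-style gate (assumed form): pool each window, score it, rescale it."""
    n, _ = feats.shape
    starts = np.arange(0, n, window)
    counts = np.diff(np.append(starts, n))
    pooled = np.add.reduceat(feats, starts, axis=0) / counts[:, None]  # squeeze
    hidden = np.maximum(pooled @ w1, 0.0)                              # excitation
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))                        # (num_windows, 1)
    return feats * np.repeat(gate, counts, axis=0)

def wsd_forward(feats, windows=(64, 32, 16), seed=0):
    """Stack windowed attention layers whose window size shrinks per layer."""
    rng = np.random.default_rng(seed)
    d = feats.shape[1]
    x = feats
    for window in windows:
        # Random weights stand in for learned gate parameters.
        w1 = rng.normal(scale=d ** -0.5, size=(d, d // 4))
        w2 = rng.normal(scale=(d // 4) ** -0.5, size=(d // 4, 1))
        x = x + region_gate(window_attention(x, window), window, w1, w2)
    return x.mean(axis=0)  # slide-level embedding (mean pooling is an assumption)

def attention_score_entries(n, window):
    """Attention-score entries for full attention vs. windowed attention."""
    num_windows = -(-n // window)  # ceiling division
    return n * n, num_windows * window * window
```

For a bag of 10,000 instances and a window of 64, `attention_score_entries(10000, 64)` compares 100,000,000 score entries for full attention with 643,072 for the windowed variant. This count illustrates why windowing avoids quadratic growth; it is not a reproduction of the 62% memory reduction reported in the abstract.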
Related papers
- Fourier Transform Multiple Instance Learning for Whole Slide Image Classification [13.494732719425159]
Whole Slide Image (WSI) classification relies on Multiple Instance Learning (MIL) with spatial patch features.
We propose a framework that augments MIL with a frequency-domain branch to provide compact global context.
FFT-MIL was evaluated across six state-of-the-art MIL methods on three public datasets.
arXiv Detail & Related papers (2025-10-16T20:54:58Z)
- EfficientMIL: Efficient Linear-Complexity MIL Method for WSI Classification [7.789973233645291]
We introduce EfficientMIL, a novel linear-complexity MIL approach for whole slide image (WSI) classification with a patch selection module, the Adaptive Patch Selector (APS).
EfficientMIL achieves significant computational efficiency improvements while outperforming other MIL methods across multiple histopathology datasets.
arXiv Detail & Related papers (2025-09-28T04:47:11Z)
- SemaMIL: Semantic-Aware Multiple Instance Learning with Retrieval-Guided State Space Modeling for Whole Slide Images [17.674866281320046]
SemaMIL is an adaptive method for extracting discriminative features from whole slide images.
It clusters semantically similar patches in sequence through a reversible permutation.
It achieves state-of-the-art subtype accuracy with fewer FLOPs and parameters.
arXiv Detail & Related papers (2025-08-30T10:13:18Z)
- LatentLLM: Attention-Aware Joint Tensor Compression [50.33925662486034]
Large language models (LLMs) and large multi-modal models (LMMs) require a massive amount of computational and memory resources.
We propose a new framework to convert such LLMs/LMMs into a reduced-dimension latent structure.
arXiv Detail & Related papers (2025-05-23T22:39:54Z)
- A Spatially-Aware Multiple Instance Learning Framework for Digital Pathology [4.012490059423154]
Multiple instance learning (MIL) is a promising approach for weakly supervised classification in pathology using whole slide images.
Recent advancements, such as Transformer-based MIL (TransMIL), have incorporated spatial context and inter-patch relationships.
In this work, we enhance the ABMIL framework by integrating interaction-aware representations to address this question.
arXiv Detail & Related papers (2025-04-24T08:53:46Z)
- Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation [60.80423207808076]
Capturing long-range dependencies while preserving high-resolution visual representations is crucial for dense prediction tasks such as human pose estimation.
We propose the Dynamic Visual State Space (DVSS) block, which augments visual state space models with multi-scale convolutional operations.
We build HRVMamba, a novel model for efficient high-resolution representation learning.
arXiv Detail & Related papers (2024-10-04T06:19:29Z)
- Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We make the first attempt to integrate the Vision State Space Model (Mamba) for remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM).
arXiv Detail & Related papers (2024-05-08T11:09:24Z)
- MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis.
We represent each WSI as an undirected graph.
To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
arXiv Detail & Related papers (2024-03-08T09:02:13Z)
- Coarse-to-Fine Embedded PatchMatch and Multi-Scale Dynamic Aggregation for Reference-based Super-Resolution [48.093500219958834]
We propose an Accelerated Multi-Scale Aggregation network (AMSA) for Reference-based Super-Resolution.
The proposed AMSA achieves superior performance over state-of-the-art approaches on both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2022-01-12T08:40:23Z)
- Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multiple scale-associated information.
Our approach can selectively ignore various kinds of noise and automatically focus on appropriate crowd scales.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.