Fourier Transform Multiple Instance Learning for Whole Slide Image Classification
- URL: http://arxiv.org/abs/2510.15138v2
- Date: Tue, 21 Oct 2025 17:57:08 GMT
- Title: Fourier Transform Multiple Instance Learning for Whole Slide Image Classification
- Authors: Anthony Bilic, Guangyu Sun, Ming Li, Md Sanzid Bin Hossain, Yu Tian, Wei Zhang, Laura Brattain, Dexter Hadley, Chen Chen
- Abstract summary: Whole Slide Image (WSI) classification relies on Multiple Instance Learning (MIL) with spatial patch features. We propose a framework that augments MIL with a frequency-domain branch to provide compact global context. FFT-MIL was evaluated across six state-of-the-art MIL methods on three public datasets.
- Score: 13.494732719425159
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Whole Slide Image (WSI) classification relies on Multiple Instance Learning (MIL) with spatial patch features, yet existing methods struggle to capture global dependencies due to the immense size of WSIs and the local nature of patch embeddings. This limitation hinders the modeling of coarse structures essential for robust diagnostic prediction. We propose Fourier Transform Multiple Instance Learning (FFT-MIL), a framework that augments MIL with a frequency-domain branch to provide compact global context. Low-frequency crops are extracted from WSIs via the Fast Fourier Transform and processed through a modular FFT-Block composed of convolutional layers and Min-Max normalization to mitigate the high variance of frequency data. The learned global frequency feature is fused with spatial patch features through lightweight integration strategies, enabling compatibility with diverse MIL architectures. FFT-MIL was evaluated across six state-of-the-art MIL methods on three public datasets (BRACS, LUAD, and IMP). Integration of the FFT-Block improved macro F1 scores by an average of 3.51% and AUC by 1.51%, demonstrating consistent gains across architectures and datasets. These results establish frequency-domain learning as an effective and efficient mechanism for capturing global dependencies in WSI classification, complementing spatial features and advancing the scalability and accuracy of MIL-based computational pathology.
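The frequency-branch input described in the abstract (a low-frequency crop taken from the FFT of the slide, followed by Min-Max normalization) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the crop size, the image resolution, how complex values are encoded, or where inside the FFT-Block the normalization sits, so the function names, the 32-pixel crop, the real/imaginary channel stacking, and the placement of normalization before any convolution are all assumptions made for illustration.

```python
import numpy as np


def low_frequency_crop(image, crop_size):
    """Return the centered low-frequency block of a 2D image's spectrum.

    Hypothetical helper: `image` is assumed to be a 2D grayscale array,
    e.g. a heavily downsampled slide thumbnail.
    """
    # fftshift moves the zero-frequency (DC) term to the center of the array,
    # so a centered crop keeps only the lowest spatial frequencies.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = spectrum.shape
    top = h // 2 - crop_size // 2
    left = w // 2 - crop_size // 2
    return spectrum[top:top + crop_size, left:left + crop_size]


def min_max_normalize(x, eps=1e-8):
    """Rescale values to roughly [0, 1] to tame the large dynamic range."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min + eps)


def frequency_feature(image, crop_size=32):
    """Build a 2-channel (real, imaginary) normalized low-frequency input."""
    crop = low_frequency_crop(image, crop_size)
    # Assumption: encode the complex crop as two real-valued channels.
    channels = np.stack([crop.real, crop.imag], axis=0)
    return min_max_normalize(channels)


# Usage with a random stand-in for a downsampled slide image.
rng = np.random.default_rng(0)
thumbnail = rng.random((256, 256))
feature = frequency_feature(thumbnail, crop_size=32)
print(feature.shape)
```

In a full model, an array like `feature` would be passed through convolutional layers and the resulting global vector fused with the MIL patch features; those stages are omitted here because the abstract does not specify them.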
Related papers
- WSD-MIL: Window Scale Decay Multiple Instance Learning for Whole Slide Image Classification [2.760935655675299]
Window scale decay MIL (WSD-MIL) is designed to enhance the capacity to model tumor regions of varying scales. WSD-MIL achieves state-of-the-art performance on the CAMELYON16 and TCGA-BRCA datasets while reducing 62% of the computational memory.
arXiv Detail & Related papers (2025-12-23T02:10:24Z)
- CAPRMIL: Context-Aware Patch Representations for Multiple Instance Learning [7.966733148243115]
CAPRMIL produces rich context-aware patch embeddings that promote effective correlation learning on downstream tasks. Our results indicate that learning rich, context-aware instance representations before aggregation is an effective and scalable alternative to complex pooling for whole-slide analysis.
arXiv Detail & Related papers (2025-12-16T16:16:45Z)
- FAIM: Frequency-Aware Interactive Mamba for Time Series Classification [87.84511960413715]
Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition. We propose FAIM, a lightweight Frequency-Aware Interactive Mamba model. We show that FAIM consistently outperforms existing state-of-the-art (SOTA) methods, achieving a superior trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2025-11-26T08:36:33Z)
- FractMorph: A Fractional Fourier-Based Multi-Domain Transformer for Deformable Image Registration [0.6683923149620578]
We present FractMorph, a novel 3D dual-parallel transformer-based architecture that enhances cross-image feature matching. A lightweight U-Net style network then predicts a dense deformation field from the transformer-enriched features. Results show FractMorph achieves state-of-the-art performance with an overall Dice Similarity Coefficient (DSC) of $86.45\%$, an average per-structure DSC of $75.15\%$, and a 95th-percentile Hausdorff distance (HD95) of $1.54\,\mathrm{mm}$ on our data split.
arXiv Detail & Related papers (2025-08-17T17:42:10Z)
- MsaMIL-Net: An End-to-End Multi-Scale Aware Multiple Instance Learning Network for Efficient Whole Slide Image Classification [0.7510165488300369]
Bag-based Multiple Instance Learning (MIL) approaches have emerged as the mainstream methodology for Whole Slide Image (WSI) classification. This paper proposes an end-to-end multi-scale WSI classification framework that integrates multi-scale feature extraction with multiple instance learning.
arXiv Detail & Related papers (2025-03-11T16:16:44Z)
- Over-the-Air Fair Federated Learning via Multi-Objective Optimization [52.295563400314094]
We propose an over-the-air fair federated learning algorithm (OTA-FFL) to train fair FL models. Experiments demonstrate the superiority of OTA-FFL in achieving fairness and robust performance.
arXiv Detail & Related papers (2025-01-06T21:16:51Z)
- Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z)
- Integrative Graph-Transformer Framework for Histopathology Whole Slide Image Representation and Classification [18.16710321320098]
In digital pathology, the multiple instance learning (MIL) strategy is widely used in the weakly supervised histopathology whole slide image (WSI) classification task.
Existing attention-based MIL approaches often overlook contextual information and intrinsic spatial relationships between neighboring tissue tiles.
We introduce an integrative graph-transformer framework that simultaneously captures the context-aware relational features and global WSI representations.
arXiv Detail & Related papers (2024-03-26T22:31:05Z)
- MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis.
We represent each WSI as an undirected graph.
To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
arXiv Detail & Related papers (2024-03-08T09:02:13Z)
- FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance, outperforming the state-of-the-art methods by a margin of 3%, 4% and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z)
- DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with Flow-Guided Attentive Correlation and Recursive Boosting [50.17500790309477]
DeMFI-Net is a joint deblurring and multi-frame interpolation framework.
It converts blurry low-frame-rate videos into sharp high-frame-rate videos.
It achieves state-of-the-art (SOTA) performance on diverse datasets.
arXiv Detail & Related papers (2021-11-19T00:00:15Z)