CAPRMIL: Context-Aware Patch Representations for Multiple Instance Learning
- URL: http://arxiv.org/abs/2512.14540v1
- Date: Tue, 16 Dec 2025 16:16:45 GMT
- Title: CAPRMIL: Context-Aware Patch Representations for Multiple Instance Learning
- Authors: Andreas Lolos, Theofilos Christodoulou, Aris L. Moustakas, Stergios Christodoulidis, Maria Vakalopoulou,
- Abstract summary: CAPRMIL produces rich context-aware patch embeddings that promote effective correlation learning on downstream tasks.<n>Our results indicate that learning rich, context-aware instance representations before aggregation is an effective and scalable alternative to complex pooling for whole-slide analysis.
- Score: 7.966733148243115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In computational pathology, weak supervision has become the standard for deep learning due to the gigapixel scale of WSIs and the scarcity of pixel-level annotations, with Multiple Instance Learning (MIL) established as the principal framework for slide-level model training. In this paper, we introduce a novel setting for MIL methods, inspired by proceedings in Neural Partial Differential Equation (PDE) Solvers. Instead of relying on complex attention-based aggregation, we propose an efficient, aggregator-agnostic framework that removes the complexity of correlation learning from the MIL aggregator. CAPRMIL produces rich context-aware patch embeddings that promote effective correlation learning on downstream tasks. By projecting patch features -- extracted using a frozen patch encoder -- into a small set of global context/morphology-aware tokens and utilizing multi-head self-attention, CAPRMIL injects global context with linear computational complexity with respect to the bag size. Paired with a simple Mean MIL aggregator, CAPRMIL matches state-of-the-art slide-level performance across multiple public pathology benchmarks, while reducing the total number of trainable parameters by 48%-92.8% versus SOTA MILs, lowering FLOPs during inference by 52%-99%, and ranking among the best models on GPU memory efficiency and training time. Our results indicate that learning rich, context-aware instance representations before aggregation is an effective and scalable alternative to complex pooling for whole-slide analysis. Our code is available at https://github.com/mandlos/CAPRMIL
Related papers
- WSD-MIL: Window Scale Decay Multiple Instance Learning for Whole Slide Image Classification [2.760935655675299]
Window scale decay MIL (WSD-MIL) is designed to enhance the capacity to model tumor regions of varying scales.<n>WSD-MIL achieves state-of-the-art performance on the CAMELYON16 and TCGA-BRCA datasets while reducing 62% of the computational memory.
arXiv Detail & Related papers (2025-12-23T02:10:24Z) - Reducing Variability of Multiple Instance Learning Methods for Digital Pathology [2.9284034606635267]
Digital pathology has revolutionized the field by enabling the digitization of tissue samples into whole slide images (WSIs)<n>WSIs are often divided into smaller patches with a global label.<n>MIL methods have emerged as a suitable solution for WSI classification.
arXiv Detail & Related papers (2025-06-30T22:10:24Z) - A Spatially-Aware Multiple Instance Learning Framework for Digital Pathology [4.012490059423154]
Multiple instance learning (MIL) is a promising approach for weakly supervised classification in pathology using whole slide images.<n>Recent advancements, such as Transformer based MIL (TransMIL), have incorporated spatial context and inter-patch relationships.<n>In this work, we enhance the ABMIL framework by integrating interaction-aware representations to address this question.
arXiv Detail & Related papers (2025-04-24T08:53:46Z) - CoMMIT: Coordinated Multimodal Instruction Tuning [90.1532838391285]
Multimodal large language models (MLLMs) generally involve cooperative learning between a backbone LLM and a feature encoder of non-text input modalities.<n>In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.<n>We propose a Multimodal Balance Coefficient that enables quantitative measurement of the balance of learning.
arXiv Detail & Related papers (2024-07-29T23:18:55Z) - R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [65.04475956174959]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML)<n>A significant challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming.<n>This paper develops a physical layer framework for resilient SFL with large language models (LLMs) and vision language models (VLMs) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z) - Multi-head Attention-based Deep Multiple Instance Learning [1.0389304366020162]
MAD-MIL is a Multi-head Attention-based Deep Multiple Instance Learning model.
It is designed for weakly supervised Whole Slide Images (WSIs) classification in digital pathology.
evaluated on the MNIST-BAGS and public datasets, including TUPAC16, TCGA BRCA, TCGA LUNG, and TCGA KIDNEY.
arXiv Detail & Related papers (2024-04-08T09:54:28Z) - MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis.
We represent each WSI as an undirected graph.
To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
arXiv Detail & Related papers (2024-03-08T09:02:13Z) - ClusTR: Exploring Efficient Self-attention via Clustering for Vision
Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention.
Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count.
The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
arXiv Detail & Related papers (2022-08-28T04:18:27Z) - CIL: Contrastive Instance Learning Framework for Distantly Supervised
Relation Extraction [52.94486705393062]
We go beyond typical multi-instance learning (MIL) framework and propose a novel contrastive instance learning (CIL) framework.
Specifically, we regard the initial MIL as the relational triple encoder and constraint positive pairs against negative pairs for each instance.
Experiments demonstrate the effectiveness of our proposed framework, with significant improvements over the previous methods on NYT10, GDS and KBP.
arXiv Detail & Related papers (2021-06-21T04:51:59Z) - Dual-stream Multiple Instance Learning Network for Whole Slide Image
Classification with Self-supervised Contrastive Learning [16.84711797934138]
We address the challenging problem of whole slide image (WSI) classification.
WSI classification can be cast as a multiple instance learning (MIL) problem when only slide-level labels are available.
We propose a MIL-based method for WSI classification and tumor detection that does not require localized annotations.
arXiv Detail & Related papers (2020-11-17T20:51:15Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.