Synthetic Data Reveals Generalization Gaps in Correlated Multiple Instance Learning
- URL: http://arxiv.org/abs/2510.25759v1
- Date: Wed, 29 Oct 2025 17:55:17 GMT
- Title: Synthetic Data Reveals Generalization Gaps in Correlated Multiple Instance Learning
- Authors: Ethan Harvey, Dennis Johan Loevlie, Michael C. Hughes
- Abstract summary: We design a synthetic classification task where accounting for adjacent instance features is crucial for accurate prediction. We empirically show that newer correlated MIL methods still struggle to generalize as well as possible when trained from scratch on tens of thousands of instances.
- Score: 4.299364919356825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiple instance learning (MIL) is often used in medical imaging to classify high-resolution 2D images by processing patches or classify 3D volumes by processing slices. However, conventional MIL approaches treat instances separately, ignoring contextual relationships such as the appearance of nearby patches or slices that can be essential in real applications. We design a synthetic classification task where accounting for adjacent instance features is crucial for accurate prediction. We demonstrate the limitations of off-the-shelf MIL approaches by quantifying their performance compared to the optimal Bayes estimator for this task, which is available in closed-form. We empirically show that newer correlated MIL methods still struggle to generalize as well as possible when trained from scratch on tens of thousands of instances.
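The contrast the abstract draws between instance-independent MIL and context-aware MIL can be illustrated with a toy sketch. The Python below is a hypothetical illustration only, not the authors' synthetic task, their closed-form Bayes estimator, or any published MIL method; the bag values and the helper names (`instance_score`, `max_pool_bag_score`, `neighbor_aware_bag_score`) are assumptions made here. It shows that max pooling over independently scored instances gives the same bag score however the instances are ordered, whereas a score that averages each instance with its adjacent instances changes when the same instances are rearranged.

```python
from typing import List


def instance_score(x: float) -> float:
    # Hypothetical per-instance scorer: sees one instance in isolation.
    return x


def max_pool_bag_score(bag: List[float]) -> float:
    """Conventional MIL-style pooling (assumed non-empty bag): score each
    instance independently, then take the max. Order is ignored, so
    adjacency between instances cannot influence the result."""
    return max(instance_score(x) for x in bag)


def neighbor_aware_bag_score(bag: List[float]) -> float:
    """Toy context-aware pooling (hypothetical): each instance's score is the
    mean of itself and its immediate neighbors, then the max is taken.
    The result therefore depends on which instances sit next to each other."""
    scores = []
    for i in range(len(bag)):
        window = bag[max(0, i - 1): i + 2]  # instance i plus adjacent instances
        scores.append(sum(window) / len(window))
    return max(scores)


# Same multiset of instance values, two different orderings (illustrative data).
bag = [0.0, 0.9, 0.8, 0.0, 0.1]
shuffled = [0.9, 0.0, 0.1, 0.8, 0.0]

print(max_pool_bag_score(bag), max_pool_bag_score(shuffled))
print(neighbor_aware_bag_score(bag), neighbor_aware_bag_score(shuffled))
```

Under these assumptions, reordering the bag leaves the max-pooled score at 0.9 but lowers the neighbor-aware score from about 0.567 to 0.45, because the two high-valued instances are no longer adjacent.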
Related papers
- Denoising Mutual Knowledge Distillation in Bi-Directional Multiple Instance Learning [4.435658228432678]
Multiple Instance Learning is the predominant method for Whole Slide Image classification in digital pathology. We propose to bridge the gap between MIL and fully supervised learning by augmenting both the bag- and instance-level learning processes. The proposed algorithm improves the performance of dual-level MIL algorithms on both bag- and instance-level predictions.
Date: 2025-05-17T16:26:43Z
- A Spatially-Aware Multiple Instance Learning Framework for Digital Pathology [4.012490059423154]
Multiple instance learning (MIL) is a promising approach for weakly supervised classification in pathology using whole slide images. Recent advancements, such as Transformer based MIL (TransMIL), have incorporated spatial context and inter-patch relationships. In this work, we enhance the ABMIL framework by integrating interaction-aware representations to address this question.
Date: 2025-04-24T08:53:46Z
- Position: From Correlation to Causation: Max-Pooling-Based Multi-Instance Learning Leads to More Robust Whole Slide Image Classification [51.95824566163554]
We argue that well-trained max-pooling-based MIL models can make predictions based on causal factors and avoid relying on spurious correlations. We propose a simple yet effective max-pooling-based MIL method (FocusMIL) that outperforms existing mainstream attention-based methods on two datasets.
Date: 2024-08-18T12:15:22Z
- Rethinking Attention-Based Multiple Instance Learning for Whole-Slide Pathological Image Classification: An Instance Attribute Viewpoint [11.09441191807822]
Multiple instance learning (MIL) is a robust paradigm for whole-slide pathological image (WSI) analysis.
This paper proposes an Attribute-Driven MIL (AttriMIL) framework to address these issues.
Date: 2024-03-30T13:04:46Z
- MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis.
We represent each WSI as an undirected graph.
To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
Date: 2024-03-08T09:02:13Z
- RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
Date: 2023-10-18T07:30:08Z
- Deep Multiple Instance Learning with Distance-Aware Self-Attention [9.361964965928063]
We introduce a novel multiple instance learning (MIL) model with distance-aware self-attention (DAS-MIL).
Unlike existing relative position representations for self-attention which are discrete, our approach introduces continuous distance-dependent terms into the computation of the attention weights.
We evaluate our model on a custom MNIST-based MIL dataset and on CAMELYON16, a publicly available cancer metastasis detection dataset.
Date: 2023-05-17T20:11:43Z
- DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification [18.11776334311096]
Multiple instance learning (MIL) has been increasingly used in the classification of histopathology whole slide images (WSIs).
We propose to virtually enlarge the number of bags by introducing the concept of pseudo-bags.
We also derive the instance probability under the framework of attention-based MIL, and use the derivation to help construct and analyze the proposed framework.
Date: 2022-03-22T22:33:42Z
- Label Cleaning Multiple Instance Learning: Refining Coarse Annotations on Single Whole-Slide Images [83.7047542725469]
Annotating cancerous regions in whole-slide images (WSIs) of pathology samples plays a critical role in clinical diagnosis, biomedical research, and machine learning algorithms development.
We present a method, named Label Cleaning Multiple Instance Learning (LC-MIL), to refine coarse annotations on a single WSI without the need of external training data.
Our experiments on a heterogeneous WSI set with breast cancer lymph node metastasis, liver cancer, and colorectal cancer samples show that LC-MIL significantly refines the coarse annotations, outperforming the state-of-the-art alternatives, even while learning from a single slide.
Date: 2021-09-22T15:06:06Z
- CIL: Contrastive Instance Learning Framework for Distantly Supervised Relation Extraction [52.94486705393062]
We go beyond typical multi-instance learning (MIL) framework and propose a novel contrastive instance learning (CIL) framework.
Specifically, we regard the initial MIL as the relational triple encoder and constrain positive pairs against negative pairs for each instance.
Experiments demonstrate the effectiveness of our proposed framework, with significant improvements over the previous methods on NYT10, GDS and KBP.
Date: 2021-06-21T04:51:59Z
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval tasks.
Date: 2020-06-09T23:26:15Z
- Memory-Augmented Relation Network for Few-Shot Learning [114.47866281436829]
In this work, we investigate a new metric-learning method, Memory-Augmented Relation Network (MRN).
In MRN, we choose the samples that are visually similar from the working context, and perform weighted information propagation to attentively aggregate helpful information from chosen ones to enhance its representation.
We empirically demonstrate that MRN yields significant improvement over its ancestor and achieves competitive or even better performance when compared with other few-shot learning approaches.
Date: 2020-05-09T10:09:13Z
This list is automatically generated from the titles and abstracts of the papers on this site.