Attention Is Not What You Need: Revisiting Multi-Instance Learning for Whole Slide Image Classification
- URL: http://arxiv.org/abs/2408.09449v1
- Date: Sun, 18 Aug 2024 12:15:22 GMT
- Title: Attention Is Not What You Need: Revisiting Multi-Instance Learning for Whole Slide Image Classification
- Authors: Xin Liu, Weijia Zhang, Min-Ling Zhang
- Abstract summary: We argue that synergizing the standard MIL assumption with variational inference encourages the model to focus on tumour morphology instead of spurious correlations.
Our method also achieves better classification boundaries for identifying hard instances and mitigates the effect of spurious correlations between bags and labels.
- Score: 51.95824566163554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although attention-based multi-instance learning algorithms have achieved impressive performance on slide-level whole slide image (WSI) classification tasks, they are prone to mistakenly focusing on irrelevant patterns such as staining conditions and tissue morphology, leading to incorrect patch-level predictions and unreliable interpretability. Moreover, these attention-based MIL algorithms tend to focus on salient instances and struggle to recognize hard-to-classify instances. In this paper, we first demonstrate that attention-based WSI classification methods do not adhere to the standard MIL assumptions. Building on the standard MIL assumptions, we then propose a surprisingly simple yet effective instance-based MIL method for WSI classification (FocusMIL) based on max-pooling and forward amortized variational inference. We argue that synergizing the standard MIL assumption with variational inference encourages the model to focus on tumour morphology instead of spurious correlations. Our experimental evaluations show that FocusMIL significantly outperforms the baselines in patch-level classification tasks on the Camelyon16 and TCGA-NSCLC benchmarks. Visualization results show that our method also achieves better classification boundaries for identifying hard instances and mitigates the effect of spurious correlations between bags and labels.
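The standard MIL assumption referenced in the abstract (a bag is positive iff at least one of its instances is positive) can be made concrete with a minimal max-pooling sketch. This is only an illustration of the instance-based, max-pooling decision rule, not FocusMIL itself (the variational-inference component is omitted); the linear per-patch scorer and toy feature vectors are hypothetical stand-ins for a real feature extractor.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def instance_score(x, w, b):
    # Linear scorer per patch -- an illustrative stand-in for a real
    # feature extractor plus classifier head.
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)

def bag_probability(patches, w, b):
    """Standard-MIL max-pooling: score every patch independently and
    take the maximum instance probability as the slide-level
    probability (the bag is positive iff some instance is positive)."""
    probs = [instance_score(x, w, b) for x in patches]
    return max(probs), probs
```

Because the bag score is the maximum instance score, gradients flow only through the highest-scoring patch, which is what ties the slide label directly to individual patch predictions.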
Related papers
- Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective [5.09611816929943]
Accurately predicting downstream task performance prior to model training is crucial for efficient resource allocation.
Existing performance prediction methods suffer from limited accuracy and reliability.
We propose a Clustering-On-Difficulty (COD) downstream performance prediction framework.
arXiv Detail & Related papers (2025-02-24T15:44:57Z) - Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on the pre-training loss as a more computation-efficient metric for performance estimation.
We present FLP-M, a fundamental approach for performance prediction that addresses the practical need to integrate datasets from multiple sources during pre-training.
arXiv Detail & Related papers (2024-10-11T04:57:48Z) - Multiple Instance Verification [11.027466339522777]
We show that naive adaptations of attention-based multiple instance learning methods and standard verification methods are unsuitable for this setting.
Under the CAP framework, we propose two novel attention functions to address the challenge of distinguishing between highly similar instances in a target bag.
arXiv Detail & Related papers (2024-07-09T04:51:22Z) - Rethinking Attention-Based Multiple Instance Learning for Whole-Slide Pathological Image Classification: An Instance Attribute Viewpoint [11.09441191807822]
Multiple instance learning (MIL) is a robust paradigm for whole-slide pathological image (WSI) analysis.
This paper proposes an Attribute-Driven MIL (AttriMIL) framework to address these issues.
arXiv Detail & Related papers (2024-03-30T13:04:46Z) - MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis.
We represent each WSI as an undirected graph.
To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
arXiv Detail & Related papers (2024-03-08T09:02:13Z) - Shapley Values-enabled Progressive Pseudo Bag Augmentation for Whole Slide Image Classification [6.705260410604528]
In computational pathology, whole-slide image (WSI) classification presents a formidable challenge due to its gigapixel resolution and limited fine-grained annotations.
Multiple-instance learning (MIL) offers a weakly supervised solution, yet refining instance-level information from bag-level labels remains challenging.
We propose a new approach inspired by cooperative game theory: employing Shapley values to assess each instance's contribution, thereby improving IIS estimation.
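To illustrate the cooperative-game idea behind this entry, here is a generic exact Shapley-value computation over a small bag: each instance's contribution is its average marginal effect on the bag score over all orderings. The toy `max`-style bag score and the instance values are assumptions for the example, not the paper's actual scoring function.

```python
import itertools
import math

def shapley_contributions(instances, bag_score):
    """Exact Shapley values for a small bag: average each instance's
    marginal effect on bag_score over all orderings (feasible only
    for small n, since this enumerates n! permutations)."""
    n = len(instances)
    phi = [0.0] * n
    for order in itertools.permutations(range(n)):
        included = []
        prev = bag_score([])
        for i in order:
            included.append(i)
            cur = bag_score([instances[j] for j in included])
            phi[i] += cur - prev  # marginal contribution of instance i
            prev = cur
    total = math.factorial(n)
    return [p / total for p in phi]
```

By the efficiency property, the contributions sum to the full-bag score minus the empty-bag score, so they can be read as a principled split of the slide-level prediction across patches.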
arXiv Detail & Related papers (2023-12-09T07:35:09Z) - Slot-Mixup with Subsampling: A Simple Regularization for WSI Classification [13.286360560353936]
Whole slide image (WSI) classification requires repetitive zoom-in and out for pathologists, as only small portions of the slide may be relevant to detecting cancer.
Due to the lack of patch-level labels, multiple instance learning (MIL) is a common practice for training a WSI classifier.
One of the challenges in MIL for WSIs is the weak supervision coming only from the slide-level labels, often resulting in severe overfitting.
Our approach augments the training dataset by sampling a subset of patches in the WSI without significantly altering the underlying semantics of the original slides.
arXiv Detail & Related papers (2023-11-29T09:18:39Z) - Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Pseudo-Bag Mixup Augmentation for Multiple Instance Learning-Based Whole Slide Image Classification [18.679580844360615]
We propose a new Pseudo-bag Mixup (PseMix) data augmentation scheme to improve the training of MIL models.
Our scheme generalizes the Mixup strategy for general images to special WSIs via pseudo-bags.
It is designed as an efficient and decoupled method, neither involving time-consuming operations nor relying on MIL model predictions.
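A rough sketch of the pseudo-bag Mixup idea described above: partition each slide's patches into pseudo-bags, swap a fraction of pseudo-bags between two slides, and mix the slide labels by the same proportion. Function names, the partitioning scheme, and the toy bags are illustrative assumptions, not the paper's exact algorithm.

```python
import random

def make_pseudo_bags(patches, n_pseudo):
    """Randomly partition a slide's patch list into n_pseudo pseudo-bags."""
    patches = patches[:]
    random.shuffle(patches)
    return [patches[i::n_pseudo] for i in range(n_pseudo)]

def pseudo_bag_mixup(bag_a, label_a, bag_b, label_b, n_pseudo=4, k=2):
    """Mixup at the pseudo-bag level: keep n_pseudo - k pseudo-bags from
    slide A and take k from slide B; mix the labels by the same ratio."""
    pa = make_pseudo_bags(bag_a, n_pseudo)
    pb = make_pseudo_bags(bag_b, n_pseudo)
    mixed = [p for sub in pa[: n_pseudo - k] for p in sub] + \
            [p for sub in pb[:k] for p in sub]
    lam = (n_pseudo - k) / n_pseudo
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed, mixed_label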
arXiv Detail & Related papers (2023-06-28T13:02:30Z) - Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning [18.45898471459533]
Spurious correlations that degrade model generalization or lead the model to be right for the wrong reasons are one of the main robustness concerns for real-world deployments.
This paper proposes a novel approach to address spurious correlations during fine-tuning for a given domain of interest.
arXiv Detail & Related papers (2023-04-08T05:20:33Z) - Active Learning Enhances Classification of Histopathology Whole Slide Images with Attention-based Multiple Instance Learning [48.02011627390706]
We train an attention-based MIL and calculate a confidence metric for every image in the dataset to select the most uncertain WSIs for expert annotation.
With a novel attention guiding loss, this leads to an accuracy boost of the trained models with few regions annotated for each class.
In the future, it may serve as an important contribution to training MIL models in the clinically relevant context of cancer classification in histopathology.
arXiv Detail & Related papers (2023-03-02T15:18:58Z) - Interventional Multi-Instance Learning with Deconfounded Instance-Level Prediction [29.151629044965983]
We propose a novel interventional multi-instance learning (IMIL) framework to achieve deconfounded instance-level prediction.
Unlike traditional likelihood-based strategies, we design an Expectation-Maximization (EM) algorithm based on causal intervention.
Our IMIL method substantially reduces false positives and outperforms state-of-the-art MIL methods.
arXiv Detail & Related papers (2022-04-20T03:17:36Z) - MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
We propose in this paper a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
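The acquisition idea in this entry can be sketched generically: score each unlabeled sample by how much the two auxiliary classification heads disagree, and query the most contested samples. The L1 distance between softmax outputs and the function names are illustrative assumptions; the paper's exact discrepancy measure may differ.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classifier_discrepancy(logits1, logits2):
    """L1 distance between the two auxiliary heads' predicted
    distributions; samples where the heads disagree most lie near
    the decision boundary."""
    p, q = softmax(logits1), softmax(logits2)
    return sum(abs(a - b) for a, b in zip(p, q))

def select_for_labeling(pool, n):
    """pool: list of (sample_id, logits1, logits2) tuples.
    Return the n sample ids with the largest head discrepancy."""
    ranked = sorted(pool,
                    key=lambda t: classifier_discrepancy(t[1], t[2]),
                    reverse=True)
    return [sid for sid, _, _ in ranked[:n]]
```

Samples on which both heads agree contribute near-zero discrepancy and are deprioritized, concentrating the labeling budget where the decision boundaries are tightest.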
arXiv Detail & Related papers (2021-07-23T06:57:08Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Deep Clustering by Semantic Contrastive Learning [67.28140787010447]
We introduce a novel variant called Semantic Contrastive Learning (SCL).
It explores the characteristics of both conventional contrastive learning and deep clustering.
It can amplify the strengths of contrastive learning and deep clustering in a unified approach.
arXiv Detail & Related papers (2021-03-03T20:20:48Z) - Supervised PCA: A Multiobjective Approach [70.99924195791532]
We propose a new method for supervised principal component analysis (SPCA) that addresses both of its objectives jointly.
Our approach accommodates arbitrary supervised learning losses and, through a statistical reformulation, provides a novel low-rank extension of generalized linear models.
arXiv Detail & Related papers (2020-11-10T18:46:58Z) - Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations? [13.127549105535623]
It is often more useful to estimate predictive correlations between the function values at different input locations.
We first consider a downstream task which depends on posterior predictive correlations: transductive active learning (TAL).
Since TAL is too expensive and indirect to guide development of algorithms, we introduce two metrics which more directly evaluate the predictive correlations.
arXiv Detail & Related papers (2020-11-06T03:48:59Z) - Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
arXiv Detail & Related papers (2020-04-14T06:18:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.