PreMix: Addressing Label Scarcity in Whole Slide Image Classification with Pre-trained Multiple Instance Learning Aggregators
- URL: http://arxiv.org/abs/2408.01162v2
- Date: Fri, 24 Jan 2025 04:42:06 GMT
- Title: PreMix: Addressing Label Scarcity in Whole Slide Image Classification with Pre-trained Multiple Instance Learning Aggregators
- Authors: Bryan Wong, Mun Yong Yi,
- Abstract summary: Multiple instance learning (MIL) has emerged as a powerful framework for weakly supervised whole slide image (WSI) classification.
We propose PreMix, a novel framework that leverages a non-contrastive pre-training method, Barlow Twins.
We show that PreMix achieves an average F1 improvement of 4.7% over the baseline HIPT across various WSI training datasets and label sizes.
- Score: 2.6703221234079946
- License:
- Abstract: Multiple instance learning (MIL) has emerged as a powerful framework for weakly supervised whole slide image (WSI) classification, enabling slide-level predictions without requiring detailed patch-level annotations. However, a key limitation of MIL lies in the underexplored potential of pre-training the MIL aggregator. Most existing approaches train it from scratch, resulting in performance heavily dependent on the number of labeled WSIs, while overlooking the abundance of unlabeled WSIs available in real-world scenarios. To address this, we propose PreMix, a novel framework that leverages a non-contrastive pre-training method, Barlow Twins, augmented with the Slide Mixing approach to generate additional positive pairs and enhance feature learning, particularly under limited labeled WSI conditions. Fine-tuning with Mixup and Manifold Mixup further enhances robustness by effectively handling the diverse sizes of gigapixel WSIs. Experimental results demonstrate that integrating HIPT into PreMix achieves an average F1 improvement of 4.7% over the baseline HIPT across various WSI training datasets and label sizes. These findings underscore its potential to advance WSI classification with limited labeled data and its applicability to real-world histopathology practices. The code is available at https://anonymous.4open.science/r/PreMix
Related papers
- MergeUp-augmented Semi-Weakly Supervised Learning for WSI Classification [1.2387547097768696]
Multiple instance learning (MIL) is a promising weakly supervised learning approach for WSI classification.
We introduce a feature augmentation technique, MergeUp, which merges bags with low-priority bags to enhance inter-category information.
Experimental results on the CAMELYON-16, BRACS, and TCGA-LUNG datasets demonstrate the superiority of our method over existing state-of-the-art approaches.
arXiv Detail & Related papers (2024-08-23T04:08:30Z) - Rethinking Pre-Trained Feature Extractor Selection in Multiple Instance Learning for Whole Slide Image Classification [2.6703221234079946]
Multiple instance learning (MIL) has become a preferred method for gigapixel whole slide image (WSI) classification without requiring patch-level annotations.
This study systematically evaluating MIL feature extractors across three dimensions: pre-training dataset, backbone model, and pre-training method.
arXiv Detail & Related papers (2024-08-02T10:34:23Z) - Task-customized Masked AutoEncoder via Mixture of Cluster-conditional
Experts [104.9871176044644]
Masked Autoencoder(MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE)
MoCE trains each expert only with semantically relevant images by using cluster-conditional gates.
arXiv Detail & Related papers (2024-02-08T03:46:32Z) - Slot-Mixup with Subsampling: A Simple Regularization for WSI
Classification [13.286360560353936]
Whole slide image (WSI) classification requires repetitive zoom-in and out for pathologists, as only small portions of the slide may be relevant to detecting cancer.
Due to the lack of patch-level labels, multiple instance learning (MIL) is a common practice for training a WSI classifier.
One of the challenges in MIL for WSIs is the weak supervision coming only from the slide-level labels, often resulting in severe overfitting.
Our approach augments the training dataset by sampling a subset of patches in the WSI without significantly altering the underlying semantics of the original slides.
arXiv Detail & Related papers (2023-11-29T09:18:39Z) - Pseudo-Bag Mixup Augmentation for Multiple Instance Learning-Based Whole
Slide Image Classification [18.679580844360615]
We propose a new Pseudo-bag Mixup (PseMix) data augmentation scheme to improve the training of MIL models.
Our scheme generalizes the Mixup strategy for general images to special WSIs via pseudo-bags.
It is designed as an efficient and decoupled method, neither involving time-consuming operations nor relying on MIL model predictions.
arXiv Detail & Related papers (2023-06-28T13:02:30Z) - Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot
Text Classification Tasks [75.42002070547267]
We propose a self evolution learning (SE) based mixup approach for data augmentation in text classification.
We introduce a novel instance specific label smoothing approach, which linearly interpolates the model's output and one hot labels of the original samples to generate new soft for label mixing up.
arXiv Detail & Related papers (2023-05-22T23:43:23Z) - DoubleMix: Simple Interpolation-Based Data Augmentation for Text
Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training data.
It then uses the perturbed data and original data to carry out a two-step in the hidden space of neural models.
arXiv Detail & Related papers (2022-09-12T15:01:04Z) - ReMix: A General and Efficient Framework for Multiple Instance Learning
based Whole Slide Image Classification [14.78430890440035]
Whole slide image (WSI) classification often relies on weakly supervised multiple instance learning (MIL) methods to handle gigapixel resolution images and slide-level labels.
We propose ReMix, a general and efficient framework for MIL based WSI classification.
arXiv Detail & Related papers (2022-07-05T04:21:35Z) - Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.
In this paper, we propose an efficient mixup objective function with a decoupled regularizer named Decoupled Mixup (DM)
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z) - Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z) - SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained
Data [124.95585891086894]
Proposal is called Semantically Proportional Mixing (SnapMix)
It exploits class activation map (CAM) to lessen the label noise in augmenting fine-grained data.
Our method consistently outperforms existing mixed-based approaches.
arXiv Detail & Related papers (2020-12-09T03:37:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.