Iteratively Coupled Multiple Instance Learning from Instance to Bag
Classifier for Whole Slide Image Classification
- URL: http://arxiv.org/abs/2303.15749v2
- Date: Wed, 23 Aug 2023 06:04:56 GMT
- Title: Iteratively Coupled Multiple Instance Learning from Instance to Bag
Classifier for Whole Slide Image Classification
- Authors: Hongyi Wang, Luyang Luo, Fang Wang, Ruofeng Tong, Yen-Wei Chen,
Hongjie Hu, Lanfen Lin, and Hao Chen
- Abstract summary: Whole Slide Image (WSI) classification remains a challenge due to the extremely high resolution of the slides and the absence of fine-grained labels.
We propose a novel framework called Iteratively Coupled MIL (ICMIL) which bridges the loss back-propagation process from the bag-level classifier to the patch embedder.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whole Slide Image (WSI) classification remains a challenge due to
the extremely high resolution of the slides and the absence of fine-grained labels. Presently,
WSI classification is usually regarded as a Multiple Instance Learning (MIL)
problem when only slide-level labels are available. MIL methods involve a patch
embedding module and a bag-level classification module, but they are
prohibitively expensive to train in an end-to-end manner. Therefore,
existing methods usually train them separately, or directly skip the training
of the embedder. Such schemes hinder the patch embedder's access to slide-level
semantic labels, resulting in inconsistency within the entire MIL pipeline. To
overcome this issue, we propose a novel framework called Iteratively Coupled
MIL (ICMIL), which bridges the loss back-propagation process from the bag-level
classifier to the patch embedder. In ICMIL, we use category information in the
bag-level classifier to guide the patch-level fine-tuning of the patch feature
extractor. The refined embedder then generates better instance representations
for achieving a more accurate bag-level classifier. By coupling the patch
embedder and bag classifier at a low cost, our proposed framework enables
information exchange between the two modules, benefiting the entire MIL
classification model. We tested our framework on two datasets using three
different backbones, and our experimental results demonstrate consistent
performance improvements over state-of-the-art MIL methods. The code is
available at: https://github.com/Dootmaan/ICMIL.
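The alternating scheme described in the abstract — train the bag-level classifier on the current patch embeddings, then let its category information act as an instance-level teacher for fine-tuning the patch embedder — can be sketched as a generic loop. The function and parameter names below are illustrative stand-ins, not the authors' actual implementation (see the linked repository for that):

```python
def icmil_loop(bags, bag_labels, embed, train_bag_clf, finetune_embed, n_iters=3):
    """Iteratively couple the patch embedder and the bag-level classifier.

    bags: list of bags, each a list of raw patches.
    bag_labels: one slide-level label per bag.
    embed: callable mapping a single patch to an embedding.
    train_bag_clf: fits a bag-level classifier on the embedded bags and
        returns a callable that scores a single instance embedding.
    finetune_embed: refines the embedder using the classifier's pseudo
        instance labels and returns the updated embedder.
    """
    bag_clf = None
    for _ in range(n_iters):
        # Stage 1: embed every patch with the current embedder.
        feats = [[embed(p) for p in bag] for bag in bags]
        # Stage 2: train the bag-level classifier on these embeddings.
        bag_clf = train_bag_clf(feats, bag_labels)
        # Coupling step: category information from the bag-level classifier
        # guides patch-level fine-tuning of the patch feature extractor.
        pseudo_labels = [[bag_clf(f) for f in bag] for bag in feats]
        embed = finetune_embed(embed, bags, pseudo_labels)
    return embed, bag_clf
```

With trivial stand-ins (an identity embedder, a max-pooling threshold classifier, a no-op fine-tuning step) this loop runs end-to-end; in the paper's setting each stage would be a neural network trained by gradient descent, with the coupling kept cheap relative to full end-to-end training.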
Related papers
- Attention Is Not What You Need: Revisiting Multi-Instance Learning for Whole Slide Image Classification [51.95824566163554]
We argue that synergizing the standard MIL assumption with variational inference encourages the model to focus on tumour morphology instead of spurious correlations.
Our method also achieves better classification boundaries for identifying hard instances and mitigates the effect of spurious correlations between bags and labels.
arXiv Detail & Related papers (2024-08-18T12:15:22Z)
- African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification [53.89380284760555]
FOCI (Fine-grained Object ClassIfication) is a difficult multiple-choice benchmark for fine-grained object classification.
FOCI complements five popular classification datasets with four domain-specific subsets from ImageNet-21k.
arXiv Detail & Related papers (2024-06-20T16:59:39Z)
- Rethinking Multiple Instance Learning for Whole Slide Image Classification: A Bag-Level Classifier is a Good Instance-Level Teacher [22.080213609228547]
Multiple Instance Learning has demonstrated promise in Whole Slide Image (WSI) classification.
Existing methods generally adopt a two-stage approach, comprising a non-learnable feature embedding stage and a classifier training stage.
We propose that a bag-level classifier can be a good instance-level teacher.
arXiv Detail & Related papers (2023-12-02T10:16:03Z)
- SC-MIL: Sparsely Coded Multiple Instance Learning for Whole Slide Image Classification [2.3364474984323103]
Multiple Instance Learning (MIL) has been widely used in weakly supervised whole slide image (WSI) classification.
We propose a sparse coding MIL (SC-MIL) method that addresses the two aspects at the same time by leveraging sparse dictionary learning.
The proposed SC module can be incorporated into any existing MIL framework in a plug-and-play manner with an acceptable computational cost.
arXiv Detail & Related papers (2023-10-31T18:01:41Z)
- Prediction Calibration for Generalized Few-shot Semantic Segmentation [101.69940565204816]
Generalized Few-shot Semantic Segmentation (GFSS) aims to segment each image pixel into either base classes with abundant training examples or novel classes with only a handful of (e.g., 1-5) training images per class.
We build a cross-attention module that guides the classifier's final prediction using the fused multi-level features.
Our PCN outperforms the state-of-the-art alternatives by large margins.
arXiv Detail & Related papers (2022-10-15T13:30:12Z)
- ReMix: A General and Efficient Framework for Multiple Instance Learning based Whole Slide Image Classification [14.78430890440035]
Whole slide image (WSI) classification often relies on weakly supervised multiple instance learning (MIL) methods to handle gigapixel resolution images and slide-level labels.
We propose ReMix, a general and efficient framework for MIL based WSI classification.
arXiv Detail & Related papers (2022-07-05T04:21:35Z)
- Feature Re-calibration based MIL for Whole Slide Image Classification [7.92885032436243]
Whole slide image (WSI) classification is a fundamental task for the diagnosis and treatment of diseases.
We propose to re-calibrate the distribution of a WSI bag (instances) by using the statistics of the max-instance (critical) feature.
We employ a position encoding module (PEM) to model spatial/morphological information, and perform pooling by multi-head self-attention (PSMA) with a Transformer encoder.
arXiv Detail & Related papers (2022-06-22T07:00:39Z)
- BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning [88.82371069668147]
BatchFormerV2 is a more general batch Transformer module, which enables exploring sample relationships for dense representation learning.
BatchFormerV2 consistently improves current DETR-based detection methods by over 1.3%.
arXiv Detail & Related papers (2022-04-04T05:53:42Z)
- SWAT: Spatial Structure Within and Among Tokens [53.525469741515884]
We argue that models can have significant gains when spatial structure is preserved during tokenization.
We propose two key contributions: (1) Structure-aware Tokenization and (2) Structure-aware Mixing.
arXiv Detail & Related papers (2021-11-26T18:59:38Z)
- Dual-stream Multiple Instance Learning Network for Whole Slide Image Classification with Self-supervised Contrastive Learning [16.84711797934138]
We address the challenging problem of whole slide image (WSI) classification.
WSI classification can be cast as a multiple instance learning (MIL) problem when only slide-level labels are available.
We propose a MIL-based method for WSI classification and tumor detection that does not require localized annotations.
arXiv Detail & Related papers (2020-11-17T20:51:15Z)
- Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning [82.41415008107502]
Weakly-supervised action localization requires training a model to localize the action segments in a video given only video-level action labels.
It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains multiple instances (action segments).
We show that our EM-MIL approach more accurately models both the learning objective and the MIL assumptions.
arXiv Detail & Related papers (2020-03-31T23:36:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.