Related papers: PTCMIL: Multiple Instance Learning via Prompt Token Clustering for Whole Slide Image Analysis

PTCMIL: Multiple Instance Learning via Prompt Token Clustering for Whole Slide Image Analysis

URL: http://arxiv.org/abs/2507.18848v1
Date: Thu, 24 Jul 2025 23:33:59 GMT
Title: PTCMIL: Multiple Instance Learning via Prompt Token Clustering for Whole Slide Image Analysis
Authors: Beidi Zhao, SangMook Kim, Hao Chen, Chen Zhou, Zu-hua Gao, Gang Wang, Xiaoxiao Li,
Abstract summary: Multiple Instance Learning (MIL) has advanced WSI analysis but struggles with the complexity and heterogeneity of WSIs.<n>We propose PTCMIL, a novel Prompt Token Clustering-based ViT for MIL aggregation.<n> PTCMIL unifies clustering and prediction tasks in an end-to-end manner.
Score: 22.06174028063076
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Multiple Instance Learning (MIL) has advanced WSI analysis but struggles with the complexity and heterogeneity of WSIs. Existing MIL methods face challenges in aggregating diverse patch information into robust WSI representations. While ViTs and clustering-based approaches show promise, they are computationally intensive and fail to capture task-specific and slide-specific variability. To address these limitations, we propose PTCMIL, a novel Prompt Token Clustering-based ViT for MIL aggregation. By introducing learnable prompt tokens into the ViT backbone, PTCMIL unifies clustering and prediction tasks in an end-to-end manner. It dynamically aligns clustering with downstream tasks, using projection-based clustering tailored to each WSI, reducing complexity while preserving patch heterogeneity. Through token merging and prototype-based pooling, PTCMIL efficiently captures task-relevant patterns. Extensive experiments on eight datasets demonstrate its superior performance in classification and survival analysis tasks, outperforming state-of-the-art methods. Systematic ablation studies confirm its robustness and strong interpretability. The code is released at https://github.com/ubc-tea/PTCMIL.

Related papers

Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective [52.662463893268225]
Self-supervised heterogeneous graph learning (SHGL) has shown promising potential in diverse scenarios.<n>Existing SHGL methods encounter two significant limitations.<n>We introduce a novel framework enhanced by rank and dual consistency constraints.
arXiv Detail & Related papers (2024-12-01T09:33:20Z)
MSCPT: Few-shot Whole Slide Image Classification with Multi-scale and Context-focused Prompt Tuning [11.717352903130411]
Multiple instance learning has become a standard paradigm for the weakly supervised classification of whole slide images.<n>The lack of training data and the presence of rare diseases pose significant challenges for these methods.<n>We propose a Multi-Scale and Context-focused Prompt Tuning (MSCPT) method for the Few-shot Weakly Supervised WSI Classification task.
arXiv Detail & Related papers (2024-08-21T10:25:51Z)
cDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process [23.266122629592807]
Multiple instance learning (MIL) has been extensively applied to whole slide histoparametric image (WSI) analysis. The existing aggregation strategy in MIL, which primarily relies on the first-order distance between instances, fails to accurately approximate the true feature distribution of each instance. We propose a new Bayesian nonparametric framework for multiple instance learning, which adopts a cascade of Dirichlet processes (cDP) to incorporate the instance-to-bag characteristic of the WSIs.
arXiv Detail & Related papers (2024-07-16T07:28:39Z)
Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens [57.37893387775829]
We introduce a fast and balanced clustering method, named Semantic Equitable Clustering (SEC)<n>SEC clusters tokens based on their global semantic relevance in an efficient, straightforward manner.<n>We propose a versatile vision backbone, SECViT, to serve as a vision language connector.
arXiv Detail & Related papers (2024-05-22T04:49:00Z)
Multi-task learning via robust regularized clustering with non-convex group penalties [0.0]
Multi-task learning (MTL) aims to improve estimation performance by sharing common information among related tasks. Existing MTL methods based on this assumption often ignore outlier tasks. We propose a novel MTL method called MultiTask Regularized Clustering (MTLRRC)
arXiv Detail & Related papers (2024-04-04T07:09:43Z)
RetMIL: Retentive Multiple Instance Learning for Histopathological Whole Slide Image Classification [10.365234803533982]
We propose a retentive MIL method called RetMIL, which processes WSI sequences through hierarchical feature propagation structure. At the local level, the WSI sequence is divided into multiple subsequences. Tokens of each subsequence are updated through a parallel linear retention mechanism. At the global level, subsequences are fused into a global sequence, then updated through a serial retention mechanism, and finally the slide-level representation is obtained through a global attention pooling.
arXiv Detail & Related papers (2024-03-16T08:50:47Z)
S^2MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering [38.35594663863098]
Experimental results on six large-scale multi-view datasets demonstrate that S2MVTC significantly outperforms state-of-the-art algorithms in terms of clustering performance and CPU execution time.
arXiv Detail & Related papers (2024-03-14T05:00:29Z)
MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis. We represent each WSI as an undirected graph. To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
arXiv Detail & Related papers (2024-03-08T09:02:13Z)
Scalable Incomplete Multi-View Clustering with Structure Alignment [71.62781659121092]
In this paper, we propose a novel incomplete anchor graph learning framework. We construct the view-specific anchor graph to capture the complementary information from different views. The time and space complexity of the proposed SIMVC-SA is proven to be linearly correlated with the number of samples.
arXiv Detail & Related papers (2023-08-31T08:30:26Z)
HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot Action Recognition [51.2715005161475]
We propose a novel Hybrid Relation guided temporal Set Matching approach for few-shot action recognition. The core idea of HyRSM++ is to integrate all videos within the task to learn discriminative representations. We show that our method achieves state-of-the-art performance under various few-shot settings.
arXiv Detail & Related papers (2023-01-09T13:32:50Z)
You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation. We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one. By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.