Cluster-Level Sparse Multi-Instance Learning for Whole-Slide Images
- URL: http://arxiv.org/abs/2509.11034v1
- Date: Sun, 14 Sep 2025 01:50:51 GMT
- Title: Cluster-Level Sparse Multi-Instance Learning for Whole-Slide Images
- Authors: Yuedi Zhang, Zhixiang Xia, Guosheng Yin, Bin Liu,
- Abstract summary: Cluster-level Sparse MIL (csMIL) is a novel framework that integrates global-local instance clustering, within-cluster attention, and cluster-level sparsity induction.<n>Our csMIL first performs global clustering across all bags to establish $K$ cluster centers, followed by local clustering within each bag to assign cluster labels.<n>This approach enhances robustness to noisy instances, improves interpretability by identifying critical regions, and reduces computational complexity.
- Score: 9.658549716966176
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-Instance Learning (MIL) is pivotal for analyzing complex, weakly labeled datasets, such as whole-slide images (WSIs) in computational pathology, where bags comprise unordered collections of instances with sparse diagnostic relevance. Traditional MIL approaches, including early statistical methods and recent attention-based frameworks, struggle with instance redundancy and lack explicit mechanisms for discarding non-informative instances, limiting their robustness and interpretability. We propose Cluster-level Sparse MIL (csMIL), a novel framework that integrates global-local instance clustering, within-cluster attention, and cluster-level sparsity induction to address these challenges. Our csMIL first performs global clustering across all bags to establish $K$ cluster centers, followed by local clustering within each bag to assign cluster labels. Attention scores are computed within each cluster, and sparse regularization is applied to cluster weights, enabling the selective retention of diagnostically relevant clusters while discarding irrelevant ones. This approach enhances robustness to noisy instances, improves interpretability by identifying critical regions, and reduces computational complexity. Theoretical analysis demonstrates that csMIL requires $O(s log K)$ bags to recover $s$ relevant clusters, aligning with compressed sensing principles. Empirically, csMIL achieves state-of-the-art performance on two public histopathology benchmarks (CAMELYON16, TCGA-NSCLC).
Related papers
- Scalable Context-Preserving Model-Aware Deep Clustering for Hyperspectral Images [51.95768218975529]
Subspace clustering has become widely adopted for the unsupervised analysis of hyperspectral images (HSIs)<n>Recent model-aware deep subspace clustering methods often use a two-stage framework, involving the calculation of a self-representation matrix with complexity of O(n2), followed by spectral clustering.<n>We propose a scalable, context-preserving deep clustering method based on basis representation, which jointly captures local and non-local structures for efficient HSI clustering.
arXiv Detail & Related papers (2025-06-12T16:43:09Z) - Graph Probability Aggregation Clustering [5.377020739388736]
We propose a graph-based fuzzy clustering algorithm that unifies the global clustering objective function with a local clustering constraint.<n>The entire GPAC framework is formulated as a multi-constrained optimization problem, which can be solved using the Lagrangian method.<n>Experiments conducted on synthetic, real-world, and deep learning datasets demonstrate that GPAC not only exceeds existing state-of-the-art methods in clustering performance but also excels in computational efficiency.
arXiv Detail & Related papers (2025-02-27T09:11:32Z) - Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective [52.662463893268225]
Self-supervised heterogeneous graph learning (SHGL) has shown promising potential in diverse scenarios.<n>Existing SHGL methods encounter two significant limitations.<n>We introduce a novel framework enhanced by rank and dual consistency constraints.
arXiv Detail & Related papers (2024-12-01T09:33:20Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Stable Cluster Discrimination for Deep Clustering [7.175082696240088]
Deep clustering can optimize representations of instances (i.e., representation learning) and explore the inherent data distribution.
The coupled objective implies a trivial solution that all instances collapse to the uniform features.
In this work, we first show that the prevalent discrimination task in supervised learning is unstable for one-stage clustering.
A novel stable cluster discrimination (SeCu) task is proposed and a new hardness-aware clustering criterion can be obtained accordingly.
arXiv Detail & Related papers (2023-11-24T06:43:26Z) - Revisiting Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [85.51611950757643]
We propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.<n>IAC maintains an overall computational complexity of $ mathcalO(n, textpolylog(n) $, making it scalable and practical for large-scale problems.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - A Generalized Framework for Predictive Clustering and Optimization [18.06697544912383]
Clustering is a powerful and extensively used data science tool.
In this article, we define a generalized optimization framework for predictive clustering.
We also present a joint optimization strategy that exploits mixed-integer linear programming (MILP) for global optimization.
arXiv Detail & Related papers (2023-05-07T19:56:51Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Multiscale Clustering of Hyperspectral Images Through Spectral-Spatial
Diffusion Geometry [9.619814126465206]
Clustering algorithms partition a dataset into groups of similar points.
The primary contribution of this article is the Multiscale Spatially-Regularized Diffusion Learning (M-SRDL) clustering algorithm.
We show that incorporating spatial regularization into a multiscale clustering framework corresponds to smoother and more coherent clusters when applied to HSI data.
arXiv Detail & Related papers (2021-03-29T17:24:28Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Exact Recovery of Mangled Clusters with Same-Cluster Queries [20.03712152278538]
We study the cluster recovery problem in the semi-supervised active clustering framework.
We design an algorithm that, given $n$ points to be partitioned into $k$ clusters, uses $O(k3 ln k ln n)$ oracle queries and $tildeO(kn + k3)$ time to recover the clustering with zero misclassification error.
arXiv Detail & Related papers (2020-06-08T15:27:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.