Fast and Interpretable Consensus Clustering via Minipatch Learning
- URL: http://arxiv.org/abs/2110.02388v1
- Date: Tue, 5 Oct 2021 22:39:28 GMT
- Title: Fast and Interpretable Consensus Clustering via Minipatch Learning
- Authors: Luqin Gan and Genevera I. Allen
- Abstract summary: We develop IMPACC: Interpretable MiniPatch Adaptive Consensus Clustering.
We develop adaptive sampling schemes for observations, which result in both improved reliability and computational savings.
Results show that our approach yields more accurate and interpretable cluster solutions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consensus clustering has been widely used in bioinformatics and other
applications to improve the accuracy, stability and reliability of clustering
results. This approach ensembles cluster co-occurrences from multiple
clustering runs on subsampled observations. For application to large-scale
bioinformatics data, such as to discover cell types from single-cell sequencing
data, for example, consensus clustering has two significant drawbacks: (i)
computational inefficiency due to repeatedly applying clustering algorithms,
and (ii) lack of interpretability into the important features for
differentiating clusters. In this paper, we address these two challenges by
developing IMPACC: Interpretable MiniPatch Adaptive Consensus Clustering. Our
approach adopts three major innovations. We ensemble cluster co-occurrences
from tiny subsets of both observations and features, termed minipatches, thus
dramatically reducing computation time. Additionally, we develop adaptive
sampling schemes for observations, which result in both improved reliability
and computational savings, as well as adaptive sampling schemes of features,
which leads to interpretable solutions by quickly learning the most relevant
features that differentiate clusters. We study our approach on synthetic data
and a variety of real large-scale bioinformatics data sets; results show that
our approach not only yields more accurate and interpretable cluster solutions,
but it also substantially improves computational efficiency compared to
standard consensus clustering approaches.
Related papers
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Clustering (GCC) method to incorporate feature learning and augmentation into clustering procedure.
First, we develop a discrimirative feature alignment mechanism to discover intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning to generate more reliable cluster assignment.
arXiv Detail & Related papers (2024-04-14T01:51:11Z) - Deep Embedding Clustering Driven by Sample Stability [16.53706617383543]
We propose a deep embedding clustering algorithm driven by sample stability (DECS)
Specifically, we start by constructing the initial feature space with an autoencoder and then learn the cluster-oriented embedding feature constrained by sample stability.
The experimental results on five datasets illustrate that the proposed method achieves superior performance compared to state-of-the-art clustering approaches.
arXiv Detail & Related papers (2024-01-29T09:19:49Z) - Stable Cluster Discrimination for Deep Clustering [7.175082696240088]
Deep clustering can optimize representations of instances (i.e., representation learning) and explore the inherent data distribution.
The coupled objective implies a trivial solution that all instances collapse to the uniform features.
In this work, we first show that the prevalent discrimination task in supervised learning is unstable for one-stage clustering.
A novel stable cluster discrimination (SeCu) task is proposed and a new hardness-aware clustering criterion can be obtained accordingly.
arXiv Detail & Related papers (2023-11-24T06:43:26Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering
Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
Method for unsupervised meta-learning, CACTUs, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - Clustering Optimisation Method for Highly Connected Biological Data [0.0]
We show how a simple metric for connectivity clustering evaluation leads to an optimised segmentation of biological data.
The novelty of the work resides in the creation of a simple optimisation method for clustering crowded data.
arXiv Detail & Related papers (2022-08-08T17:33:32Z) - Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that provide utility when the data is "easy"
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z) - Transductive Few-Shot Learning: Clustering is All You Need? [31.21306826132773]
We investigate a general formulation for transive few-shot learning, which integrates prototype-based objectives.
We find that our method yields competitive performances, in term of accuracy and optimization, while scaling up to large problems.
Surprisingly, we find that our general model already achieve competitive performances in comparison to the state-of-the-art learning.
arXiv Detail & Related papers (2021-06-16T16:14:01Z) - Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework, which is then applied to the clustering task and we come up with the Graph Constrastive Clustering(GCC) method.
Specifically, on the one hand, the graph Laplacian based contrastive loss is proposed to learn more discriminative and clustering-friendly features.
On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
arXiv Detail & Related papers (2021-04-03T15:32:49Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - reval: a Python package to determine best clustering solutions with
stability-based relative clustering validation [1.8129328638036126]
reval is a Python package that leverages stability-based relative clustering validation methods to determine best clustering solutions.
This work aims at developing a stability-based method that selects the best clustering solution as the one that replicates, via supervised learning, on unseen subsets of data.
arXiv Detail & Related papers (2020-08-27T10:36:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.