Confident Clustering via PCA Compression Ratio and Its Application to Single-cell RNA-seq Analysis
- URL: http://arxiv.org/abs/2205.09849v1
- Date: Thu, 19 May 2022 20:46:49 GMT
- Authors: Yingcong Li, Chandra Sekhar Mukherjee and Jiapeng Zhang
- Abstract summary: We develop a confident clustering method to diminish the influence of boundary datapoints.
We validate our algorithm on single-cell RNA-seq data.
Unlike traditional clustering methods in single-cell analysis, the confident clustering shows high stability under different choices of parameters.
- Score: 4.511561231517167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised clustering algorithms for vectors have been widely used in
machine learning. Many applications, including the biological data we study in
this paper, contain boundary datapoints that combine properties of two
underlying clusters and can degrade the performance of traditional clustering
algorithms. We develop a confident clustering method that aims to diminish the
influence of these datapoints and improve the clustering results. Concretely,
given a list of datapoints, we produce two clustering results. The first round
attempts to classify only pure vectors with high confidence. Building on it, the
second round classifies more vectors with lower confidence. We validate our
algorithm on single-cell RNA-seq data, a powerful and widely used tool in
biology. Our confident clustering achieves high accuracy on the datasets we
tested. In addition, unlike traditional clustering methods in single-cell
analysis, confident clustering is highly stable under different choices of
parameters.
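The two-round idea in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' method: the title's PCA compression ratio is not defined in this summary, so the sketch swaps in a different, hypothetical confidence score, namely the relative margin between a point's distances to its nearest and second-nearest k-means centroid. The names `confident_two_round` and `min_margin` (default 0.5) are illustrative assumptions. Round 1 labels only points whose margin clears the threshold and marks the rest `None`; round 2 refits each centroid on the confident points alone and then assigns every point.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def nearest_two(p: Point, centroids: List[Point]) -> Tuple[int, float, float]:
    """Index of the nearest centroid, plus distances to the nearest and second nearest."""
    ds = sorted((dist(p, c), j) for j, c in enumerate(centroids))
    return ds[0][1], ds[0][0], ds[1][0]


def kmeans(points: List[Point], k: int, iters: int = 50) -> List[Point]:
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # next seed: the point farthest from every centroid chosen so far
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest_two(p, centroids)[0]].append(p)
        # an empty cluster keeps its previous centroid
        new = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def confident_two_round(points: List[Point], k: int, min_margin: float = 0.5):
    """Hypothetical two-round clustering; the margin score is a stand-in confidence measure."""
    if k < 2:
        raise ValueError("need k >= 2 to measure a margin between clusters")
    centroids = kmeans(points, k)
    round1: List[Optional[int]] = []
    for p in points:
        j, d1, d2 = nearest_two(p, centroids)
        # relative margin in [0, 1]: near 0 on a boundary, near 1 deep inside a cluster
        margin = (d2 - d1) / d2 if d2 > 0 else 0.0
        round1.append(j if margin >= min_margin else None)
    # refit each centroid on its confident ("pure") members only
    refined = []
    for j in range(k):
        pure = [p for p, lab in zip(points, round1) if lab == j]
        refined.append(mean(pure) if pure else centroids[j])
    # round 2: every point, including boundary ones, goes to its nearest refined centroid
    round2 = [nearest_two(p, refined)[0] for p in points]
    return round1, round2


blob_a = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4), (0.2, -0.5)]
blob_b = [(10.0, 0.0), (10.4, 0.3), (9.7, -0.2), (10.1, 0.5)]
points = blob_a + blob_b + [(5.0, 0.1)]  # last point sits between the two groups
r1, r2 = confident_two_round(points, k=2)
print(r1)  # → [0, 0, 0, 0, 1, 1, 1, 1, None]
print(r2)  # → [0, 0, 0, 0, 1, 1, 1, 1, 0]
```

Refitting the centroids on confident points only is the design choice that matters in this sketch: the boundary point is excluded from round 1, so it does not pull its cluster's centroid toward the other group before the final assignment is made.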
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
One-step K-means dimensionality-reduction clustering methods have made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- Single-cell Multi-view Clustering via Community Detection with Unknown Number of Clusters [64.31109141089598]
We introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data.
scUNC seamlessly integrates information from different views without the need for a predefined number of clusters.
We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets.
arXiv Detail & Related papers (2023-11-28T08:34:58Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To guide the feedback actions, a clustering-oriented reward function is proposed that enhances cohesion within each cluster and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Deep Clustering of Tabular Data by Weighted Gaussian Distribution Learning [0.0]
This paper develops one of the first deep clustering methods for tabular data: Gaussian Cluster Embedding in Autoencoder Latent Space (G-CEALS).
G-CEALS achieves average ranks of 2.9 (1.7) and 2.8 (1.7) in clustering accuracy and adjusted Rand index (ARI), respectively, across sixteen datasets, and outperforms nine state-of-the-art clustering methods.
arXiv Detail & Related papers (2023-01-02T18:45:53Z)
- DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks [53.88811980967342]
This paper presents a Deep Clustering via Ensembles (DeepCluE) approach.
It bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks.
Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.
arXiv Detail & Related papers (2022-06-01T09:51:38Z)
- Self-Evolutionary Clustering [1.662966122370634]
Most existing deep clustering methods rely on simple distance comparisons and are highly dependent on a target distribution generated by a handcrafted nonlinear mapping.
A novel modular Self-Evolutionary Clustering (Self-EvoC) framework is constructed, which boosts clustering performance through classification in a self-supervised manner.
The framework can efficiently identify sample outliers and generate a better target distribution with the help of self-supervision.
arXiv Detail & Related papers (2022-02-21T19:38:18Z)
- Very Compact Clusters with Structural Regularization via Similarity and Connectivity [3.779514860341336]
We propose an end-to-end deep clustering algorithm, Very Compact Clusters (VCC), for general datasets.
Our approach achieves better clustering performance than most state-of-the-art clustering methods.
arXiv Detail & Related papers (2021-06-09T23:22:03Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
- Supervised Enhanced Soft Subspace Clustering (SESSC) for TSK Fuzzy Classifiers [25.32478253796209]
Fuzzy c-means based clustering algorithms are frequently used for Takagi-Sugeno-Kang (TSK) fuzzy classifier parameter estimation.
This paper proposes a supervised enhanced soft subspace clustering (SESSC) algorithm that simultaneously considers within-cluster compactness, between-cluster separation, and label information during clustering.
arXiv Detail & Related papers (2020-02-27T19:39:19Z)
- Tree-SNE: Hierarchical Clustering and Visualization Using t-SNE [0.0]
Tree-SNE is a hierarchical clustering and visualization algorithm based on stacked one-dimensional t-SNE embeddings.
Alpha-clustering recommends the optimal cluster assignment without foreknowledge of the number of clusters.
We demonstrate the effectiveness of tree-SNE and alpha-clustering on images of handwritten digits, mass-CyTOF data from blood cells, and single-cell RNA-sequencing (scRNA-seq) data from retinal cells.
arXiv Detail & Related papers (2020-02-13T18:11:00Z)