scGHSOM: Hierarchical clustering and visualization of single-cell and CRISPR data using growing hierarchical SOM
- URL: http://arxiv.org/abs/2407.16984v1
- Date: Wed, 24 Jul 2024 04:01:09 GMT
- Title: scGHSOM: Hierarchical clustering and visualization of single-cell and CRISPR data using growing hierarchical SOM
- Authors: Shang-Jung Wen, Jia-Ming Chang, Fang Yu,
- Abstract summary: We propose a comprehensive gene-cell dependency visualization via unsupervised clustering, Growing Hierarchical Self-Organizing Map (GHSOM)
GHSOM is applied to cluster samples in a hierarchical structure such that the self-growth structure of clusters satisfies the required variations between and within.
We present two innovative visualization tools: Cluster Feature Map and Cluster Distribution Map.
- Score: 0.8452349885923507
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: High-dimensional single-cell data poses significant challenges in identifying underlying biological patterns due to the complexity and heterogeneity of cellular states. We propose a comprehensive gene-cell dependency visualization via unsupervised clustering, Growing Hierarchical Self-Organizing Map (GHSOM), specifically designed for analyzing high-dimensional single-cell data like single-cell sequencing and CRISPR screens. GHSOM is applied to cluster samples in a hierarchical structure such that the self-growth structure of clusters satisfies the required variations between and within. We propose a novel Significant Attributes Identification Algorithm to identify features that distinguish clusters. This algorithm pinpoints attributes with minimal variation within a cluster but substantial variation between clusters. These key attributes can then be used for targeted data retrieval and downstream analysis. Furthermore, we present two innovative visualization tools: Cluster Feature Map and Cluster Distribution Map. The Cluster Feature Map highlights the distribution of specific features across the hierarchical structure of GHSOM clusters. This allows for rapid visual assessment of cluster uniqueness based on chosen features. The Cluster Distribution Map depicts leaf clusters as circles on the GHSOM grid, with circle size reflecting cluster data size and color customizable to visualize features like cell type or other attributes. We apply our analysis to three single-cell datasets and one CRISPR dataset (cell-gene database) and evaluate clustering methods with internal and external CH and ARI scores. GHSOM performs well, being the best performer in internal evaluation (CH=4.2). In external evaluation, GHSOM has the third-best performance of all methods.
Related papers
- Single-cell Multi-view Clustering via Community Detection with Unknown
Number of Clusters [64.31109141089598]
We introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data.
scUNC seamlessly integrates information from different views without the need for a predefined number of clusters.
We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets.
arXiv Detail & Related papers (2023-11-28T08:34:58Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection [19.066989850964756]
We introduce a discriminative clustering model trying to maximise a geometry-aware generalisation of the mutual information called GEMINI.
This algorithm avoids the burden of feature exploration and is easily scalable to high-dimensional data and large amounts of samples while only designing a discriminative clustering model.
Our results show that Sparse GEMINI is a competitive algorithm and has the ability to select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.
arXiv Detail & Related papers (2023-02-07T10:52:04Z) - Enhancing cluster analysis via topological manifold learning [0.3823356975862006]
We show that inferring the topological structure of a dataset before clustering can considerably enhance cluster detection.
We combine manifold learning method UMAP for inferring the topological structure with density-based clustering method DBSCAN.
arXiv Detail & Related papers (2022-07-01T15:53:39Z) - Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC)
In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss, are designed for node representation learning.
For the OOS nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z) - Attention-driven Graph Clustering Network [49.040136530379094]
We propose a novel deep clustering method named Attention-driven Graph Clustering Network (AGCN)
AGCN exploits a heterogeneous-wise fusion module to dynamically fuse the node attribute feature and the topological graph feature.
AGCN can jointly perform feature learning and cluster assignment in an unsupervised fashion.
arXiv Detail & Related papers (2021-08-12T02:30:38Z) - Swarm Intelligence for Self-Organized Clustering [6.85316573653194]
A swarm system called Databionic swarm (DBS) is introduced which is able to adapt itself to structures of high-dimensional data.
By exploiting the interrelations of swarm intelligence, self-organization and emergence, DBS serves as an alternative approach to the optimization of a global objective function in the task of clustering.
arXiv Detail & Related papers (2021-06-10T06:21:48Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Tree-SNE: Hierarchical Clustering and Visualization Using t-SNE [0.0]
Tree-SNE is a hierarchical clustering and visualization algorithm based on stacked one-dimensional t-SNE embeddings.
alpha-clustering recommends the optimal cluster assignment, without foreknowledge of the number of clusters.
We demonstrate the effectiveness of tree-SNE and alpha-clustering on images of handwritten digits, mass-CyTOF data from blood cells, and single-cell RNA-sequencing (scRNA-seq) data from retinal cells.
arXiv Detail & Related papers (2020-02-13T18:11:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.