Hierarchical clustering by aggregating representatives in
sub-minimum-spanning-trees
- URL: http://arxiv.org/abs/2111.06968v1
- Date: Thu, 11 Nov 2021 07:36:55 GMT
- Title: Hierarchical clustering by aggregating representatives in
sub-minimum-spanning-trees
- Authors: Wen-Bo Xie, Zhen Liu, Jaideep Srivastava
- Abstract summary: We propose a novel hierarchical clustering algorithm that effectively detects representative points while building the clustering dendrogram.
Our analysis shows the algorithm has O(n log n) time complexity and O(log n) space complexity, so it scales to massive data.
- Score: 5.877624540482919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the main challenges for hierarchical clustering is how to
appropriately identify the representative points in the lower level of the
cluster tree, which are going to be utilized as the roots in the higher level
of the cluster tree for further aggregation. However, conventional hierarchical
clustering approaches rely on simple heuristics to select the "representative"
points, which may not be sufficiently representative. As a result, the
constructed cluster tree suffers from poor robustness and weak reliability. To
address this issue, we propose a novel hierarchical clustering algorithm in
which, while building the clustering dendrogram, we effectively detect
representative points by scoring the reciprocal nearest data points in each
sub-minimum-spanning-tree. Extensive experiments on UCI datasets show that the
proposed algorithm is more accurate than other benchmarks. Meanwhile, our
analysis shows that the algorithm has O(n log n) time complexity and O(log n)
space complexity, so it scales to massive data with low time and storage
consumption.
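The abstract's core selection step, detecting reciprocal (mutual) nearest data points as representative candidates, can be sketched in a few lines. This is an illustrative toy under our own assumptions, not the authors' implementation: it uses a brute-force neighbor search on raw points and omits both the sub-minimum-spanning-tree partitioning and the scoring step; the function name is hypothetical.

```python
# Toy sketch (not the paper's code): find reciprocal nearest-neighbor
# pairs, i.e. points that are each other's nearest neighbor. The paper
# scores such pairs inside each sub-minimum-spanning-tree to pick
# representatives; here we just detect the pairs by brute force.
from math import dist

def reciprocal_nearest_pairs(points):
    """Return index pairs (i, j), i < j, where points i and j are each
    other's nearest neighbor under Euclidean distance."""
    n = len(points)
    nearest = []
    for i in range(n):
        # Nearest neighbor of point i (excluding itself).
        j = min((k for k in range(n) if k != i),
                key=lambda k: dist(points[i], points[k]))
        nearest.append(j)
    # Keep only mutual pairs, each reported once.
    return [(i, j) for i, j in enumerate(nearest)
            if j > i and nearest[j] == i]

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(reciprocal_nearest_pairs(pts))  # [(0, 1), (2, 3)]
```

Restricting the search to points within one sub-MST, as the paper does, keeps this step local and cheap; the brute-force version above is O(n^2) and for exposition only.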
Related papers
- Interpretable Sequence Clustering [3.280979689839737]
We propose a method called Interpretable Sequence Clustering Tree (ISCT).
ISCT generates k leaf nodes, corresponding to k clusters, which provides an intuitive explanation on how each cluster is formed.
Experimental results on 14 real-world data sets demonstrate that our proposed method provides an interpretable tree structure.
arXiv Detail & Related papers (2023-09-03T11:31:44Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified in a single framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
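The merge rule this entry recommends, joining the pair of clusters with the largest average pairwise dot product, can be illustrated with a short sketch. This is our own naive O(k^2) pairwise search for exposition, not the paper's code; the function names are hypothetical.

```python
# Illustrative sketch (not the paper's code): one agglomerative merge
# step that joins the two clusters with the maximum average dot product
# between their points, rather than the minimum distance.
def average_dot(a, b):
    """Mean dot product over all cross-cluster point pairs."""
    return sum(sum(x * y for x, y in zip(p, q))
               for p in a for q in b) / (len(a) * len(b))

def merge_once(clusters):
    """Merge the pair of clusters with the largest average dot product."""
    pairs = [(i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))]
    i, j = max(pairs, key=lambda ij: average_dot(clusters[ij[0]],
                                                 clusters[ij[1]]))
    merged = clusters[i] + clusters[j]
    return [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

# Two nearly parallel unit-ish vectors merge before the orthogonal one.
cs = [[(1.0, 0.0)], [(0.9, 0.1)], [(0.0, 1.0)]]
print(merge_once(cs))  # [[(0.0, 1.0)], [(1.0, 0.0), (0.9, 0.1)]]
```

Repeating `merge_once` until one cluster remains yields the dendrogram; the dot-product criterion favors merging clusters whose points share a direction from the origin, which is what lets the tree recover the generative hierarchy in the authors' model.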
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Contrastive Hierarchical Clustering [8.068701201341065]
CoHiClust is a Contrastive Hierarchical Clustering model based on deep neural networks.
By employing a self-supervised learning approach, CoHiClust distills the base network into a binary tree without access to any labeled data.
arXiv Detail & Related papers (2023-03-03T07:54:19Z)
- DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks [53.88811980967342]
This paper presents a Deep Clustering via Ensembles (DeepCluE) approach.
It bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks.
Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.
arXiv Detail & Related papers (2022-06-01T09:51:38Z)
- Natural Hierarchical Cluster Analysis by Nearest Neighbors with Near-Linear Time Complexity [0.0]
We propose a nearest neighbor based clustering algorithm that results in a naturally defined hierarchy of clusters.
In contrast to agglomerative and divisive hierarchical clustering algorithms, our approach does not depend on iterative refinement.
arXiv Detail & Related papers (2022-03-15T16:03:42Z)
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.