Scalable Hierarchical Clustering with Tree Grafting
- URL: http://arxiv.org/abs/2001.00076v1
- Date: Tue, 31 Dec 2019 20:56:15 GMT
- Title: Scalable Hierarchical Clustering with Tree Grafting
- Authors: Nicholas Monath, Ari Kobren, Akshay Krishnamurthy, Michael Glass,
Andrew McCallum
- Abstract summary: Grinch is a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions.
Grinch is motivated by a new notion of separability for clustering with linkage functions.
- Score: 66.68869706310208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical
clustering with general linkage functions that compute arbitrary similarity
between two point sets. The key components of Grinch are its rotate and graft
subroutines that efficiently reconfigure the hierarchy as new points arrive,
supporting discovery of clusters with complex structure. Grinch is motivated by
a new notion of separability for clustering with linkage functions: we prove
that when the model is consistent with a ground-truth clustering, Grinch is
guaranteed to produce a cluster tree containing the ground-truth, independent
of data arrival order. Our empirical results on benchmark and author
coreference datasets (with standard and learned linkage functions) show that
Grinch is more accurate than other scalable methods, and orders of magnitude
faster than hierarchical agglomerative clustering.
Related papers
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z) - Community detection in complex networks via node similarity, graph
representation learning, and hierarchical clustering [4.264842058017711]
Community detection is a critical challenge in analysing real graphs.
This article proposes three new, general, hierarchical frameworks to deal with this task.
We compare over a hundred module combinations on the Block Model graphs and real-life datasets.
arXiv Detail & Related papers (2023-03-21T22:12:53Z) - Genie: A new, fast, and outlier-resistant hierarchical clustering
algorithm [3.7491936479803054]
We propose a new hierarchical clustering linkage criterion called Genie.
Our algorithm links two clusters in such a way that a chosen economic inequity measure does not drastically increase above a given threshold.
A reference implementation of the algorithm has been included in the open source 'genie' package for R.
arXiv Detail & Related papers (2022-09-13T06:42:53Z) - Graph Representation Learning via Contrasting Cluster Assignments [57.87743170674533]
We propose a novel unsupervised graph representation model by contrasting cluster assignments, called as GRCCA.
It is motivated to make good use of local and global information synthetically through combining clustering algorithms and contrastive learning.
GRCCA has strong competitiveness in most tasks.
arXiv Detail & Related papers (2021-12-15T07:28:58Z) - Hierarchical clustering by aggregating representatives in
sub-minimum-spanning-trees [5.877624540482919]
We propose a novel hierarchical clustering algorithm, in which, while building the clustering dendrogram, we can effectively detect the representative point.
Under our analysis, the proposed algorithm has O(nlogn) time-complexity and O(logn) space-complexity, indicating that it has the scalability in handling massive data.
arXiv Detail & Related papers (2021-11-11T07:36:55Z) - An Information-theoretic Perspective of Hierarchical Clustering [30.896561720088954]
A cost function for hierarchical clustering was introduced by Dasgupta citedasgupta2016cost.
In this paper, we investigate hierarchical clustering from the emphinformation-theoretic perspective and formulate a new objective function.
arXiv Detail & Related papers (2021-08-13T03:03:56Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Graph Neural Networks with Composite Kernels [60.81504431653264]
We re-interpret node aggregation from the perspective of kernel weighting.
We present a framework to consider feature similarity in an aggregation scheme.
We propose feature aggregation as the composition of the original neighbor-based kernel and a learnable kernel to encode feature similarities in a feature space.
arXiv Detail & Related papers (2020-05-16T04:44:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.