An Information-theoretic Perspective of Hierarchical Clustering
- URL: http://arxiv.org/abs/2108.06036v1
- Date: Fri, 13 Aug 2021 03:03:56 GMT
- Title: An Information-theoretic Perspective of Hierarchical Clustering
- Authors: Yicheng Pan, Feng Zheng, Bingchen Fan
- Abstract summary: A cost function for hierarchical clustering was introduced by Dasgupta citedasgupta2016cost.
In this paper, we investigate hierarchical clustering from the emphinformation-theoretic perspective and formulate a new objective function.
- Score: 30.896561720088954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A combinatorial cost function for hierarchical clustering was introduced by
Dasgupta \cite{dasgupta2016cost}. It has been generalized by Cohen-Addad et al.
\cite{cohen2019hierarchical} to a general form named admissible function. In
this paper, we investigate hierarchical clustering from the
\emph{information-theoretic} perspective and formulate a new objective
function. We also establish the relationship between these two perspectives. In
algorithmic aspect, we get rid of the traditional top-down and bottom-up
frameworks, and propose a new one to stratify the \emph{sparsest} level of a
cluster tree recursively in guide with our objective function. For practical
use, our resulting cluster tree is not binary. Our algorithm called HCSE
outputs a $k$-level cluster tree by a novel and interpretable mechanism to
choose $k$ automatically without any hyper-parameter. Our experimental results
on synthetic datasets show that HCSE has a great advantage in finding the
intrinsic number of hierarchies, and the results on real datasets show that
HCSE also achieves competitive costs over the popular algorithms LOUVAIN and
HLP.
Related papers
- Hierarchical Clustering via Local Search [0.0]
We introduce a local search algorithm for hierarchical clustering.
We show that any locally optimal tree guarantees a revenue of at least $fracn-23sum_i jw(i,j)$ where is $n$ the number of objects and $w: [n] times [n] mathbbR+$ is the associated similarity function.
arXiv Detail & Related papers (2024-05-24T23:46:24Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs [7.616556723260849]
This paper presents two efficient hierarchical clustering (HC) algorithms with respect to Dasgupta's cost function.
For any input graph $G$ with a clear cluster-structure, our designed algorithms run in nearly-linear time in the input size of $G$.
We show that our designed algorithm produces comparable or better HC trees with much lower running time.
arXiv Detail & Related papers (2023-06-16T16:31:46Z) - Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z) - Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs [3.2901541059183432]
We present two-time approximation algorithms for hierarchical clustering.
The significance of our work is demonstrated by the empirical analysis on both synthetic and real-world data sets.
arXiv Detail & Related papers (2021-12-16T17:52:04Z) - Learning Hierarchical Graph Neural Networks for Image Clustering [81.5841862489509]
We propose a hierarchical graph neural network (GNN) model that learns how to cluster a set of images into an unknown number of identities.
Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.
arXiv Detail & Related papers (2021-07-03T01:28:42Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel emphtrellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Data Structures & Algorithms for Exact Inference in Hierarchical
Clustering [41.24805506595378]
We present novel dynamic-programming algorithms for emphexact inference in hierarchical clustering based on a novel trellis data structure.
Our algorithms scale in time and space proportional to the powerset of $N$ elements which is super-exponentially more efficient than explicitly considering each of the (2N-3)!! possible hierarchies.
arXiv Detail & Related papers (2020-02-26T17:43:53Z) - Scalable Hierarchical Clustering with Tree Grafting [66.68869706310208]
Grinch is a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions.
Grinch is motivated by a new notion of separability for clustering with linkage functions.
arXiv Detail & Related papers (2019-12-31T20:56:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.