An objective function for order preserving hierarchical clustering
- URL: http://arxiv.org/abs/2109.04266v1
- Date: Thu, 9 Sep 2021 13:35:01 GMT
- Title: An objective function for order preserving hierarchical clustering
- Authors: Daniel Bakkelund
- Abstract summary: The model is an extension of the Dasgupta cost function for divisive hierarchical clustering.
Finding an optimal solution is NP-hard, so we provide a polynomial-time approximation algorithm with a relative performance guarantee of $O(\log^{3/2} n)$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an objective function for similarity based hierarchical clustering
of partially ordered data that preserves the partial order in the sense that if
$x \leq y$, and if $[x]$ and $[y]$ are the respective clusters of $x$ and $y$,
then there is an order relation $\leq'$ on the clusters for which $[x] \leq'
[y]$. The model distinguishes itself from existing methods and models for
clustering of ordered data in that the order relation and the similarity are
combined to obtain an optimal hierarchical clustering seeking to satisfy both,
and that the order relation is equipped with a pairwise level of comparability
in the range $[0,1]$. In particular, if the similarity and the order relation
are not aligned, then order preservation may have to yield in favor of
clustering. Finding an optimal solution is NP-hard, so we provide a polynomial
time approximation algorithm, with a relative performance guarantee of
$O(\log^{3/2}n)$, based on successive applications of directed sparsest cut.
The model is an extension of the Dasgupta cost function for divisive
hierarchical clustering.
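To make the order-preservation requirement concrete: a flat clustering admits a partial order $\leq'$ on its clusters exactly when the quotient graph induced by the order pairs is acyclic. The following is a minimal illustrative sketch (not the paper's code; all names are my own assumptions):

```python
from collections import defaultdict

def admits_cluster_order(pairs, cluster_of):
    """Check whether a clustering of a partially ordered set admits an
    order relation <=' on its clusters.

    pairs      -- iterable of (x, y) with x <= y in the original order
    cluster_of -- dict mapping each element to a (hashable) cluster id

    The induced relation [x] <=' [y] extends to a partial order exactly
    when the quotient graph (clusters as nodes, order pairs as arcs,
    self-loops dropped) has no directed cycle.
    """
    graph = defaultdict(set)
    for x, y in pairs:
        a, b = cluster_of[x], cluster_of[y]
        if a != b:
            graph[a].add(b)

    WHITE, GREY, BLACK = 0, 1, 2          # DFS colours
    colour = defaultdict(int)

    def cyclic(u):
        colour[u] = GREY
        for v in graph[u]:
            if colour[v] == GREY or (colour[v] == WHITE and cyclic(v)):
                return True
        colour[u] = BLACK
        return False

    return not any(colour[u] == WHITE and cyclic(u) for u in list(graph))
```

For example, merging $x$ and $z$ when $x \leq y \leq z$ and $y$ sits in another cluster creates the cycle $[x] \to [y] \to [x]$, so no order $\leq'$ exists for that clustering.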
Related papers
- Hierarchical Clustering via Local Search [0.0]
We introduce a local search algorithm for hierarchical clustering.
We show that any locally optimal tree guarantees a revenue of at least $\frac{n-2}{3}\sum_{i<j} w(i,j)$, where $n$ is the number of objects and $w: [n] \times [n] \to \mathbb{R}_+$ is the associated similarity function.
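The revenue here is the Moseley-Wang objective, the dual of Dasgupta's cost. A minimal sketch evaluating it (the nested-tuple tree encoding is my own assumption, not the paper's):

```python
def revenue(tree, w, n):
    """Moseley-Wang revenue of a hierarchical clustering.

    tree -- binary tree as nested pairs, e.g. ((0, 1), (2, 3)); leaves are ints
    w    -- similarity function w(i, j)
    n    -- total number of objects

    revenue(T) = sum over pairs i<j of w(i,j) * (n - |leaves below lca(i,j)|).
    """
    def walk(node):
        if isinstance(node, int):          # a leaf
            return [node], 0.0
        left, right = node
        l_leaves, l_rev = walk(left)
        r_leaves, r_rev = walk(right)
        leaves = l_leaves + r_leaves
        # pairs split here have their least common ancestor at this node
        here = sum(w(i, j) * (n - len(leaves))
                   for i in l_leaves for j in r_leaves)
        return leaves, l_rev + r_rev + here
    return walk(tree)[1]
```

The quoted result then says any locally optimal tree achieves revenue at least $\frac{n-2}{3}\sum_{i<j} w(i,j)$ under this objective.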
arXiv Detail & Related papers (2024-05-24T23:46:24Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well-established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
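A minimal sketch of the recommended variant (naive $O(n^3)$, illustrative only, not the authors' implementation): merge, at each step, the pair of clusters with the largest average pairwise dot product.

```python
import numpy as np

def dot_product_hac(X):
    """Agglomerative clustering merging by maximum *average dot product*.

    X -- (n, d) array of data points. Returns the merge history as
    (cluster_a, cluster_b) tuples of leaf-index lists, earliest merge first.
    """
    clusters = [[i] for i in range(len(X))]
    history = []
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average dot product over all cross-cluster point pairs
                score = np.mean(X[clusters[a]] @ X[clusters[b]].T)
                if score > best:
                    best, pair = score, (a, b)
        a, b = pair
        history.append((clusters[a], clusters[b]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return history
```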
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- An Information-theoretic Perspective of Hierarchical Clustering [30.896561720088954]
A cost function for hierarchical clustering was introduced by Dasgupta (2016).
In this paper, we investigate hierarchical clustering from the information-theoretic perspective and formulate a new objective function.
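For reference, Dasgupta's cost of a hierarchy $T$ over similarities $w$ is the standard

$$\mathrm{cost}(T) = \sum_{i<j} w(i,j)\,\bigl|\mathrm{leaves}(T[i \vee j])\bigr|,$$

where $T[i \vee j]$ is the subtree rooted at the least common ancestor of $i$ and $j$; minimizing it forces similar pairs to be merged low in the tree. The information-theoretic objective proposed in this paper is a different formulation and is not reproduced here.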
arXiv Detail & Related papers (2021-08-13T03:03:56Z)
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
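For readers unfamiliar with A*, a generic best-first skeleton is sketched below (the standard algorithm, not the paper's trellis-backed version; the trellis is what makes the clustering state space tractable):

```python
import heapq, itertools

def a_star(start, is_goal, successors, heuristic):
    """Generic A* search. States must be hashable; successors(state)
    yields (next_state, step_cost); heuristic must never overestimate the
    remaining cost, or optimality of the returned goal is lost."""
    tie = itertools.count()               # tie-breaker for the heap
    frontier = [(heuristic(start), 0.0, next(tie), start)]
    best_g = {start: 0.0}
    while frontier:
        f, g, _, state = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue                      # stale queue entry
        if is_goal(state):
            return state, g
        for nxt, cost in successors(state):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier,
                               (ng + heuristic(nxt), ng, next(tie), nxt))
    return None, float("inf")
```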
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and an adaptive-neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
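To illustrate the self-expressiveness idea (a ridge-regularized variant of my own choosing, not the paper's exact model): each sample is reconstructed from the others, and the coefficient matrix doubles as a global affinity graph.

```python
import numpy as np

def self_expressive_affinity(X, lam=1e-2):
    """Ridge-regularized self-expressiveness, solved column by column:
    minimize ||x_i - X_{-i} z||^2 + lam * ||z||^2 for each sample i.

    X -- (d, n) data matrix, one sample per column. Returns a symmetric
    (n, n) affinity matrix usable by e.g. spectral clustering.
    """
    d, n = X.shape
    Z = np.zeros((n, n))
    ridge = np.sqrt(lam) * np.eye(n - 1)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        A = np.vstack([X[:, others], ridge])            # augmented design
        b = np.concatenate([X[:, i], np.zeros(n - 1)])  # augmented target
        Z[others, i] = np.linalg.lstsq(A, b, rcond=None)[0]
    return (np.abs(Z) + np.abs(Z.T)) / 2                # symmetrize
```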
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
- An Objective for Hierarchical Clustering in Euclidean Space and its Connection to Bisecting K-means [20.322355919030112]
A recently introduced objective for points with dissimilarity scores results in every tree being a 1/2-approximation if the distances form a metric.
Motivated by this, the paper develops a new global objective for hierarchical clustering in Euclidean space.
The paper builds a theoretical connection between this objective and the bisecting k-means algorithm.
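For context, a minimal bisecting $k$-means sketch (plain Lloyd iterations; the split rule and the "largest cluster first" heuristic are common choices, not necessarily the paper's):

```python
import numpy as np

def two_means(X, iters=50, seed=0):
    """Plain Lloyd's algorithm with k=2; returns a boolean mask for one side."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), size=2, replace=False)]
    mask = np.zeros(len(X), dtype=bool)
    for _ in range(iters):
        mask = ((X - c[0]) ** 2).sum(1) > ((X - c[1]) ** 2).sum(1)
        if mask.all() or not mask.any():   # degenerate split
            break
        c = np.stack([X[~mask].mean(0), X[mask].mean(0)])
    return mask

def bisecting_kmeans(X, k):
    """Divisive hierarchy: repeatedly 2-means-split the largest cluster
    until k clusters remain (assumes k <= number of points)."""
    clusters = [np.arange(len(X))]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        part = clusters.pop(i)
        mask = two_means(X[part])
        if mask.all() or not mask.any():   # could not split; give up
            clusters.append(part)
            break
        clusters += [part[~mask], part[mask]]
    return clusters
```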
arXiv Detail & Related papers (2020-08-30T18:17:46Z)
- Order preserving hierarchical agglomerative clustering [0.0]
We study the problem of order preserving hierarchical clustering of partially ordered data.
The clustering is similarity based and uses standard linkage functions, such as single and complete linkage.
An optimal hierarchical clustering is defined as the partial dendrogram corresponding to the ultrametric closest to the original dissimilarity measure.
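The ultrametric view can be illustrated with standard tooling (this uses SciPy, not the paper's code): a dendrogram's cophenetic distances form an ultrametric, and its distance to the input dissimilarities measures the fit.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

X = np.random.default_rng(0).normal(size=(20, 3))
d = pdist(X)                          # condensed dissimilarity vector
Z = linkage(d, method='complete')     # complete-linkage dendrogram
u = cophenet(Z)                       # induced ultrametric distances
gap = np.linalg.norm(d - u)           # how far the ultrametric is from d
print(f"||d - u||_2 = {gap:.3f}")
```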
arXiv Detail & Related papers (2020-04-26T21:58:53Z)
- Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner.
We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost.
We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.
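A simplified sketch of the idea (my own reconstruction, not the paper's exact algorithm): build a tree of axis-aligned cuts, guided by reference centers from ordinary $k$-means, so that the final tree has $k$ leaves.

```python
import numpy as np

def threshold_tree(X, centers, labels):
    """Simplified explainable-clustering sketch. Internal nodes are
    axis-aligned cuts chosen to separate the reference centers while
    sending few points to the opposite side of their assigned center;
    leaves hold one center each, so the tree has k leaves.

    X      -- (n, d) points;  centers -- (k, d) k-means centers (distinct)
    labels -- (n,) index of each point's nearest center
    """
    def build(idx, cs):
        if len(cs) == 1:
            return {"leaf": int(cs[0])}
        best = None
        for f in range(X.shape[1]):
            for t in np.sort(centers[cs, f])[:-1]:
                if not (centers[cs, f] > t).any():
                    continue               # cut must separate the centers
                # points routed away from their own center's side
                mistakes = int(((X[idx, f] <= t)
                                != (centers[labels[idx], f] <= t)).sum())
                if best is None or mistakes < best[0]:
                    best = (mistakes, f, t)
        _, f, t = best
        go_left = X[idx, f] <= t
        side = centers[cs, f] <= t
        return {"feature": int(f), "threshold": float(t),
                "left":  build(idx[go_left],  cs[side]),
                "right": build(idx[~go_left], cs[~side])}

    return build(np.arange(len(X)), np.arange(len(centers)))
```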
arXiv Detail & Related papers (2020-02-28T04:21:53Z)
- Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
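A minimal sketch in this spirit (illustrative, not the authors' estimator): treat each comparison as a noisy difference of qualities and solve one global least-squares problem on the comparison graph.

```python
import numpy as np

def least_squares_ranking(comparisons, n):
    """Estimate qualities from noisy pairwise comparisons.

    comparisons -- list of (i, j, y) meaning 'quality_i - quality_j ~ y'
    n           -- number of objects

    Solves min_s sum (s_i - s_j - y)^2; scores are defined only up to an
    additive constant, so the mean is pinned to zero.
    """
    m = len(comparisons)
    A = np.zeros((m, n))
    y = np.zeros(m)
    for r, (i, j, val) in enumerate(comparisons):
        A[r, i], A[r, j], y[r] = 1.0, -1.0, val
    s, *_ = np.linalg.lstsq(A, y, rcond=None)
    return s - s.mean()                 # fix the additive gauge

# example: object 0 beats 1 by ~1, and 1 beats 2 by ~1
print(least_squares_ranking([(0, 1, 1.0), (1, 2, 1.1), (0, 2, 1.9)], 3))
```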
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
- Scalable Hierarchical Clustering with Tree Grafting [66.68869706310208]
Grinch is a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions.
Grinch is motivated by a new notion of separability for clustering with linkage functions.
arXiv Detail & Related papers (2019-12-31T20:56:15Z)