Related papers: An objective function for order preserving hierarchical clustering

An objective function for order preserving hierarchical clustering

URL: http://arxiv.org/abs/2109.04266v4
Date: Tue, 10 Dec 2024 18:31:50 GMT
Title: An objective function for order preserving hierarchical clustering
Authors: Daniel Bakkelund,
Abstract summary: We present a theory and an objective function for similarity-based hierarchical clustering of probabilistic partial orders.<n>Specifically, given elements $x le y$ in the partial order, and their respective clusters $[x]$ and $[y]$, the theory yields an order relation $le'$ on the clusters such that $[x]le'[y]$.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a theory and an objective function for similarity-based hierarchical clustering of probabilistic partial orders and directed acyclic graphs (DAGs). Specifically, given elements $x \le y$ in the partial order, and their respective clusters $[x]$ and $[y]$, the theory yields an order relation $\le'$ on the clusters such that $[x]\le'[y]$. The theory provides a concise definition of order-preserving hierarchical clustering, and offers a classification theorem identifying the order-preserving trees (dendrograms). To determine the optimal order-preserving trees, we develop an objective function that frames the problem as a bi-objective optimisation, aiming to satisfy both the order relation and the similarity measure. We prove that the optimal trees under the objective are both order-preserving and exhibit high-quality hierarchical clustering. Since finding an optimal solution is NP-hard, we introduce a polynomial-time approximation algorithm and demonstrate that the method outperforms existing methods for order-preserving hierarchical clustering by a significant margin.

Related papers

Hierarchical Clustering via Local Search [0.0]
We introduce a local search algorithm for hierarchical clustering. We show that any locally optimal tree guarantees a revenue of at least $fracn-23sum_i jw(i,j)$ where is $n$ the number of objects and $w: [n] times [n] mathbbR+$ is the associated similarity function.
arXiv Detail & Related papers (2024-05-24T23:46:24Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure. We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance. We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
Hierarchical Clustering: New Bounds and Objective [1.1295032417617457]
We introduce a new objective function for dissimilarity-based clustering. For any tree $T$, let $H_i,j$ be the number of $i$ and $j$'s common ancestors.
arXiv Detail & Related papers (2021-11-12T18:32:23Z)
An Information-theoretic Perspective of Hierarchical Clustering [30.896561720088954]
A cost function for hierarchical clustering was introduced by Dasgupta citedasgupta2016cost. In this paper, we investigate hierarchical clustering from the emphinformation-theoretic perspective and formulate a new objective function.
arXiv Detail & Related papers (2021-08-13T03:03:56Z)
On the Generative Utility of Cyclic Conditionals [103.1624347008042]
We study whether and how can we model a joint distribution $p(x,z)$ using two conditional models $p(x|z)$ that form a cycle. We propose the CyGen framework for cyclic-conditional generative modeling, including methods to enforce compatibility and use the determined distribution to fit and generate data.
arXiv Detail & Related papers (2021-06-30T10:23:45Z)
Near-Optimal Explainable $k$-Means for All Dimensions [13.673697350508965]
We show an efficient algorithm that finds an explainable clustering whose $k$-means cost is at most $k1 - 2/dmathrmpoly(dlog k)$. We get an improved bound of $k1 - 2/dmathrmpolylog(k)$, which we show is optimal for every choice of $k,dge 2$ up to a poly-logarithmic factor in $k$.
arXiv Detail & Related papers (2021-06-29T16:59:03Z)
Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem. We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering. We overcome the prohibitively large search space by combining A* with a novel emphtrellis data structure. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data. Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure. Our model is equivalent to a combination of kernel k-means and k-means methods under certain condition.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
An Objective for Hierarchical Clustering in Euclidean Space and its Connection to Bisecting K-means [20.322355919030112]
Recently introduced objective for points with dissimilarity scores results in every tree being a 1/2 approximation if the distances form a metric. Motivated by this, the paper develops a new global objective for hierarchical clustering in Euclidean space. The paper builds a theoretical connection between this objective and the bisecting k-means algorithm.
arXiv Detail & Related papers (2020-08-30T18:17:46Z)
Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations [54.42518331209581]
We find an algorithm which finds. epsilon$-approximate stationary point (with $|nabla F(x)|le epsilon$) using. $(epsilon,gamma)$surimate random random points. Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z)
Order preserving hierarchical agglomerative clustering [0.0]
We study the problem of order preserving hierarchical clustering of this kind of ordered data. The clustering is similarity based and uses standard linkage functions, such as single- and complete linkage. An optimal hierarchical clustering is defined as the partial dendrogram corresponding to the ultrametric closest to the original dissimilarity measure.
arXiv Detail & Related papers (2020-04-26T21:58:53Z)
Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner. We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost. We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.
arXiv Detail & Related papers (2020-02-28T04:21:53Z)
Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers. We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
Query-Efficient Correlation Clustering [13.085439249887713]
Correlation clustering is arguably the most natural formulation of clustering. A main drawback of correlation clustering is that it requires as input the $Theta(n2)$ pairwise similarities. We devise a correlation clustering algorithm that attains a solution whose expected number of disagreements is at most $3cdot OPT + O(fracn3Q)$.
arXiv Detail & Related papers (2020-02-26T15:18:20Z)
Scalable Hierarchical Clustering with Tree Grafting [66.68869706310208]
Grinch is a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions. Grinch is motivated by a new notion of separability for clustering with linkage functions.
arXiv Detail & Related papers (2019-12-31T20:56:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.