An objective function for order preserving hierarchical clustering
- URL: http://arxiv.org/abs/2109.04266v1
- Date: Thu, 9 Sep 2021 13:35:01 GMT
- Title: An objective function for order preserving hierarchical clustering
- Authors: Daniel Bakkelund
- Abstract summary: The model is an extension of the Dasgupta cost function for divisive hierarchical clustering.
Finding an optimal solution is NP-hard, so we provide a polynomial time approximation algorithm, with a relative performance guarantee of $O(\log^{3/2} n)$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an objective function for similarity based hierarchical clustering
of partially ordered data that preserves the partial order in the sense that if
$x \leq y$, and if $[x]$ and $[y]$ are the respective clusters of $x$ and $y$,
then there is an order relation $\leq'$ on the clusters for which $[x] \leq'
[y]$. The model distinguishes itself from existing methods and models for
clustering of ordered data in that the order relation and the similarity are
combined to obtain an optimal hierarchical clustering seeking to satisfy both,
and that the order relation is equipped with a pairwise level of comparability
in the range $[0,1]$. In particular, if the similarity and the order relation
are not aligned, then order preservation may have to yield in favor of
clustering. Finding an optimal solution is NP-hard, so we provide a polynomial
time approximation algorithm, with a relative performance guarantee of
$O(\log^{3/2}n)$, based on successive applications of directed sparsest cut.
The model is an extension of the Dasgupta cost function for divisive
hierarchical clustering.
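To make the order-preservation condition concrete, here is a minimal Python sketch (illustrative only, not code from the paper; the function name and the brute-force check are assumptions). A clustering of a partially ordered set admits a relation $\leq'$ as above exactly when the relation it induces on clusters has no directed cycle, which the sketch tests directly:

```python
from itertools import product

def is_order_preserving(clusters, leq):
    """clusters: list of disjoint sets partitioning the elements.
    leq: set of (x, y) pairs with x <= y in the partial order."""
    n = len(clusters)
    # Induced relation on cluster indices: i -> j whenever some
    # x in cluster i sits below some y in cluster j.
    edges = {i: set() for i in range(n)}
    for i, j in product(range(n), repeat=2):
        if i != j and any((x, y) in leq
                          for x in clusters[i] for y in clusters[j]):
            edges[i].add(j)
    # A partial order <=' on clusters exists iff this relation is acyclic.
    state = {}  # node -> 0 (in progress) or 1 (finished)
    def has_cycle(u):
        state[u] = 0
        for v in edges[u]:
            if state.get(v) == 0 or (v not in state and has_cycle(v)):
                return True
        state[u] = 1
        return False
    return not any(u not in state and has_cycle(u) for u in range(n))

# Example: chain a <= b <= c. Clustering {a, c} against {b} forces
# [a] <=' [b] and [b] <=' [a], so order preservation fails.
leq = {("a", "b"), ("b", "c"), ("a", "c")}
print(is_order_preserving([{"a"}, {"b"}, {"c"}], leq))  # True
print(is_order_preserving([{"a", "c"}, {"b"}], leq))    # False
```

For context, the Dasgupta cost that the model extends charges each similarity $w(i,j)$ by the size of the subtree rooted at the least common ancestor of $i$ and $j$: $\mathrm{cost}(T) = \sum_{i<j} w(i,j)\,|\mathrm{leaves}(T[i \vee j])|$.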
Related papers
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Hierarchical Clustering: New Bounds and Objective [1.1295032417617457]
We introduce a new objective function for dissimilarity-based clustering.
For any tree $T$, let $H_{i,j}$ be the number of $i$ and $j$'s common ancestors.
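As a small illustration of this quantity (a sketch under assumed representation, not that paper's code: the tree is given as a parent map, and ancestors exclude the leaves themselves), $H_{i,j}$ can be computed by intersecting the two root-directed ancestor sets:

```python
# Illustrative sketch: count the common ancestors H_{i,j} of two
# leaves in a rooted tree encoded as a parent map (root -> None).
def common_ancestors(parent, i, j):
    def ancestors(v):
        seen = set()
        v = parent[v]
        while v is not None:
            seen.add(v)
            v = parent[v]
        return seen
    return len(ancestors(i) & ancestors(j))

# Tree: root r has children u and j; u has children i and k.
parent = {"r": None, "u": "r", "i": "u", "k": "u", "j": "r"}
print(common_ancestors(parent, "i", "k"))  # 2 (u and r)
print(common_ancestors(parent, "i", "j"))  # 1 (only r)
```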
arXiv Detail & Related papers (2021-11-12T18:32:23Z) - On the Generative Utility of Cyclic Conditionals [103.1624347008042]
We study whether and how we can model a joint distribution $p(x,z)$ using two conditional models $p(x|z)$ and $q(z|x)$ that form a cycle.
We propose the CyGen framework for cyclic-conditional generative modeling, including methods to enforce compatibility and use the determined distribution to fit and generate data.
arXiv Detail & Related papers (2021-06-30T10:23:45Z) - Near-Optimal Explainable $k$-Means for All Dimensions [13.673697350508965]
We show an efficient algorithm that finds an explainable clustering whose $k$-means cost is at most $k^{1-2/d}\,\mathrm{poly}(d \log k)$ times the optimum.
We get an improved bound of $k^{1-2/d}\,\mathrm{polylog}(k)$, which we show is optimal for every choice of $k, d \ge 2$ up to a poly-logarithmic factor in $k$.
arXiv Detail & Related papers (2021-06-29T16:59:03Z) - Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z) - Second-Order Information in Non-Convex Stochastic Optimization: Power
and Limitations [54.42518331209581]
We design an algorithm that finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\| \le \epsilon$) and study the complexity of finding $(\epsilon,\gamma)$-approximate second-order stationary points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z) - Order preserving hierarchical agglomerative clustering [0.0]
We study the problem of order preserving hierarchical clustering of partially ordered data.
The clustering is similarity based and uses standard linkage functions, such as single- and complete-linkage.
An optimal hierarchical clustering is defined as the partial dendrogram corresponding to the ultrametric closest to the original dissimilarity measure.
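To ground this optimality criterion, note that the ultrametric induced by an ordinary linkage dendrogram is the cophenetic distance, so its gap to the input dissimilarity can be measured directly. The SciPy sketch below illustrates this for plain complete linkage, without that paper's order-preservation machinery:

```python
# Illustration without the order constraint: a complete-linkage
# dendrogram induces an ultrametric (the cophenetic distance), and
# its distance to the input dissimilarities measures the fit.
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

X = np.random.default_rng(0).random((8, 2))   # toy data
d = pdist(X)                                  # condensed dissimilarities
Z = linkage(d, method="complete")             # dendrogram
u = cophenet(Z)                               # induced ultrametric
print(np.linalg.norm(d - u))                  # closeness to the input
```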
arXiv Detail & Related papers (2020-04-26T21:58:53Z) - Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner.
We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost.
We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.
arXiv Detail & Related papers (2020-02-28T04:21:53Z) - Query-Efficient Correlation Clustering [13.085439249887713]
Correlation clustering is arguably the most natural formulation of clustering.
A main drawback of correlation clustering is that it requires as input the $\Theta(n^2)$ pairwise similarities.
We devise a correlation clustering algorithm that attains a solution whose expected number of disagreements is at most $3 \cdot \mathrm{OPT} + O(\frac{n^3}{Q})$, where $Q$ is the number of queries.
arXiv Detail & Related papers (2020-02-26T15:18:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.