An Objective for Hierarchical Clustering in Euclidean Space and its
Connection to Bisecting K-means
- URL: http://arxiv.org/abs/2008.13235v1
- Date: Sun, 30 Aug 2020 18:17:46 GMT
- Title: An Objective for Hierarchical Clustering in Euclidean Space and its
Connection to Bisecting K-means
- Authors: Benjamin Moseley, Yuyan Wang
- Abstract summary: The recently introduced objective for points with dissimilarity scores results in every tree being a 1/2-approximation if the distances form a metric.
Motivated by this, the paper develops a new global objective for hierarchical clustering in Euclidean space.
The paper builds a theoretical connection between this objective and the bisecting k-means algorithm.
- Score: 20.322355919030112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores hierarchical clustering in the case where pairs of points
have dissimilarity scores (e.g. distances) as a part of the input. The recently
introduced objective for points with dissimilarity scores results in every tree
being a 1/2 approximation if the distances form a metric. This shows the
objective does not make a significant distinction between a good and poor
hierarchical clustering in metric spaces. Motivated by this, the paper develops
a new global objective for hierarchical clustering in Euclidean space. The
objective captures the criterion that has motivated the use of divisive
clustering algorithms: that when a split happens, points in the same cluster
should be more similar than points in different clusters. Moreover, this
objective gives reasonable results on ground-truth inputs for hierarchical
clustering. The paper builds a theoretical connection between this objective
and the bisecting k-means algorithm. This paper proves that the optimal 2-means
solution results in a constant approximation for the objective. This is the
first paper to show the bisecting k-means algorithm optimizes a natural global
objective over the entire tree.
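The "recently introduced objective for points with dissimilarity scores" is not spelled out above; as a point of reference only (an assumption, not a formula quoted from this paper), the commonly studied dissimilarity-based analogue of Dasgupta's cost is $\max_{T} \sum_{i<j} |\mathrm{leaves}(T[i \vee j])| \cdot d(i,j)$, where $T[i \vee j]$ is the subtree rooted at the least common ancestor of leaves $i$ and $j$; under an objective of this form, separating highly dissimilar pairs near the root earns the most reward.

The bisecting k-means algorithm the abstract connects to the new objective is the standard divisive procedure: start with all points in one cluster and recursively split each cluster in two with a 2-means step, the splits forming the tree. Below is a minimal sketch, assuming Euclidean points in a NumPy array; the helper names (`two_means`, `bisecting_kmeans`) and the plain Lloyd-style split are illustrative choices, not the authors' code (the paper analyzes the optimal 2-means split).

```python
# Minimal bisecting 2-means sketch (illustrative, not the paper's reference code).
import numpy as np


def two_means(points, iters=20, seed=0):
    """Lloyd's algorithm with k = 2; returns a boolean mask selecting one side."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=2, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every point to its nearest of the two centers.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Recompute each center; keep the old one if its side is empty.
        for c in (0, 1):
            if np.any(assign == c):
                centers[c] = points[assign == c].mean(axis=0)
    return assign == 0


def bisecting_kmeans(points, indices=None):
    """Recursively split with 2-means; returns a nested tuple encoding the tree."""
    if indices is None:
        indices = np.arange(len(points))
    if len(indices) <= 1:
        return indices.tolist()          # leaf: a single point
    mask = two_means(points[indices])
    left, right = indices[mask], indices[~mask]
    if len(left) == 0 or len(right) == 0:
        return indices.tolist()          # degenerate split: stop recursing
    return (bisecting_kmeans(points, left), bisecting_kmeans(points, right))


if __name__ == "__main__":
    # Two well-separated blobs: the first split should separate them.
    rng = np.random.default_rng(1)
    pts = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(5.0, 0.1, (5, 2))])
    print(bisecting_kmeans(pts))
```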
Related papers
- Accelerating k-Means Clustering with Cover Trees [0.30693357740321775] (2024-10-19)
  We propose a new k-means algorithm based on the cover tree index that has relatively low overhead and performs well.
  We obtain a hybrid algorithm that combines the benefits of tree aggregation and bounds-based filtering.
- Self-Supervised Graph Embedding Clustering [70.36328717683297] (2024-09-24)
  The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
  We propose a unified framework that integrates manifold learning with K-means, resulting in a self-supervised graph embedding framework.
- ABCDE: Application-Based Cluster Diff Evals [49.1574468325115] (2024-07-31)
  ABCDE aims to be practical: it allows items to have application-specific importance values, it is frugal in its use of human judgements when determining which clustering is better, and it can report metrics for arbitrary slices of items.
  Its approach to measuring the delta in clustering quality is novel: instead of constructing an expensive ground truth up front and evaluating each clustering against it, ABCDE samples questions for judgement on the basis of the actual diffs between the clusterings.
- Linear time Evidence Accumulation Clustering with KMeans [0.0] (2023-11-15)
  This work describes a trick which mimics the behavior of average linkage clustering.
  We found a way of computing the density of a partitioning efficiently, reducing the cost from quadratic to linear complexity.
  The k-means results are comparable to the best state of the art in terms of NMI while keeping the computational cost low.
- Rethinking k-means from manifold learning perspective [122.38667613245151] (2023-05-12)
  We present a new clustering algorithm which directly detects clusters of data without mean estimation.
  Specifically, we construct a distance matrix between data points with a Butterworth filter.
  To exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization.
- An objective function for order preserving hierarchical clustering [0.0] (2021-09-09)
  We present a theory and an objective function for similarity-based hierarchical clustering of probabilistic partial orders.
  Specifically, given elements $x \le y$ in the partial order, and their respective clusters $[x]$ and $[y]$, the theory yields an order relation $\le'$ on the clusters such that $[x] \le' [y]$.
- Fair Clustering Under a Bounded Cost [33.50262066253557] (2021-06-14)
  Clustering is a fundamental unsupervised learning problem where a dataset is partitioned into clusters that consist of nearby points in a metric space.
  A recent variant, fair clustering, associates a color with each point representing its group membership and requires that each color has (approximately) equal representation in each cluster to satisfy group fairness.
  We consider two fairness objectives: the group utilitarian objective and the group egalitarian objective, as well as the group leximin objective which generalizes the group egalitarian objective.
- Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344] (2021-04-14)
  We introduce a new approach to clustering based on A* search.
  We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
  We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619] (2020-10-22)
  Existing scalable hierarchical clustering methods sacrifice quality for speed.
  We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
- Ball k-means [53.89505717006118] (2020-05-02)
  The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation.
  Its fast speed, lack of extra parameters, and simple design make Ball k-means an all-around replacement for the naive k-means algorithm.
- Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163] (2020-02-28)
  We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner.
  We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost.
  We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.