Related papers: Fixed-sized clusters $k$-Means

Fixed-sized clusters $k$-Means

URL: http://arxiv.org/abs/2501.16113v1
Date: Mon, 27 Jan 2025 15:04:35 GMT
Title: Fixed-sized clusters $k$-Means
Authors: Mikko I. Malinen, Pasi Fränti,
Abstract summary: A straightforward application is balanced clustering, where the sizes of each cluster are equal.<n>In the $k$-means assignment phase, the algorithm solves an assignment problem using the Hungarian algorithm.
Score: 1.7034813545878587
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a $k$-means-based clustering algorithm, which optimizes the mean square error, for given cluster sizes. A straightforward application is balanced clustering, where the sizes of each cluster are equal. In the $k$-means assignment phase, the algorithm solves an assignment problem using the Hungarian algorithm. This makes the assignment phase time complexity $O(n^3)$. This enables clustering of datasets of size more than 5000 points.

Related papers

Generalization Performance of Ensemble Clustering: From Theory to Algorithm [57.176040163699554]
This paper focuses on generalization error, excess risk and consistency in ensemble clustering.<n>By assigning varying weights to finite clusterings, we minimize the error between the empirical average clusterings and their expectation.<n>We instantiate our theory to develop a new ensemble clustering algorithm.
arXiv Detail & Related papers (2025-06-01T09:34:52Z)
A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. A clustering is then called individually fair if it has centers within distance $delta(x)$ of $x$ for each $xin P$. We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z)
Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets. We present an improved version of the "$q$-means" algorithm for clustering. We also present a "dequantized" algorithm for $varepsilon which runs in $Obig(frack2varepsilon2(sqrtkd + log(Nd))big.
arXiv Detail & Related papers (2023-08-18T17:52:12Z)
Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels. We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
Global $k$-means$++$: an effective relaxation of the global $k$-means clustering algorithm [0.20305676256390928]
The $k$-means algorithm is a prevalent clustering method due to its simplicity, effectiveness, and speed. We propose the emphglobal $k$-meanstexttt++ clustering algorithm, which is an effective way of acquiring quality clustering solutions.
arXiv Detail & Related papers (2022-11-22T13:42:53Z)
Explainable k-means. Don't be greedy, plant bigger trees! [12.68470213641421]
We provide a new bi-criteria $tildeO(log2 k)$ competitive algorithm for explainable $k$-means clustering. Explainable $k$-means was recently introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020)
arXiv Detail & Related papers (2021-11-04T23:15:17Z)
Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem. We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering. We overcome the prohibitively large search space by combining A* with a novel emphtrellis data structure. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
arXiv Detail & Related papers (2021-04-14T18:15:27Z)
K-expectiles clustering [0.0]
We propose a novel partitioning clustering algorithm based on expectiles. We suggest two schemes: fixed $tau$ clustering, and adaptive $tau$ clustering.
arXiv Detail & Related papers (2021-03-16T21:14:56Z)
Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA. We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
Ball k-means [53.89505717006118]
The Ball k-means algorithm uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation. The fast speed, no extra parameters and simple design of the Ball k-means make it an all-around replacement of the naive k-means algorithm.
arXiv Detail & Related papers (2020-05-02T10:39:26Z)
Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner. We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost. We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.
arXiv Detail & Related papers (2020-02-28T04:21:53Z)
Query-Efficient Correlation Clustering [13.085439249887713]
Correlation clustering is arguably the most natural formulation of clustering. A main drawback of correlation clustering is that it requires as input the $Theta(n2)$ pairwise similarities. We devise a correlation clustering algorithm that attains a solution whose expected number of disagreements is at most $3cdot OPT + O(fracn3Q)$.
arXiv Detail & Related papers (2020-02-26T15:18:20Z)
Efficient Algorithms for Generating Provably Near-Optimal Cluster Descriptors for Explainability [36.11663695534294]
We study the problem of making clusters more interpretable by extending a recent approach of constructing succinct representations for clusters. We develop approximation algorithms with provable performance guarantees for the problem. We also show applications to explain clusters from datasets, including clusters of genomic sequences that represent different threat levels.
arXiv Detail & Related papers (2020-02-06T19:49:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.