Near-optimal Algorithms for Explainable k-Medians and k-Means
- URL: http://arxiv.org/abs/2107.00798v1
- Date: Fri, 2 Jul 2021 02:07:12 GMT
- Title: Near-optimal Algorithms for Explainable k-Medians and k-Means
- Authors: Konstantin Makarychev, Liren Shan
- Abstract summary: We consider the problem of explainable $k$-medians and $k$-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020).
Our goal is to find an \emph{optimal} decision tree that partitions data into $k$ clusters and minimizes the $k$-medians or $k$-means objective.
- Score: 12.68470213641421
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of explainable $k$-medians and $k$-means introduced
by Dasgupta, Frost, Moshkovitz, and Rashtchian~(ICML 2020). In this problem,
our goal is to find a \emph{threshold decision tree} that partitions data into
$k$ clusters and minimizes the $k$-medians or $k$-means objective. The obtained
clustering is easy to interpret because every decision node of a threshold tree
splits data based on a single feature into two groups. We propose a new
algorithm for this problem which is $\tilde O(\log k)$ competitive with
$k$-medians with $\ell_1$ norm and $\tilde O(k)$ competitive with $k$-means.
This is an improvement over the previous guarantees of $O(k)$ and $O(k^2)$ by
Dasgupta et al. (2020). We also provide a new algorithm which is $O(\log^{3/2}
k)$ competitive for $k$-medians with $\ell_2$ norm. Our first algorithm is
near-optimal: Dasgupta et al. (2020) showed a lower bound of $\Omega(\log k)$
for $k$-medians; in this work, we prove a lower bound of $\tilde\Omega(k)$ for
$k$-means. We also provide a lower bound of $\Omega(\log k)$ for $k$-medians
with $\ell_2$ norm.
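To make the threshold-tree model concrete, here is a minimal sketch (not the paper's algorithm) of how such a tree induces an explainable clustering and how its $k$-medians ($\ell_1$) cost is measured. The tree shape, features, thresholds, and centers below are illustrative assumptions, not taken from the paper.

```python
def assign_leaf(point, node):
    """Route a point down the threshold tree to a leaf (cluster) label."""
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        feature, threshold, left, right = node
        node = left if point[feature] <= threshold else right
    return node  # leaf: cluster label

def k_medians_cost(points, centers, tree):
    """Sum of ell_1 distances from each point to the center of its leaf's cluster."""
    total = 0.0
    for p in points:
        c = centers[assign_leaf(p, tree)]
        total += sum(abs(pi - ci) for pi, ci in zip(p, c))
    return total

# Two threshold cuts produce k = 3 clusters on 2-d data.
tree = (0, 0.5,                # split on feature 0 at 0.5
        "A",
        (1, 0.5, "B", "C"))    # then split on feature 1 at 0.5
centers = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0)}
points = [(0.1, 0.2), (0.9, 0.1), (0.8, 0.9)]
print(k_medians_cost(points, centers, tree))  # total ell_1 cost of this explainable clustering
```

Every point's cluster can be read off by following single-feature comparisons from the root, which is what makes the clustering interpretable.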
Related papers
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile
Streaming [48.18845814885398]
We develop new techniques to extend the applicability of sketching-based approaches to sparse dictionary learning and the Euclidean $k$-means clustering problems.
On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem.
On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering.
arXiv Detail & Related papers (2023-10-29T16:46:26Z) - Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{k^2}{\varepsilon^2}(\sqrt{k}d + \log(Nd))\big)$ time.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix
Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - A Nearly Tight Analysis of Greedy k-means++ [1.452875650827562]
The $k$-means++ algorithm is known to return a $\Theta(\log k)$ approximate solution in expectation.
We present nearly matching lower and upper bounds for the greedy $k$-means++ algorithm.
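The $k$-means++ seeding referenced here picks each new center by $D^2$ sampling: with probability proportional to its squared distance from the nearest center chosen so far. Below is a minimal sketch of plain (non-greedy) $D^2$ seeding, assuming Euclidean points given as tuples; the greedy variant analyzed in the paper draws several candidates per step and keeps the best, which is omitted here.

```python
import random

def d2_seeding(points, k, rng=random.Random(0)):
    """k-means++-style D^2 seeding: each new center is sampled with
    probability proportional to its squared distance to the nearest
    center already chosen."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest current center.
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

# Two well-separated pairs: seeding tends to pick one point from each pair.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(d2_seeding(points, 2))
```

The $D^2$ weighting is what yields the $O(\log k)$ expected approximation guarantee for the standard algorithm.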
arXiv Detail & Related papers (2022-07-16T13:49:07Z) - Approximating Fair Clustering with Cascaded Norm Objectives [10.69111036810888]
We find a clustering which minimizes the $\ell_q$-norm of the vector over $W$ of the $\ell_p$-norms of the weighted distances of points in $P$ from the centers.
This generalizes various clustering problems, including Socially Fair $k$-Median and $k$-Means.
arXiv Detail & Related papers (2021-11-08T20:18:10Z) - Almost Tight Approximation Algorithms for Explainable Clustering [16.22135057266913]
We study a recent framework of explainable clustering first suggested by Dasgupta et al.
Specifically, we focus on the $k$-means and $k$-medians problems and provide nearly tight upper and lower bounds.
arXiv Detail & Related papers (2021-07-01T23:49:23Z) - Nearly-Tight and Oblivious Algorithms for Explainable Clustering [8.071379672971542]
We study the problem of explainable clustering in the setting first formalized by Moshkovitz, Dasgupta, Rashtchian, and Frost (ICML 2020)
A $k$-clustering is said to be explainable if it is given by a decision tree where each internal node splits data points with a threshold cut in a single dimension (feature).
We give an algorithm that outputs an explainable clustering that loses at most a factor of $O(\log^2 k)$ compared to an optimal (not necessarily explainable) clustering for the $k$-medians objective.
arXiv Detail & Related papers (2021-06-30T15:49:41Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\varepsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\varepsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Adapting $k$-means algorithms for outliers [1.9290392443571387]
We show how to adapt several simple sampling-based algorithms for the $k$-means problem to the setting with outliers.
Our algorithms output $(1+\varepsilon)z$ outliers while achieving an $O(1/\varepsilon)$-approximation to the objective function.
arXiv Detail & Related papers (2020-07-02T14:14:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.