An Analysis of $D^\alpha$ seeding for $k$-means
- URL: http://arxiv.org/abs/2310.13474v1
- Date: Fri, 20 Oct 2023 13:15:18 GMT
- Title: An Analysis of $D^\alpha$ seeding for $k$-means
- Authors: Etienne Bamas, Sai Ganesh Nagarajan, Ola Svensson
- Abstract summary: We show that $\alpha>2$ can indeed improve the $k$-means cost compared to $D^2$ seeding.
We provide an experimental confirmation of the effects of the aforementioned parameters when using $D^\alpha$ seeding.
- Score: 11.394909061094461
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the most popular clustering algorithms is the celebrated $D^\alpha$
seeding algorithm (also known as $k$-means++ when $\alpha=2$) by Arthur and
Vassilvitskii (2007), who showed that it guarantees in expectation an
$O(2^{2\alpha}\cdot \log k)$-approximate solution to the ($k$,$\alpha$)-means
cost (where Euclidean distances are raised to the power $\alpha$) for any
$\alpha\ge 1$. More recently, Balcan, Dick, and White (2018) observed
experimentally that using $D^\alpha$ seeding with $\alpha>2$ can lead to a
better solution with respect to the standard $k$-means objective (i.e. the
$(k,2)$-means cost).
In this paper, we provide a rigorous understanding of this phenomenon. For
any $\alpha>2$, we show that $D^\alpha$ seeding guarantees in expectation an
approximation factor of $$ O_\alpha \left((g_\alpha)^{2/\alpha}\cdot
\left(\frac{\sigma_{\mathrm{max}}}{\sigma_{\mathrm{min}}}\right)^{2-4/\alpha}\cdot
(\min\{\ell,\log k\})^{2/\alpha}\right)$$ with respect to the standard
$k$-means cost of any underlying clustering; where $g_\alpha$ is a parameter
capturing the concentration of the points in each cluster,
$\sigma_{\mathrm{max}}$ and $\sigma_{\mathrm{min}}$ are the maximum and minimum
standard deviation of the clusters around their means, and $\ell$ is the number
of distinct mixing weights in the underlying clustering (after rounding them to
the nearest power of $2$). We complement these results by some lower bounds
showing that the dependency on $g_\alpha$ and
$\sigma_{\mathrm{max}}/\sigma_{\mathrm{min}}$ is tight.
Finally, we provide an experimental confirmation of the effects of the
aforementioned parameters when using $D^\alpha$ seeding. Further, we
corroborate the observation that $\alpha>2$ can indeed improve the $k$-means
cost compared to $D^2$ seeding, and that this advantage remains even if we run
Lloyd's algorithm after the seeding.
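To make the seeding procedure discussed above concrete, here is a minimal sketch of $D^\alpha$ seeding in Python. The function name and structure are illustrative, not taken from the paper; with `alpha=2` this reduces to the classical $k$-means++ seeding of Arthur and Vassilvitskii (2007), and `alpha>2` biases the sampling more strongly toward far-away points.

```python
import random

def d_alpha_seeding(points, k, alpha=2.0, rng=random):
    """Illustrative sketch of D^alpha seeding.

    Picks k initial centers: the first uniformly at random, each
    subsequent one with probability proportional to D(x)^alpha, where
    D(x) is the Euclidean distance from x to its nearest chosen center.
    alpha=2 recovers k-means++ seeding.
    """
    def sqdist(p, q):
        # Squared Euclidean distance between two points.
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # First center: uniform over the data.
    centers = [rng.choice(points)]
    for _ in range(k - 1):
        # Weight each point by D(x)^alpha = (D(x)^2)^(alpha/2).
        weights = [min(sqdist(p, c) for c in centers) ** (alpha / 2.0)
                   for p in points]
        # Sample the next center proportionally to these weights.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

One would typically follow this seeding with Lloyd's algorithm, as in the experiments summarized above; the abstract's observation is that the advantage of $\alpha>2$ persists even after Lloyd's iterations.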
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Detection of Dense Subhypergraphs by Low-Degree Polynomials [72.4451045270967]
Detection of a planted dense subgraph in a random graph is a fundamental statistical and computational problem.
We consider detecting the presence of a planted $G^{(r)}(n^\gamma, n^{-\alpha})$ subhypergraph in a $G^{(r)}(n, n^{-\beta})$ hypergraph.
Our results are already new in the graph case $r=2$, as we consider the subtle log-density regime where hardness based on average-case reductions is not known.
arXiv Detail & Related papers (2023-04-17T10:38:08Z) - A spectral least-squares-type method for heavy-tailed corrupted
regression with unknown covariance \& heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression assuming to have a corrupted $n$-sized label-feature sample with at most $\epsilon n$ arbitrary outliers.
We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(\Sigma, \Xi)$ nor the operator norm of $\Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z) - Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z) - Tight FPT Approximation for Constrained k-Center and k-Supplier [0.38073142980733]
We study a range of constrained versions of the $k$-supplier and $k$-center problems.
A unified framework for constrained clustering was proposed by Ding and Xu [SODA 2015] in context of the $k$-median and $k$-means objectives.
arXiv Detail & Related papers (2021-10-27T07:52:59Z) - Nonasymptotic one-and two-sample tests in high dimension with unknown
covariance structure [0.0]
We consider the problem of testing if $\mu$ is $\eta$-close to zero, i.e. $\|\mu\| \leq \eta$, against $\|\mu\| \geq (\eta + \delta)$.
The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distance $\delta$ such that we can control both the Type I and Type II errors at a given level.
arXiv Detail & Related papers (2021-09-01T06:22:53Z) - Near-optimal Algorithms for Explainable k-Medians and k-Means [12.68470213641421]
We consider the problem of explainable $k$-medians and $k$-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian(ICML 2020)
Our goal is to find an optimal decision tree that partitions data into $k$ clusters and minimizes the $k$-medians or $k$-means objective.
arXiv Detail & Related papers (2021-07-02T02:07:12Z) - Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $K \subset \mathbb{R}^{d \times n}$, given access to $A \in \mathbb{R}^{d \times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Learning Mixtures of Spherical Gaussians via Fourier Analysis [0.5381004207943596]
We find that a bound on the sample and computational complexity was previously unknown when $\omega(1) \leq d \leq O(\log k)$.
These authors also bound the sample complexity of learning a random mixture of Gaussians in a ball of radius $\Theta(\sqrt{d})$ in $d$ dimensions, when $d$ is at least $\mathrm{poly}(k, \frac{1}{\delta})$.
arXiv Detail & Related papers (2020-04-13T08:06:29Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information (including all listed content) and is not responsible for any consequences.