Near-Optimal Quantum Coreset Construction Algorithms for Clustering
- URL: http://arxiv.org/abs/2306.02826v1
- Date: Mon, 5 Jun 2023 12:22:46 GMT
- Title: Near-Optimal Quantum Coreset Construction Algorithms for Clustering
- Authors: Yecheng Xue, Xiaoyu Chen, Tongyang Li, Shaofeng H.-C. Jiang
- Abstract summary: We give quantum algorithms that find coresets for $k$-clustering in $mathbbRd$ with $tildeO(sqrtnkd3/2)$ query complexity.
Our coreset reduces the input size from $n$ to $mathrmpoly(kepsilon-1d)$, so that existing $alpha$-approximation algorithms for clustering can run on top of it.
- Score: 15.513270929560088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: $k$-Clustering in $\mathbb{R}^d$ (e.g., $k$-median and $k$-means) is a
fundamental machine learning problem. While near-linear time approximation
algorithms were known in the classical setting for a dataset with cardinality
$n$, it remains open to find sublinear-time quantum algorithms. We give quantum
algorithms that find coresets for $k$-clustering in $\mathbb{R}^d$ with
$\tilde{O}(\sqrt{nk}d^{3/2})$ query complexity. Our coreset reduces the input
size from $n$ to $\mathrm{poly}(k\epsilon^{-1}d)$, so that existing
$\alpha$-approximation algorithms for clustering can run on top of it and yield
$(1 + \epsilon)\alpha$-approximation. This eventually yields a quadratic
speedup for various $k$-clustering approximation algorithms. We complement our
algorithm with a nearly matching lower bound, that any quantum algorithm must
make $\Omega(\sqrt{nk})$ queries in order to achieve even $O(1)$-approximation
for $k$-clustering.
Related papers
- Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models [39.33814194788341]
We study the task of learning latent-variable models.
Motivated by such applications, we develop a general efficient algorithm for implicit moment computation.
By leveraging our general algorithm, we obtain the first-time learners for the following models.
arXiv Detail & Related papers (2024-11-23T23:13:24Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $delta(x)$ of $x$ for each $xin P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile
Streaming [48.18845814885398]
We develop new techniques to extend the applicability of sketching-based approaches to sparse dictionary learning and the Euclidean $k$-means clustering problems.
On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem.
On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering.
arXiv Detail & Related papers (2023-10-29T16:46:26Z) - Simple, Scalable and Effective Clustering via One-Dimensional
Projections [10.807367640692021]
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis.
We introduce a simple randomized clustering algorithm that provably runs in expected time $O(mathrmnnz(X) + nlog n)$ for arbitrary $k$.
We prove that our algorithm achieves approximation ratio $smashwidetildeO(k4)$ on any input dataset for the $k$-means objective.
arXiv Detail & Related papers (2023-10-25T16:37:45Z) - Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $varepsilon which runs in $Obig(frack2varepsilon2(sqrtkd + log(Nd))big.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.