Quantum (Inspired) $D^2$-sampling with Applications
- URL: http://arxiv.org/abs/2405.13351v1
- Date: Wed, 22 May 2024 05:13:28 GMT
- Title: Quantum (Inspired) $D^2$-sampling with Applications
- Authors: Ragesh Jaiswal, Poojan Shah
- Abstract summary: We show a quantum implementation of $k$-means++ that runs in time $\tilde{O}(\zeta^2 k^2)$.
It can be shown through a robust approximation analysis of $k$-means++ that the quantum version preserves its $O(\log k)$ approximation guarantee.
This results in a fast quantum-inspired classical implementation of $k$-means++, which we call QI-$k$-means++, with a running time $O(Nd) + \tilde{O}(\zeta^2 k^2 d)$.
- Score: 0.138120109831448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: $D^2$-sampling is a fundamental component of sampling-based clustering algorithms such as $k$-means++. Given a dataset $V \subset \mathbb{R}^d$ with $N$ points and a center set $C \subset \mathbb{R}^d$, $D^2$-sampling refers to picking a point from $V$ where the sampling probability of a point is proportional to its squared distance from the nearest center in $C$. Starting with empty $C$ and iteratively $D^2$-sampling and updating $C$ in $k$ rounds is precisely $k$-means++ seeding that runs in $O(Nkd)$ time and gives $O(\log{k})$-approximation in expectation for the $k$-means problem. We give a quantum algorithm for (approximate) $D^2$-sampling in the QRAM model that results in a quantum implementation of $k$-means++ that runs in time $\tilde{O}(\zeta^2 k^2)$. Here $\zeta$ is the aspect ratio (i.e., largest to smallest interpoint distance), and $\tilde{O}$ hides polylogarithmic factors in $N, d, k$. It can be shown through a robust approximation analysis of $k$-means++ that the quantum version preserves its $O(\log{k})$ approximation guarantee. Further, we show that our quantum algorithm for $D^2$-sampling can be 'dequantized' using the sample-query access model of Tang (PhD Thesis, Ewin Tang, University of Washington, 2023). This results in a fast quantum-inspired classical implementation of $k$-means++, which we call QI-$k$-means++, with a running time $O(Nd) + \tilde{O}(\zeta^2k^2d)$, where the $O(Nd)$ term is for setting up the sample-query access data structure. Experimental investigations show promising results for QI-$k$-means++ on large datasets with bounded aspect ratio. Finally, we use our quantum $D^2$-sampling with the known $D^2$-sampling-based classical approximation scheme (i.e., $(1+\varepsilon)$-approximation for any given $\varepsilon>0$) to obtain the first quantum approximation scheme for the $k$-means problem with polylogarithmic running time dependence on $N$.
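The abstract fully specifies the classical procedure being quantized: start with an empty center set, then repeat $D^2$-sampling $k$ times, each round picking a point with probability proportional to its squared distance to the nearest chosen center. A minimal NumPy sketch of that $O(Nkd)$ seeding loop (function name and interface are illustrative, not from the paper):

```python
import numpy as np

def kmeans_pp_seeding(V, k, seed=None):
    """k-means++ seeding via D^2-sampling.

    Each round samples a point of V with probability proportional to its
    squared distance to the nearest center chosen so far (O(Nkd) total).
    """
    rng = np.random.default_rng(seed)
    N = V.shape[0]
    # Round 1: C is empty, so all D^2 weights are equal -> uniform pick.
    centers = [V[rng.integers(N)]]
    # d2[i] = squared distance from V[i] to its nearest chosen center.
    d2 = np.sum((V - centers[0]) ** 2, axis=1)
    for _ in range(k - 1):
        probs = d2 / d2.sum()          # the D^2-sampling distribution
        idx = rng.choice(N, p=probs)
        centers.append(V[idx])
        # Maintain nearest-center distances against the new center.
        d2 = np.minimum(d2, np.sum((V - V[idx]) ** 2, axis=1))
    return np.array(centers)
```

The quantum and quantum-inspired variants in the paper replace the exact per-round distribution above with approximate $D^2$-sampling, which is why the robust approximation analysis is needed to retain the $O(\log k)$ guarantee.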
Related papers
- Measuring quantum relative entropy with finite-size effect [53.64687146666141]
We study the estimation of the relative entropy $D(\rho\|\sigma)$ when $\sigma$ is known.
Our estimator attains the Cramér-Rao type bound when the dimension $d$ is fixed.
arXiv Detail & Related papers (2024-06-25T06:07:20Z) - Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{k^2}{\varepsilon^2}(\sqrt{k}d + \log(Nd))\big)$ time.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - A Quantum Approximation Scheme for k-Means [0.16317061277457]
We give a quantum approximation scheme for the classical $k$-means clustering problem in the QRAM model.
Our quantum algorithm runs in time $\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon})} \eta^2 d\right)$.
Unlike previous works on unsupervised learning, our quantum algorithm does not require quantum linear algebra subroutines.
arXiv Detail & Related papers (2023-08-16T06:46:37Z) - Efficient Sampling of Stochastic Differential Equations with Positive
Semi-Definite Models [91.22420505636006]
This paper deals with the problem of efficient sampling from a differential equation, given the drift function and the diffusion matrix.
It is possible to obtain independent and identically distributed (i.i.d.) samples at precision $\varepsilon$ with a cost that is $m^2 d \log(1/\varepsilon)$.
Our results suggest that as the true solution gets smoother, we can circumvent the curse of dimensionality without requiring any sort of convexity.
arXiv Detail & Related papers (2023-03-30T02:50:49Z) - TURF: A Two-factor, Universal, Robust, Fast Distribution Learning
Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z) - Improved quantum data analysis [1.8416014644193066]
We give a quantum "Threshold Search" algorithm that requires only $O((\log^2 m)/\epsilon^2)$ samples of a $d$-dimensional state.
We also give an alternative Hypothesis Selection method using $\tilde{O}((\log^3 m)/\epsilon^2)$ samples.
arXiv Detail & Related papers (2020-11-22T01:22:37Z) - Denoising modulo samples: k-NN regression and tightness of SDP
relaxation [5.025654873456756]
We derive a two-stage algorithm that recovers estimates of the samples $f(x_i)$ with a uniform error rate $O\big((\frac{\log n}{n})^{\frac{1}{d+2}}\big)$ holding with high probability.
The estimates of the samples $f(x_i)$ can be subsequently utilized to construct an estimate of the function $f$.
arXiv Detail & Related papers (2020-09-10T13:32:46Z) - The Quantum Supremacy Tsirelson Inequality [0.22843885788439797]
For a quantum circuit $C$ on $n$ qubits and a sample $z \in \{0,1\}^n$, the benchmark involves computing $|\langle z|C|0^n\rangle|^2$.
We show that for any $\varepsilon \ge \frac{1}{\mathrm{poly}(n)}$, outputting a sample $z$ is the optimal 1-query algorithm for $|\langle z|C|0^n\rangle|^2$ on average.
arXiv Detail & Related papers (2020-08-20T01:04:32Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^*\rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions $d = \frac{1}{\lambda\varepsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\varepsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.