Online Lewis Weight Sampling
- URL: http://arxiv.org/abs/2207.08268v1
- Date: Sun, 17 Jul 2022 19:40:51 GMT
- Title: Online Lewis Weight Sampling
- Authors: David P. Woodruff, Taisuke Yasuda
- Abstract summary: Cohen and Peng introduced Lewis weight sampling to the theoretical computer science community.
Several works have extended this important primitive to other settings, including the online coreset, sliding window, and adversarial streaming models.
We design the first nearly optimal $\ell_p$ subspace embeddings for all $p\in(0,\infty)$ in the online coreset, sliding window, and the adversarial streaming models.
- Score: 62.38157566916501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The seminal work of Cohen and Peng introduced Lewis weight sampling to the
theoretical computer science community, yielding fast row sampling algorithms
for approximating $d$-dimensional subspaces of $\ell_p$ up to $(1+\epsilon)$
error. Several works have extended this important primitive to other settings,
including the online coreset, sliding window, and adversarial streaming models.
However, these results are only for $p\in\{1,2\}$, and the results for $p=1$
require suboptimal $\tilde O(d^2/\epsilon^2)$ samples.
In this work, we design the first nearly optimal $\ell_p$ subspace embeddings
for all $p\in(0,\infty)$ in the online coreset, sliding window, and the
adversarial streaming models. In all three models, our algorithms store $\tilde
O(d^{1\lor(p/2)}/\epsilon^2)$ rows. This answers a substantial generalization
of the main open question of [BDMMUWZ2020], and gives the first results for all
$p\notin\{1,2\}$.
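To make the sampling primitive concrete, the following is a minimal offline sketch of computing $\ell_p$ Lewis weights via the Cohen-Peng fixed-point iteration. `numpy` is assumed; the function name and iteration count are illustrative choices, and none of the online machinery from the paper is attempted here.

```python
import numpy as np

def lewis_weights(A, p, iters=100):
    """Approximate the l_p Lewis weights of A's rows via the fixed-point
    iteration w_i <- (a_i^T (A^T W^{1-2/p} A)^{-1} a_i)^{p/2}.
    A rough sketch (the plain iteration converges for p < 4), not a
    tuned implementation."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(iters):
        # Reweighted Gram matrix A^T W^{1 - 2/p} A.
        G = A.T @ (w[:, None] ** (1.0 - 2.0 / p) * A)
        Ginv = np.linalg.inv(G)
        # Quadratic forms a_i^T G^{-1} a_i, all rows at once.
        q = np.einsum("ij,jk,ik->i", A, Ginv, A)
        w = q ** (p / 2.0)
    return w
```

At the fixed point the weights sum to $d$, which gives a quick sanity check; for $p=2$ they reduce to the ordinary leverage scores.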
Towards our result, we give the first analysis of "one-shot" Lewis weight
sampling, which samples rows proportionally to their Lewis weights, with sample
complexity $\tilde O(d^{p/2}/\epsilon^2)$ for $p>2$. Previously, this scheme
was only known to have sample complexity $\tilde O(d^{p/2}/\epsilon^5)$,
whereas $\tilde O(d^{p/2}/\epsilon^2)$ is known if a more sophisticated
recursive sampling is used. The recursive sampling cannot be implemented
online, thus necessitating an analysis of one-shot Lewis weight sampling. Our
analysis uses a novel connection to online numerical linear algebra.
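A hedged sketch of this one-shot scheme: each row is kept independently with probability proportional to its Lewis weight and rescaled so that $\|SAx\|_p^p$ is unbiased for $\|Ax\|_p^p$. The oversampling factor `beta` (on the order of $d^{p/2-1}/\epsilon^2$ up to logarithmic factors for $p>2$, per the sample complexity quoted above) is left as a free parameter.

```python
import numpy as np

def one_shot_sample(A, w, p, beta, seed=0):
    """One-shot Lewis weight sampling: keep row i independently with
    probability p_i = min(1, beta * w_i) and rescale it by (1/p_i)^{1/p},
    so that E[||S A x||_p^p] = ||A x||_p^p for every x."""
    rng = np.random.default_rng(seed)
    probs = np.minimum(1.0, beta * w)
    keep = rng.random(len(w)) < probs
    return A[keep] * (1.0 / probs[keep])[:, None] ** (1.0 / p)
```

This composes with the `lewis_weights` sketch above: compute the weights once, then sample in a single pass over the rows.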
As an application, we obtain the first one-pass streaming coreset algorithms
for $(1+\epsilon)$ approximation of important generalized linear models, such
as logistic regression and $p$-probit regression. Our upper bounds are
parameterized by a complexity parameter $\mu$ introduced by [MSSW2018], and we
show the first lower bounds showing that a linear dependence on $\mu$ is
necessary.
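The one-pass streaming construction is not spelled out in the abstract, so the following is only a rough offline illustration of the coreset idea for logistic regression: importance-sample rows, attach inverse-probability weights, and minimize the weighted logistic loss on the sample. The leverage-plus-uniform scores and all names here are assumptions standing in for the paper's Lewis-weight-based scores and its $\mu$-dependent analysis.

```python
import numpy as np
from scipy.optimize import minimize

def logistic_coreset(X, y, m, seed=0):
    """Toy importance-sampling coreset for logistic regression. Scores
    are l_2 leverage scores plus a uniform term -- a stand-in for the
    paper's Lewis-weight-based scores, and offline rather than one-pass."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Q, _ = np.linalg.qr(X)                    # thin QR factorization
    scores = (Q ** 2).sum(axis=1) + 1.0 / n   # leverage + uniform
    probs = scores / scores.sum()
    idx = rng.choice(n, size=m, replace=True, p=probs)
    return X[idx], y[idx], 1.0 / (m * probs[idx])

def fit_weighted_logistic(X, y, w):
    """Minimize sum_i w_i * log(1 + exp(-y_i <x_i, b>)), y_i in {-1,+1}."""
    loss = lambda b: np.sum(w * np.logaddexp(0.0, -y * (X @ b)))
    return minimize(loss, np.zeros(X.shape[1]), method="L-BFGS-B").x
```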
Related papers
- Private Algorithms for Stochastic Saddle Points and Variational Inequalities: Beyond Euclidean Geometry [20.068722738606287]
We study stochastic saddle point problems (SSP) and stochastic variational inequalities (SVI) under the constraint of $(\epsilon,\delta)$-differential privacy (DP).
We obtain a bound of $\tilde O\big(\frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{n\epsilon}\big)$ on the strong SP-gap, where $n$ is the number of samples and $d$ is the dimension.
We provide the first analysis which obtains a bound on the strong VI-gap of …
arXiv Detail & Related papers (2024-11-07T21:45:05Z) - Parallel Sampling via Counting [3.6854430278041264]
We show how to use parallelization to speed up sampling from an arbitrary distribution.
Our algorithm takes $O(n^{2/3}\cdot \operatorname{polylog}(n,q))$ parallel time.
Our results have implications for sampling in autoregressive models.
arXiv Detail & Related papers (2024-08-18T11:42:54Z) - Turnstile $\ell_p$ leverage score sampling with applications [56.403488578703865]
We develop a novel algorithm for sampling rows $a_i$ of a matrix $A\in\mathbb{R}^{n\times d}$, proportional to their $\ell_p$ norm, when $A$ is presented in a turnstile data stream.
Our algorithm not only returns the set of sampled row indexes, but also returns slightly perturbed rows $\tilde a_i \approx a_i$ and approximates their sampling probabilities up to $\varepsilon$ relative error.
For logistic regression, our framework yields the first algorithm that achieves a …
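As a point of reference for the entry above, here is a sketch of the ideal (offline) sampler that the turnstile algorithm emulates; the function name is illustrative and none of the streaming or sketching machinery is reproduced.

```python
import numpy as np

def lp_row_norm_sample(A, p, m, seed=0):
    """Offline reference sampler: draw m row indexes with probability
    ||a_i||_p^p / sum_j ||a_j||_p^p. The cited turnstile algorithm
    emulates this under arbitrary stream updates, returning perturbed
    rows and eps-approximate probabilities."""
    rng = np.random.default_rng(seed)
    norms = np.sum(np.abs(A) ** p, axis=1)
    probs = norms / norms.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=probs)
    return idx, probs[idx]
```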
arXiv Detail & Related papers (2024-06-01T07:33:41Z) - Optimal bounds for $\ell_p$ sensitivity sampling via $\ell_2$ augmentation [56.403488578703865]
We show that by augmenting the $\ell_p$ sensitivities with $\ell_2$ sensitivities, we obtain an optimal linear $\tilde O(\varepsilon^{-2}\mu^2 d)$ sampling complexity.
We also obtain an $\tilde O(\varepsilon^{-2}\mu^2 d)$ sensitivity sampling bound for logistic regression.
arXiv Detail & Related papers (2024-06-01T07:03:40Z) - Optimal Query Complexities for Dynamic Trace Estimation [59.032228008383484]
We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly.
We provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $\epsilon$ error with $\delta$ failure probability.
Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation.
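For context on the entry above, here is a minimal sketch of the static Hutchinson estimator in the matrix-vector query model that the cited lower bounds address; the dynamic binary-tree procedure itself is not reconstructed here.

```python
import numpy as np

def hutchinson_trace(matvec, d, m, seed=0):
    """Hutchinson's estimator: tr(A) ~ (1/m) * sum_j z_j^T A z_j with
    Rademacher probes z_j; A is touched only through matrix-vector
    products, the query model of the cited paper."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(m):
        z = rng.choice([-1.0, 1.0], size=d)
        total += z @ matvec(z)
    return total / m
```

Usage: `hutchinson_trace(lambda v: A @ v, A.shape[0], 100)` for an explicit matrix `A`.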
arXiv Detail & Related papers (2022-09-30T04:15:44Z) - Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound $O(d^{\max\{1,p/2\}}\log^2 n)$ for loss functions with at most degree $p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z) - L1 Regression with Lewis Weights Subsampling [10.796388585158184]
We show that sampling from $X$ according to its Lewis weights and outputting the empirical minimizer succeeds with probability $1-\delta$ for $m$ …
We also give a corresponding lower bound of $\Omega\big(\frac{d}{\varepsilon^2} + (d + \frac{1}{\varepsilon^2}) \log\frac{1}{\delta}\big)$.
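A sketch of the "empirical minimizer" step from the entry above: given sampled rows and inverse-probability weights (e.g., from $\ell_1$ Lewis weight sampling as sketched earlier), weighted $\ell_1$ regression reduces to a linear program. `scipy` is assumed, and the reformulation is standard rather than specific to the cited paper.

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1_regression(A, b, w):
    """Solve min_x sum_i w_i |a_i^T x - b_i| as an LP with slack
    variables t_i >= |a_i^T x - b_i| (standard reformulation)."""
    n, d = A.shape
    c = np.concatenate([np.zeros(d), w])   # minimize sum_i w_i t_i
    A_ub = np.block([[A, -np.eye(n)],      #  a_i^T x - t_i <=  b_i
                     [-A, -np.eye(n)]])    # -a_i^T x - t_i <= -b_i
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * d + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]
```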
arXiv Detail & Related papers (2021-05-19T23:15:00Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, a discount factor $\gamma\in(0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an improved algorithm with better sample complexity.
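For context, here is a generic tabular Q-learning loop illustrating what "model-free" means in this setting: the algorithm never estimates transition probabilities. This is a textbook sketch under assumed interfaces, not the cited paper's algorithm.

```python
import numpy as np

def q_learning(step, n_states, n_actions, gamma, n_steps=50_000,
               eps=0.1, seed=0):
    """Generic tabular Q-learning: updates Q only from sampled
    transitions, never building a model of the MDP.
    `step(s, a)` returns (reward, next_state)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_steps):
        # Epsilon-greedy action selection.
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        r, s2 = step(s, a)
        visits[s, a] += 1
        lr = 1.0 / visits[s, a]  # decaying step size
        Q[s, a] += lr * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
    return Q
```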
arXiv Detail & Related papers (2020-06-06T13:34:41Z)