Non-Adaptive Adaptive Sampling on Turnstile Streams
- URL: http://arxiv.org/abs/2004.10969v1
- Date: Thu, 23 Apr 2020 05:00:21 GMT
- Title: Non-Adaptive Adaptive Sampling on Turnstile Streams
- Authors: Sepideh Mahabadi, Ilya Razenshteyn, David P. Woodruff, Samson Zhou
- Abstract summary: We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
- Score: 57.619901304728366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive sampling is a useful algorithmic tool for data summarization
problems in the classical centralized setting, where the entire dataset is
available to the single processor performing the computation. Adaptive sampling
repeatedly selects rows of an underlying matrix
$\mathbf{A}\in\mathbb{R}^{n\times d}$, where $n\gg d$, with probabilities
proportional to their distances to the subspace of the previously selected
rows. Intuitively, adaptive sampling seems to be limited to trivial multi-pass
algorithms in the streaming model of computation due to its inherently
sequential nature of assigning sampling probabilities to each row only after
the previous iteration is completed. Surprisingly, we show this is not the case
by giving the first one-pass algorithms for adaptive sampling on turnstile
streams and using space $\text{poly}(d,k,\log n)$, where $k$ is the number of
adaptive sampling rounds to be performed.
Our adaptive sampling procedure has a number of applications to various data
summarization problems that either improve state-of-the-art or have only been
previously studied in the more relaxed row-arrival model. We give the first
relative-error algorithms for column subset selection, subspace approximation,
projective clustering, and volume maximization on turnstile streams that use
space sublinear in $n$. We complement our volume maximization algorithmic
results with lower bounds that are tight up to lower order terms, even for
multi-pass algorithms. By a similar construction, we also obtain lower bounds
for volume maximization in the row-arrival model, which we match with
competitive upper bounds.
See paper for full abstract.
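For intuition, below is a minimal offline sketch of the adaptive sampling loop the abstract describes: rows are drawn with probability proportional to their squared distance to the span of the rows already selected. This is the classical centralized procedure, not the paper's one-pass turnstile-stream algorithm; the function and variable names are illustrative.

```python
# Minimal sketch of classical (offline) adaptive row sampling, assuming the
# whole matrix A is in memory.  NOT the paper's streaming algorithm.
import numpy as np

def adaptive_row_sampling(A: np.ndarray, k: int, rng=None) -> list:
    """Select k row indices of A by adaptive (residual-norm) sampling."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    residual = A.copy()            # rows of A projected away from the chosen span
    chosen = []
    for _ in range(k):
        dists = np.einsum("ij,ij->i", residual, residual)   # squared residual norms
        total = dists.sum()
        if total <= 1e-12:         # remaining rows already lie in the chosen span
            break
        i = rng.choice(n, p=dists / total)
        chosen.append(i)
        # Gram-Schmidt-style update: remove the component along the chosen row.
        v = residual[i] / np.linalg.norm(residual[i])
        residual = residual - np.outer(residual @ v, v)
    return chosen

# Example: sample 3 rows from a random 1000 x 20 matrix.
A = np.random.default_rng(0).standard_normal((1000, 20))
print(adaptive_row_sampling(A, k=3))
```

The Gram-Schmidt-style update keeps every row's residual orthogonal to the span of the rows chosen so far, so in the next round the sampling probabilities are exactly the squared distances to that subspace.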
Related papers
- A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we study a robust phase retrieval problem where the task is to recover an unknown signal.
Our proposed oracle avoids the need for computationally intensive spectral methods, using a simple gradient descent step while handling outliers.
arXiv Detail & Related papers (2024-09-07T06:37:23Z)
- Bayes-optimal learning of an extensive-width neural network from quadratically many samples [28.315491743569897]
We consider the problem of learning a target function corresponding to a single hidden layer neural network.
We consider the limit where the input dimension and the network width are proportionally large.
arXiv Detail & Related papers (2024-08-07T12:41:56Z)
- Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data [17.657917523817243]
We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem.
In the context of least-squares instrumental variable regression, our algorithms neither require matrix inversions nor mini-batches.
We derive rates of convergence in expectation that are of order $\mathcal{O}(\log T/T)$ and $\mathcal{O}(1/T^{1-\iota})$ for any $\iota>0$.
arXiv Detail & Related papers (2024-05-29T19:21:55Z)
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed-confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
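As a generic illustration of the idea in this entry (not the paper's sketch-based data structure), the sketch below runs SGD for least-squares regression with importance sampling, reweighting each sampled gradient by $1/(n p_i)$ so the update stays unbiased; all names and parameters are illustrative.

```python
# Generic sketch of SGD with importance sampling for least-squares regression.
# Illustrates only the reweighting idea, not the paper's streaming sketches.
import numpy as np

def sgd_importance_sampling(X, y, steps=1000, lr=0.01, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Sample rows proportionally to their squared norms (a common importance measure).
    p = np.einsum("ij,ij->i", X, X)
    p = p / p.sum()
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.choice(n, p=p)
        grad = (X[i] @ w - y[i]) * X[i]   # per-example least-squares gradient
        w -= lr * grad / (n * p[i])       # reweight to keep the update unbiased
    return w
```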
arXiv Detail & Related papers (2022-07-16T03:09:30Z)
- Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
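As a generic illustration of this entry's idea (not the paper's exact algorithm), the sketch below draws an approximate posterior sample for a linear contextual bandit by taking a few noisy Langevin gradient steps on the regularized least-squares loss and then acting greedily; step sizes, iteration counts, and names are illustrative.

```python
# Minimal sketch of Langevin-Monte-Carlo Thompson sampling for a linear
# contextual bandit: approximate the posterior sample with noisy gradient
# steps instead of sampling the Gaussian posterior in closed form.
import numpy as np

def lmc_posterior_sample(X, r, lam=1.0, eta=0.01, n_steps=50, rng=None):
    """Approximate posterior sample of the reward parameter theta."""
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(n_steps):
        grad = X.T @ (X @ theta - r) + lam * theta            # gradient of the regularized loss
        theta = theta - eta * grad + np.sqrt(2.0 * eta) * rng.standard_normal(d)
    return theta

def choose_arm(arm_features, X, r):
    """Pick the arm with the largest estimated reward under the sampled theta."""
    theta = lmc_posterior_sample(X, r)
    return int(np.argmax(arm_features @ theta))
```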
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
- Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors [59.33977545294148]
We show that $O(k \log L)$ samples suffice to guarantee that the signal is close to any vector that minimizes an amplitude-based empirical loss function.
We adapt this result to sparse phase retrieval, and show that $O(s \log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional.
arXiv Detail & Related papers (2021-06-29T12:49:54Z)
- Distributed Adaptive Nearest Neighbor Classifier: Algorithm and Theory [6.696267547013535]
We propose a novel distributed adaptive NN classifier for which the number of nearest neighbors is a tuning parameter adaptively chosen by a data-driven criterion.
An early stopping rule is proposed when searching for the optimal tuning parameter, which improves the finite sample performance.
In particular, we show that when the sub-sample sizes are sufficiently large, the proposed classifier achieves the nearly optimal convergence rate.
arXiv Detail & Related papers (2021-05-20T14:38:41Z)
- Optimal Sampling Gaps for Adaptive Submodular Maximization [28.24164217929491]
We study the performance loss caused by probability sampling in the context of adaptive submodular maximization.
We show that the property of policywise submodularity can be found in a wide range of real-world applications.
arXiv Detail & Related papers (2021-04-05T03:21:32Z)
- Adaptive Sequential SAA for Solving Two-stage Stochastic Linear Programs [1.6181085766811525]
We present adaptive sequential SAA (sample average approximation) algorithms to solve large-scale two-stage linear programs.
The proposed algorithm can be stopped in finite-time to return a solution endowed with a probabilistic guarantee on quality.
arXiv Detail & Related papers (2020-12-07T14:58:16Z)