Strong Memory Lower Bounds for Learning Natural Models
- URL: http://arxiv.org/abs/2206.04743v1
- Date: Thu, 9 Jun 2022 19:35:47 GMT
- Title: Strong Memory Lower Bounds for Learning Natural Models
- Authors: Gavin Brown, Mark Bun, Adam Smith
- Abstract summary: We give lower bounds on the amount of memory required by one-pass streaming algorithms.
We show that algorithms which learn using a near-minimal number of examples, $\tilde O(\kappa)$, must use $\tilde \Omega(d\kappa)$ bits of space.
- Score: 16.900376638975978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give lower bounds on the amount of memory required by one-pass streaming
algorithms for solving several natural learning problems. In a setting where
examples lie in $\{0,1\}^d$ and the optimal classifier can be encoded using
$\kappa$ bits, we show that algorithms which learn using a near-minimal number
of examples, $\tilde O(\kappa)$, must use $\tilde \Omega( d\kappa)$ bits of
space. Our space bounds match the dimension of the ambient space of the
problem's natural parametrization, even when it is quadratic in the size of
examples and the final classifier. For instance, in the setting of $d$-sparse
linear classifiers over degree-2 polynomial features, for which
$\kappa=\Theta(d\log d)$, our space lower bound is $\tilde\Omega(d^2)$. Our
bounds degrade gracefully with the stream length $N$, generally having the form
$\tilde\Omega\left(d\kappa \cdot \frac{\kappa}{N}\right)$.
Bounds of the form $\Omega(d\kappa)$ were known for learning parity and other
problems defined over finite fields. Bounds that apply in a narrow range of
sample sizes are also known for linear regression. Ours are the first such
bounds for problems of the type commonly seen in recent learning applications
that apply for a large range of input sizes.
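As a quick sanity check (our own substitution, using only the quantities quoted above), plugging the sparse degree-2 example into the general bound at the near-minimal sample size $N = \tilde O(\kappa)$ recovers the stated $\tilde\Omega(d^2)$ space bound:
$$\kappa = \Theta(d\log d), \quad N = \tilde O(\kappa) \;\Longrightarrow\; \tilde\Omega\!\left(d\kappa \cdot \frac{\kappa}{N}\right) = \tilde\Omega(d\kappa) = \tilde\Omega(d \cdot d\log d) = \tilde\Omega(d^2).$$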
Related papers
- Tight Time-Space Lower Bounds for Constant-Pass Learning [1.7387739961208148]
We prove tight memory-sample lower bounds for any parity learning algorithm that makes $q$ passes over the stream of samples.
We show that such a learner requires either $\Omega(n^2)$ memory size or at least $2^{\Omega(n)}$ samples.
Similar to prior work, our results extend to any learning problem with many nearly-orthogonal concepts.
arXiv Detail & Related papers (2023-10-12T06:36:31Z) - Optimal Approximation Rates for Deep ReLU Neural Networks on Sobolev and Besov Spaces [2.7195102129095003]
Deep neural networks with the ReLU activation function can approximate functions in the Sobolev spaces $W^s(L_q(\Omega))$ and Besov spaces $B^s_r(L_q(\Omega))$.
This problem is important when studying the application of neural networks in a variety of fields.
arXiv Detail & Related papers (2022-11-25T23:32:26Z) - The Approximate Degree of DNF and CNF Formulas [95.94432031144716]
For every $\delta>0$, we construct CNF and DNF formulas of polynomial size with approximate degree $\Omega(n^{1-\delta})$, essentially matching the trivial upper bound of $n$.
We show that for every $\delta>0$, these models require $\Omega(n^{1-\delta})$, $\Omega((n/4^k k^2)^{1-\delta})$, and $\Omega((n/4^k k^2)^{1-\delta})$, respectively.
arXiv Detail & Related papers (2022-09-04T10:01:39Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Tractability from overparametrization: The example of the negative
perceptron [9.077560551700163]
We analyze a linear programming algorithm to characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$.
We observe a gap between the interpolation threshold and $\delta_{\text{lin}}(\kappa)$, raising the question of the behavior of other algorithms.
arXiv Detail & Related papers (2021-10-28T01:00:13Z) - Learning Stochastic Shortest Path with Linear Function Approximation [74.08819218747341]
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models.
We propose a novel algorithm for learning the linear mixture SSP, which can attain a $\tilde O(d B_\star^{1.5}\sqrt{K/c_{\min}})$ regret.
arXiv Detail & Related papers (2021-10-25T08:34:00Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Subspace Embeddings Under Nonlinear Transformations [19.28531602450729]
We consider embeddings that preserve the norm of all vectors in a space $S = \{y : y = f(x) \text{ for } x \in Z\}$, where $Z$ is a $k$-dimensional subspace of $\mathbb{R}^n$ and $f(x)$ is a nonlinear activation function applied entrywise to $x$.
In particular, we give additive $\epsilon$-error embeddings into $O\left(\frac{k \log(n/\epsilon)}{\epsilon^2}\right)$ dimensions.
arXiv Detail & Related papers (2020-10-05T18:18:04Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Optimal Contextual Pricing and Extensions [32.152826900874075]
We give a poly-time algorithm for contextual pricing with $O(d \log\log T)$ regret, which matches the $\Omega(d \log\log T)$ lower bound up to the $d \log d$ additive factor.
These algorithms are based on a novel technique of bounding the value of the Steiner polynomial of a convex region at various scales.
arXiv Detail & Related papers (2020-03-03T18:46:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.