Streaming Complexity of SVMs
- URL: http://arxiv.org/abs/2007.03633v1
- Date: Tue, 7 Jul 2020 17:10:00 GMT
- Title: Streaming Complexity of SVMs
- Authors: Alexandr Andoni, Collin Burns, Yi Li, Sepideh Mahabadi, David P.
Woodruff
- Abstract summary: We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions $d=1,2$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
- Score: 110.63976030971106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the space complexity of solving the bias-regularized SVM problem in
the streaming model. This is a classic supervised learning problem that has
drawn lots of attention, including for developing fast algorithms for solving
the problem approximately. One of the most widely used algorithms for
approximately optimizing the SVM objective is Stochastic Gradient Descent
(SGD), which requires only $O(\frac{1}{\lambda\epsilon})$ random samples, and
which immediately yields a streaming algorithm that uses
$O(\frac{d}{\lambda\epsilon})$ space. For related problems, better streaming
algorithms are only known for smooth functions, unlike the SVM objective that
we focus on in this work. We initiate an investigation of the space complexity
for both finding an approximate optimum of this objective, and for the related
``point estimation'' problem of sketching the data set to evaluate the function
value $F_\lambda$ on any query $(\theta, b)$. We show that, for both problems,
for dimensions $d=1,2$, one can obtain streaming algorithms with space
polynomially smaller than $\frac{1}{\lambda\epsilon}$, which is the complexity
of SGD for strongly convex functions like the bias-regularized SVM, and which
is known to be tight in general, even for $d=1$. We also prove polynomial lower
bounds for both point estimation and optimization. In particular, for point
estimation we obtain a tight bound of $\Theta(1/\sqrt{\epsilon})$ for $d=1$ and
a nearly tight lower bound of $\widetilde{\Omega}(d/{\epsilon}^2)$ for $d =
\Omega( \log(1/\epsilon))$. Finally, for optimization, we prove a
$\Omega(1/\sqrt{\epsilon})$ lower bound for $d = \Omega( \log(1/\epsilon))$,
and show similar bounds when $d$ is constant.
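For concreteness, the objective in question is the bias-regularized SVM; written in the standard hinge-loss form (the exact notation is our assumption, chosen to match the query $(\theta, b)$ above), it reads

$$ F_\lambda(\theta, b) = \frac{\lambda}{2}\left(\|\theta\|_2^2 + b^2\right) + \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i(\langle \theta, x_i \rangle + b)\bigr), \qquad y_i \in \{-1, +1\}. $$

The $O(\frac{d}{\lambda\epsilon})$-space baseline mentioned in the abstract comes from storing $O(\frac{1}{\lambda\epsilon})$ random samples from the stream and running SGD on them. The sketch below is a minimal Pegasos-style illustration of that baseline under those assumptions, not the paper's algorithm; the reservoir size, the step size $\eta_t = 1/(\lambda t)$, and the function name are illustrative.

```python
import numpy as np

def streaming_svm_sgd(stream, d, lam, eps, rng=None):
    """Baseline streaming sketch: reservoir-sample O(1/(lam*eps)) points from the
    stream (O(d/(lam*eps)) space), then run Pegasos-style SGD on the stored sample
    to approximately minimize
        F_lambda(theta, b) = (lam/2)(||theta||^2 + b^2)
                             + avg_i max(0, 1 - y_i(<theta, x_i> + b)).
    The sample budget and step size are illustrative assumptions, not the paper's method.
    """
    rng = rng or np.random.default_rng()
    k = int(np.ceil(1.0 / (lam * eps)))           # illustrative sample budget
    reservoir = []                                # holds at most k (x, y) pairs
    for t, (x, y) in enumerate(stream, start=1):  # single pass over the stream
        if len(reservoir) < k:
            reservoir.append((x, y))
        else:
            j = rng.integers(t)                   # reservoir sampling: each item
            if j < k:                             # survives with probability k/t
                reservoir[j] = (x, y)

    theta, b = np.zeros(d), 0.0
    theta_avg, b_avg = np.zeros(d), 0.0
    for t in range(1, len(reservoir) + 1):
        x, y = reservoir[rng.integers(len(reservoir))]
        eta = 1.0 / (lam * t)                     # step size for a lam-strongly-convex objective
        margin = y * (theta @ x + b)
        # Subgradient of the regularizer, plus the hinge term when the margin is violated.
        g_theta = lam * theta - (y * x if margin < 1.0 else 0.0)
        g_b = lam * b - (y if margin < 1.0 else 0.0)
        theta, b = theta - eta * g_theta, b - eta * g_b
        # Average the iterates; the averaged point is the usual eps-approximate minimizer.
        theta_avg += (theta - theta_avg) / t
        b_avg += (b - b_avg) / t
    return theta_avg, b_avg
```

Here `stream` is any iterator of `(x, y)` pairs with `x` a length-`d` NumPy array and `y` in `{-1, +1}`. The stored reservoir of $O(\frac{1}{\lambda\epsilon})$ $d$-dimensional points is what accounts for the $O(\frac{d}{\lambda\epsilon})$ space; the paper's contribution is showing that, for small $d$, this can be improved polynomially.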
Related papers
- Dueling Optimization with a Monotone Adversary [35.850072415395395]
We study the problem of dueling optimization with a monotone adversary, which is a generalization of (noiseless) dueling convex optimization.
The goal is to design an online algorithm to find a minimizer $\mathbf{x}^*$ for a function $f\colon X \to \mathbb{R}^d$.
arXiv Detail & Related papers (2023-11-18T23:55:59Z) - The Computational Complexity of Finding Stationary Points in Non-Convex Optimization [53.86485757442486]
Finding approximate stationary points, i.e., points where the gradient is approximately zero, of non-convex but smooth objective functions is a fundamental computational problem.
We show that finding approximate KKT points in constrained optimization is reducible to finding approximate stationary points in unconstrained optimization but the converse is impossible.
arXiv Detail & Related papers (2023-10-13T14:52:46Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Near-Optimal Non-Convex Stochastic Optimization under Generalized
Smoothness [21.865728815935665]
Two recent works established the $O(\epsilon^{-3})$ sample complexity to obtain an $O(\epsilon)$-stationary point.
However, both require a large batch size on the order of $\mathrm{poly}(\epsilon^{-1})$, which is not only computationally burdensome but also unsuitable for streaming applications.
In this work, we solve the prior two problems simultaneously by revisiting a simple variant of the STORM algorithm.
arXiv Detail & Related papers (2023-02-13T00:22:28Z) - Mind the gap: Achieving a super-Grover quantum speedup by jumping to the
end [114.3957763744719]
We present a quantum algorithm that has rigorous runtime guarantees for several families of binary optimization problems.
We show that the algorithm finds the optimal solution in time $O^*(2^{(0.5-c)n})$ for an $n$-independent constant $c$.
We also show that for a large fraction of random instances from the $k$-spin model and for any fully satisfiable or slightly frustrated $k$-CSP formula, statement (a) is the case.
arXiv Detail & Related papers (2022-12-03T02:45:23Z) - An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum
Optimization [25.21457349137344]
We prove that DEAREST requires at most $\mathcal{O}(mn + \sqrt{mn}L\varepsilon^{-2})$ first-order oracle (IFO) calls and $\mathcal{O}(L\varepsilon^{-2}/\sqrt{1-\lambda_W})$ communication rounds.
arXiv Detail & Related papers (2022-10-25T11:37:11Z) - Lower Bounds on the Worst-Case Complexity of Efficient Global
Optimization [11.523746174066702]
We derive a unified lower bound for the complexity of efficient global optimization in terms of the metric entropy of a ball in its corresponding reproducing kernel Hilbert space (RKHS).
We show that this lower bound nearly matches the upper bound attained by non-adaptive search algorithms for the commonly used squared exponential kernel and the Matérn kernel.
arXiv Detail & Related papers (2022-09-20T11:57:13Z) - Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax
Optimization [47.27237492375659]
We study the bilinearly coupled minimax problem: $\min_x \max_y f(x) + y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions.
No known first-order algorithms have hitherto achieved the lower complexity bound of $\Omega\bigl(\bigl(\sqrt{\tfrac{L_x}{\mu_x}} + \tfrac{\|A\|}{\sqrt{\mu_x \mu_y}}\bigr)\log\bigl(\tfrac{1}{\varepsilon}\bigr)\bigr)$.
arXiv Detail & Related papers (2022-01-19T05:56:19Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z)