Projection Efficient Subgradient Method and Optimal Nonsmooth
Frank-Wolfe Method
- URL: http://arxiv.org/abs/2010.01848v1
- Date: Mon, 5 Oct 2020 08:16:56 GMT
- Title: Projection Efficient Subgradient Method and Optimal Nonsmooth
Frank-Wolfe Method
- Authors: Kiran Koshy Thekumparampil, Prateek Jain, Praneeth Netrapalli, Sewoong
Oh
- Abstract summary: We find a feasible $\epsilon$-suboptimal solution using only $O(\epsilon^{-1})$ PO calls and optimal $O(\epsilon^{-2})$ FO calls.
Our experiments confirm that these methods achieve significant speedups over the state-of-the-art, for a problem with costly PO and LMO calls.
- Score: 54.93433440034386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the classical setting of optimizing a nonsmooth Lipschitz
continuous convex function over a convex constraint set, when having access to
a (stochastic) first-order oracle (FO) for the function and a projection oracle
(PO) for the constraint set. It is well known that to achieve
$\epsilon$-suboptimality in high dimensions, $\Theta(\epsilon^{-2})$ FO calls
are necessary. This is achieved by the projected subgradient method (PGD).
However, PGD also entails $O(\epsilon^{-2})$ PO calls, which may be
computationally costlier than FO calls (e.g., for nuclear norm constraints).
Improving the PO call complexity of PGD is largely unexplored, despite the
fundamental nature of this problem and the extensive literature on it. We
present the first such improvement. It requires only the mild assumption that the objective
function, when extended to a slightly larger neighborhood of the constraint
set, still remains Lipschitz and accessible via FO. In particular, we introduce
MOPES method, which carefully combines Moreau-Yosida smoothing and accelerated
first-order schemes. This is guaranteed to find a feasible
$\epsilon$-suboptimal solution using only $O(\epsilon^{-1})$ PO calls and
optimal $O(\epsilon^{-2})$ FO calls. Further, if instead of a PO we only have
access to a linear minimization oracle (LMO, à la Frank-Wolfe) for the
constraint set, an extension of our method, MOLES, finds a feasible $\epsilon$-suboptimal
solution using $O(\epsilon^{-2})$ LMO calls and FO calls---both match known
lower bounds, resolving a question left open since White (1993). Our
experiments confirm that these methods achieve significant speedups over the
state-of-the-art, for a problem with costly PO and LMO calls.
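To make the oracle-complexity tradeoff concrete, here is a minimal sketch of the classical projected subgradient baseline (PGD) that the abstract compares against: it spends one FO call and one PO call per iteration, so both complexities scale as $O(\epsilon^{-2})$. This is not the paper's MOPES method (which combines Moreau-Yosida smoothing with acceleration to cut PO calls to $O(\epsilon^{-1})$); the oracle names and the toy problem below are illustrative assumptions.
```python
import numpy as np

def projected_subgradient(fo, po, x0, T, R=1.0, G=1.0):
    """Classical PGD baseline: one FO call and one PO call per iteration,
    so PO complexity is O(eps^-2), which MOPES improves to O(eps^-1)."""
    x = po(x0)
    x_avg = np.array(x, dtype=float)
    for t in range(1, T + 1):
        eta = R / (G * np.sqrt(t))       # standard step size for G-Lipschitz f
        x = po(x - eta * fo(x))          # FO call, then PO call
        x_avg += (x - x_avg) / (t + 1)   # running average of iterates
    return x_avg

# Toy instance: minimize the nonsmooth f(x) = ||Ax - b||_1 over the unit ball.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
fo = lambda x: A.T @ np.sign(A @ x - b)          # a subgradient of f
po = lambda x: x / max(1.0, np.linalg.norm(x))   # Euclidean-ball projection
x_hat = projected_subgradient(fo, po, np.zeros(5), T=2000)
```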
Related papers
- First-Order Methods for Linearly Constrained Bilevel Optimization [38.19659447295665]
We present first-order methods for linearly constrained bilevel optimization that avoid costly Hessian computations.
For linear inequality constraints, we attain $(\delta,\epsilon)$-Goldstein stationarity in $\widetilde{O}(d\delta^{-1}\epsilon^{-3})$ gradient oracle calls.
arXiv Detail & Related papers (2024-06-18T16:41:21Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale, and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
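For readers unfamiliar with mirror descent, the following is a textbook entropic mirror descent update on the probability simplex. It is a generic sketch for orientation only, not the paper's oblivious algorithms (whose point is precisely to avoid needing the Lipschitz or smoothness constants that a fixed step-size schedule presumes); the cost vector and step schedule are assumed for illustration.
```python
import numpy as np

def entropic_mirror_descent(grad, x0, step, T):
    """Textbook mirror descent on the probability simplex with the entropy
    mirror map (multiplicative-weights update); a generic sketch only."""
    x = x0 / x0.sum()
    for t in range(1, T + 1):
        g = grad(x)
        x = x * np.exp(-step(t) * g)  # mirror step in the dual (log) space
        x /= x.sum()                  # map back onto the simplex
    return x

# Toy example: minimize the linear cost <c, x> over the simplex.
c = np.array([3.0, 1.0, 2.0])
x = entropic_mirror_descent(lambda x: c, np.ones(3),
                            step=lambda t: 0.5 / np.sqrt(t), T=500)
# x concentrates on the cheapest coordinate (index 1).
```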
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed-confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Perseus: A Simple and Optimal High-Order Method for Variational
Inequalities [81.32967242727152]
A VI involves finding $x^\star \in \mathcal{X}$ such that $\langle F(x), x - x^\star \rangle \geq 0$ for all $x \in \mathcal{X}$.
We propose a $p$th-order method that does not require any line search procedure and provably converges to a weak solution at a rate of $O(\epsilon^{-2/(p+1)})$.
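As a concrete reference point for the VI formulation above, the classical first-order ($p=1$) scheme for monotone VIs is the extragradient method; the sketch below illustrates the setup only and is not Perseus itself (which is $p$th-order and line-search-free). The toy operator and box constraint are assumptions for the example.
```python
import numpy as np

def extragradient(F, project, x0, eta, T):
    """Classical extragradient for a monotone Lipschitz operator F;
    the p = 1 reference point for the VI setting, not Perseus itself."""
    x = project(x0)
    for _ in range(T):
        x_half = project(x - eta * F(x))   # extrapolation (lookahead) step
        x = project(x - eta * F(x_half))   # update using the lookahead field
    return x

# Toy monotone VI: F(x) = Mx with skew-symmetric M, over a box constraint.
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
F = lambda x: M @ x
project = lambda x: np.clip(x, -1.0, 1.0)
x_star = extragradient(F, project, np.array([0.9, -0.7]), eta=0.2, T=500)
# x_star approaches the solution x* = 0, where <F(x), x - x*> >= 0 holds.
```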
arXiv Detail & Related papers (2022-05-06T13:29:14Z) - Generalized Optimistic Methods for Convex-Concave Saddle Point Problems [24.5327016306566]
The optimistic method has seen increasing popularity for solving convex-concave saddle point problems.
We develop a backtracking line search scheme to select the step sizes without knowledge of the problem's smoothness coefficients.
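The general idea of backtracking line search can be sketched in a few lines: shrink a trial step until a sufficient-decrease test passes, so no smoothness constant has to be known up front. The Armijo-style test below is a generic illustration, not the authors' step-size rule for the optimistic method; all parameter values are assumed defaults.
```python
import numpy as np

def armijo_step(f, grad_f, x, eta0=1.0, beta=0.5, c=1e-4):
    """Generic Armijo backtracking: halve the trial step size until the
    sufficient-decrease condition holds; no smoothness constant needed."""
    g = grad_f(x)
    eta = eta0
    while f(x - eta * g) > f(x) - c * eta * (g @ g):
        eta *= beta                        # shrink and retry
    return x - eta * g, eta

# Toy usage on f(x) = ||x||^2 / 2, whose gradient is x.
x_new, eta = armijo_step(lambda x: 0.5 * x @ x, lambda x: x,
                         np.array([2.0, -1.0]))
```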
arXiv Detail & Related papers (2022-02-19T20:31:05Z) - Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax
Optimization [47.27237492375659]
We study the bilinearly coupled minimax problem: $\min_x \max_y f(x) + y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions.
No known first-order algorithms have hitherto achieved the lower complexity bound of $\Omega\left(\sqrt{\frac{L_x}{\mu_x}} + \frac{\|A\|}{\sqrt{\mu_x\,\mu_y}}\right) \log\left(\frac{1}{\varepsilon}\right)$.
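For concreteness, this problem class can be attacked with naive simultaneous gradient descent-ascent; the sketch below only sets up the objective's gradients and is a baseline illustration under assumed toy choices of $f$, $h$, and $A$, not the paper's lifted primal-dual method (which attains the accelerated rate).
```python
import numpy as np

def gda_bilinear(grad_f, grad_h, A, x0, y0, eta, T):
    """Naive simultaneous gradient descent-ascent on
    min_x max_y f(x) + y^T A x - h(y); a baseline sketch, not the
    paper's lifted primal-dual method."""
    x, y = x0.astype(float), y0.astype(float)
    for _ in range(T):
        gx = grad_f(x) + A.T @ y   # gradient in x of f(x) + y^T A x
        gy = A @ x - grad_h(y)     # gradient in y of y^T A x - h(y)
        x, y = x - eta * gx, y + eta * gy
    return x, y

# Toy strongly-convex-strongly-concave instance: f(x)=||x||^2/2, h(y)=||y||^2/2.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
x, y = gda_bilinear(lambda x: x, lambda y: y, A,
                    np.ones(3), np.ones(3), eta=0.05, T=5000)
# (x, y) converges to the unique saddle point at the origin.
```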
arXiv Detail & Related papers (2022-01-19T05:56:19Z) - Near-Optimal Lower Bounds For Convex Optimization For All Orders of
Smoothness [26.71898403195793]
We study the complexity of optimizing highly smooth convex functions.
For a positive integer $p$, we want to find an $\epsilon$-approximate minimum of a convex function $f$ whose $p$th derivative is Lipschitz.
We prove a new lower bound that matches this bound (up to log factors), and holds not only for randomized algorithms, but also for quantum algorithms.
arXiv Detail & Related papers (2021-12-02T10:51:43Z) - Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss [41.17536985461902]
We prove an oracle complexity lower bound scaling as $\Omega(N\epsilon^{-2/3})$, showing that our dependence on $N$ is optimal up to polylogarithmic factors.
We develop methods with improved complexity bounds of $\tilde{O}(N\epsilon^{-2/3} + \sqrt{N}\epsilon^{-8/3})$ in the non-smooth case and $\tilde{O}(N\epsilon^{-2/3} + \sqrt{N}\epsilon^{-1})$ in the smooth case.
arXiv Detail & Related papers (2021-05-04T21:49:15Z) - Second-Order Information in Non-Convex Stochastic Optimization: Power
and Limitations [54.42518331209581]
We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\| \le \epsilon$) using stochastic gradient and Hessian-vector products, and give guarantees for finding $(\epsilon,\gamma)$-approximate second-order stationary points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z) - Safe Learning under Uncertain Objectives and Constraints [66.05180398174286]
We consider non-convex optimization problems under unknown yet safety-critical constraints.
Such problems naturally arise in a variety of domains including robotics, manufacturing, and medical procedures.
A crucial component of our analysis is to introduce and apply a technique called shrinkage in the context of safe optimization.
arXiv Detail & Related papers (2020-06-23T20:51:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the automatically generated content (including all information) and is not responsible for any consequences of its use.