On Penalty Methods for Nonconvex Bilevel Optimization and First-Order
Stochastic Approximation
- URL: http://arxiv.org/abs/2309.01753v2
- Date: Sun, 11 Feb 2024 08:10:01 GMT
- Title: On Penalty Methods for Nonconvex Bilevel Optimization and First-Order
Stochastic Approximation
- Authors: Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak
- Abstract summary: We show first-order algorithms for solving Bilevel Optimization problems.
In particular, we show a strong connection between the penalty function and the hyper-objective.
We show an improved oracle-complexity of $O(epsilon-3)$ and $O(epsilon-5)$, respectively.
- Score: 13.813242559935732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study first-order algorithms for solving Bilevel
Optimization (BO) where the objective functions are smooth but possibly
nonconvex in both levels and the variables are restricted to closed convex
sets. As a first step, we study the landscape of BO through the lens of penalty
methods, in which the upper- and lower-level objectives are combined in a
weighted sum with penalty parameter $\sigma > 0$. In particular, we establish a
strong connection between the penalty function and the hyper-objective by
explicitly characterizing the conditions under which the values and derivatives
of the two must be $O(\sigma)$-close. A by-product of our analysis is the
explicit formula for the gradient of hyper-objective when the lower-level
problem has multiple solutions under minimal conditions, which could be of
independent interest. Next, viewing the penalty formulation as
$O(\sigma)$-approximation of the original BO, we propose first-order algorithms
that find an $\epsilon$-stationary solution by optimizing the penalty
formulation with $\sigma = O(\epsilon)$. When the perturbed lower-level problem
uniformly satisfies the small-error proximal error-bound (EB) condition, we
propose a first-order algorithm that converges to an $\epsilon$-stationary
point of the penalty function, using in total $O(\epsilon^{-3})$ and
$O(\epsilon^{-7})$ accesses to first-order (stochastic) gradient oracles when
the oracle is deterministic and oracles are noisy, respectively. Under an
additional assumption on stochastic oracles, we show that the algorithm can be
implemented in a fully {\it single-loop} manner, i.e., with $O(1)$ samples per
iteration, and achieves the improved oracle-complexity of $O(\epsilon^{-3})$
and $O(\epsilon^{-5})$, respectively.
Related papers
- First-Order Methods for Linearly Constrained Bilevel Optimization [38.19659447295665]
We present first-order linearly constrained optimization methods for high-level Hessian computations.
For linear inequality constraints, we attain $(delta,epsilon)$-Goldstein stationarity in $widetildeO(ddelta-1 epsilon-3)$ gradient oracle calls.
arXiv Detail & Related papers (2024-06-18T16:41:21Z) - Projection-Free Methods for Stochastic Simple Bilevel Optimization with
Convex Lower-level Problem [16.9187409976238]
We study a class of convex bilevel optimization problems, also known as simple bilevel optimization.
We introduce novel bilevel optimization methods that approximate the solution set of the lower-level problem.
arXiv Detail & Related papers (2023-08-15T02:37:11Z) - A Fully First-Order Method for Stochastic Bilevel Optimization [8.663726907303303]
We consider unconstrained bilevel optimization problems when only the first-order gradient oracles are available.
We propose a Fully First-order Approximation (F2SA) method, and study its non-asymptotic convergence properties.
We demonstrate even superior practical performance of the proposed method over existing second-order based approaches on MNIST data-hypercleaning experiments.
arXiv Detail & Related papers (2023-01-26T05:34:21Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) that minimizes a twice different subject general non conic optimization.
In particular, we propose a Newton-CG based-augmentedconjugate method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - A Projection-free Algorithm for Constrained Stochastic Multi-level
Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for composition optimization.
We show that the number of oracles and the linear-minimization oracle required by the proposed algorithm, are of order $mathcalO_T(epsilon-2)$ and $mathcalO_T(epsilon-3)$ respectively.
arXiv Detail & Related papers (2022-02-09T06:05:38Z) - Projection-Free Algorithm for Stochastic Bi-level Optimization [17.759493152879013]
This work presents the first projection-free algorithm to solve bi-level optimization problems, where the objective function depends on another optimization problem.
The proposed $textbfStochastic $textbfF$rank-$textbfW$olfe ($textbfSCFW$) is shown to achieve a sample complexity of $mathcalO(epsilon-2)$ for convex objectives.
arXiv Detail & Related papers (2021-10-22T11:49:15Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum- Single-timescale Approximation (MSTSA) -- for tackling problems.
MSTSA allows us to control the error in iterations due to inaccurate solution to the lower level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Second-Order Information in Non-Convex Stochastic Optimization: Power
and Limitations [54.42518331209581]
We find an algorithm which finds.
epsilon$-approximate stationary point (with $|nabla F(x)|le epsilon$) using.
$(epsilon,gamma)$surimate random random points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonsmooth functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.