A Projection-free Algorithm for Constrained Stochastic Multi-level
Composition Optimization
- URL: http://arxiv.org/abs/2202.04296v1
- Date: Wed, 9 Feb 2022 06:05:38 GMT
- Title: A Projection-free Algorithm for Constrained Stochastic Multi-level
Composition Optimization
- Authors: Tesi Xiao, Krishnakumar Balasubramanian, Saeed Ghadimi
- Abstract summary: We propose a projection-free conditional gradient-type algorithm for composition optimization.
We show that the number of oracles and the linear-minimization oracle required by the proposed algorithm, are of order $mathcalO_T(epsilon-2)$ and $mathcalO_T(epsilon-3)$ respectively.
- Score: 12.096252285460814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a projection-free conditional gradient-type algorithm for smooth
stochastic multi-level composition optimization, where the objective function
is a nested composition of $T$ functions and the constraint set is a closed
convex set. Our algorithm assumes access to noisy evaluations of the functions
and their gradients, through a stochastic first-order oracle satisfying certain
standard unbiasedness and second moment assumptions. We show that the number of
calls to the stochastic first-order oracle and the linear-minimization oracle
required by the proposed algorithm, to obtain an $\epsilon$-stationary
solution, are of order $\mathcal{O}_T(\epsilon^{-2})$ and
$\mathcal{O}_T(\epsilon^{-3})$ respectively, where $\mathcal{O}_T$ hides
constants in $T$. Notably, the dependence of these complexity bounds on
$\epsilon$ and $T$ are separate in the sense that changing one does not impact
the dependence of the bounds on the other. Moreover, our algorithm is
parameter-free and does not require any (increasing) order of mini-batches to
converge unlike the common practice in the analysis of stochastic conditional
gradient-type algorithms.
Related papers
- Stochastic First-Order Methods with Non-smooth and Non-Euclidean Proximal Terms for Nonconvex High-Dimensional Stochastic Optimization [2.0657831823662574]
When the non problem is by which the non problem is by whichity, the sample of first-order methods may depend linearly on the problem dimension, is for undesirable problems.
Our algorithms allow for the estimate of complexity using the distance of.
mathO (log d) / EuM4.
We prove that DISFOM can sharpen variance employing $mathO (log d) / EuM4.
arXiv Detail & Related papers (2024-06-27T18:38:42Z) - On Penalty Methods for Nonconvex Bilevel Optimization and First-Order
Stochastic Approximation [13.813242559935732]
We show first-order algorithms for solving Bilevel Optimization problems.
In particular, we show a strong connection between the penalty function and the hyper-objective.
We show an improved oracle-complexity of $O(epsilon-3)$ and $O(epsilon-5)$, respectively.
arXiv Detail & Related papers (2023-09-04T18:25:43Z) - An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization [37.300102993926046]
We study the complexity of producing neither smooth nor convex points of Lipschitz objectives which are possibly using only zero-order evaluations.
Our analysis is based on a simple yet powerful.
Goldstein-subdifferential set, which allows recent advancements in.
nonsmooth non optimization.
arXiv Detail & Related papers (2023-07-10T11:56:04Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) that minimizes a twice different subject general non conic optimization.
In particular, we propose a Newton-CG based-augmentedconjugate method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - Extra-Newton: A First Approach to Noise-Adaptive Accelerated
Second-Order Methods [57.050204432302195]
This work proposes a universal and adaptive second-order method for minimizing second-order smooth, convex functions.
Our algorithm achieves $O(sigma / sqrtT)$ convergence when the oracle feedback is with variance $sigma2$, and improves its convergence to $O( 1 / T3)$ with deterministic oracles.
arXiv Detail & Related papers (2022-11-03T14:12:51Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-probe Variance Reduced (MSVR) to alleviate the complexity of compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and dependence on strongity.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Escaping Saddle-Points Faster under Interpolation-like Conditions [19.9471360853892]
We show that under over-parametrization several standard optimization algorithms escape saddle-points and converge to local-minimizers much faster.
We discuss the first-order oracle complexity of Perturbed Gradient Descent (PSGD) algorithm to reach an $epsilon$ localminimizer.
We next analyze Cubic-Regularized Newton (SCRN) algorithm under-like conditions, and show that the oracle complexity to reach an $epsilon$ local-minimizer under-like conditions, is $tildemathcalO (1/epsilon2.5
arXiv Detail & Related papers (2020-09-28T02:15:18Z) - Stochastic Multi-level Composition Optimization Algorithms with
Level-Independent Convergence Rates [12.783783498844022]
We study smooth multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions.
We show that the first algorithm, which is a generalization of citeGhaRuswan20 to the $T$ level case, can achieve a sample complexity of $mathcalO (1/epsilon$6)
This is the first time that such an online algorithm designed for the (un) multi-level setting, obtains the same sample complexity under standard assumptions.
arXiv Detail & Related papers (2020-08-24T15:57:50Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Second-order Conditional Gradient Sliding [79.66739383117232]
We present the emphSecond-Order Conditional Gradient Sliding (SOCGS) algorithm.
The SOCGS algorithm converges quadratically in primal gap after a finite number of linearly convergent iterations.
It is useful when the feasible region can only be accessed efficiently through a linear optimization oracle.
arXiv Detail & Related papers (2020-02-20T17:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.