Optimal Algorithms for Stochastic Multi-Level Compositional Optimization
- URL: http://arxiv.org/abs/2202.07530v2
- Date: Wed, 16 Feb 2022 15:50:06 GMT
- Title: Optimal Algorithms for Stochastic Multi-Level Compositional Optimization
- Authors: Wei Jiang, Bokun Wang, Yibo Wang, Lijun Zhang, Tianbao Yang
- Abstract summary: We investigate the problem of stochastic multi-level compositional optimization, where the objective function is a composition of multiple smooth but possibly non-convex functions.
To make use of adaptive learning rates, we also develop Adaptive SMVR, which achieves the same optimal complexities but converges faster in practice.
- Score: 46.77664277596764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the problem of stochastic multi-level
compositional optimization, where the objective function is a composition of
multiple smooth but possibly non-convex functions. Existing methods for solving
this problem either suffer from sub-optimal sample complexities or need a huge
batch size. To address this limitation, we propose a Stochastic Multi-level
Variance Reduction method (SMVR), which achieves the optimal sample complexity
of $\mathcal{O}\left(1 / \epsilon^{3}\right)$ to find an $\epsilon$-stationary
point for non-convex objectives. Furthermore, when the objective function
satisfies the convexity or Polyak-{\L}ojasiewicz (PL) condition, we propose a
stage-wise variant of SMVR and improve the sample complexity to
$\mathcal{O}\left(1 / \epsilon^{2}\right)$ for convex functions or
$\mathcal{O}\left(1 /(\mu\epsilon)\right)$ for non-convex functions satisfying
the $\mu$-PL condition. The latter result implies the same complexity for
$\mu$-strongly convex functions. To make use of adaptive learning rates, we
also develop Adaptive SMVR, which achieves the same optimal complexities but
converges faster in practice. All our complexities match the lower bounds not
only in terms of $\epsilon$ but also in terms of $\mu$ (for PL or strongly
convex functions), without using a large batch size in each iteration.
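For intuition, the objective in multi-level compositional optimization has the nested form $F(x) = f_K(f_{K-1}(\cdots f_1(x)))$, where each level $f_i$ is only accessible through stochastic samples. The snippet below is a minimal, hypothetical NumPy sketch of the underlying variance-reduction idea for a toy two-level composition: a STORM-style moving-average tracker for the inner function value, which is what removes the need for a huge batch in each iteration. It illustrates the general technique only, not the authors' SMVR algorithm; the linear inner map, quadratic outer map, step size, and correction weight are all toy assumptions.

```python
# Hypothetical sketch of a STORM-style variance-reduced tracker for a toy
# two-level composition F(x) = E[ g( E[f(x; xi)] ) ]; NOT the authors' SMVR.
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 5
A = rng.standard_normal((m, d))      # inner map: f(x; xi) = A x + xi
b = rng.standard_normal(m)           # outer map: g(y) = 0.5 * ||y - b||^2

def inner_pair(x_new, x_old):
    """Sample f at two points with the SAME noise xi (needed for the correction)."""
    xi = 0.1 * rng.standard_normal(m)
    return A @ x_new + xi, A @ x_old + xi

x = rng.standard_normal(d)
x_old = x.copy()
y_est, _ = inner_pair(x, x)          # running estimate of the inner value E[f(x)]
eta, beta = 0.05, 0.5                # toy step size and moving-average weight

for t in range(500):
    y_new, y_prev = inner_pair(x, x_old)
    # STORM-style correction: carry the previous estimate forward, shifted by
    # the freshly sampled change in the inner value, then average with the new
    # sample; a single sample per step suffices to keep the tracking error small.
    y_est = (1.0 - beta) * (y_est + y_new - y_prev) + beta * y_new
    grad = A.T @ (y_est - b)         # chain rule: J_f^T * grad g(y_est)
    x_old = x.copy()
    x = x - eta * grad

print("final stationarity ||grad F(x)||:", np.linalg.norm(A.T @ (A @ x - b)))
```

The key line is the corrected update of `y_est`: reusing the previous estimate shifted by the sampled change in the inner value is what yields variance reduction without large batches.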
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) gradient estimators (a generic two-point ZO gradient estimate is sketched after this list).
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left(\frac{dn^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right)\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z)
- An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization [16.709026203727007]
We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem.
We measure the performance of our method in terms of suboptimality and infeasibility errors.
arXiv Detail & Related papers (2024-02-12T22:34:53Z)
- An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization [37.300102993926046]
We study the complexity of producing approximately stationary points of Lipschitz objectives that are neither smooth nor convex, using only zero-order evaluations.
Our analysis is based on a simple yet powerful lemma regarding the Goldstein-subdifferential set, which allows us to leverage recent advancements in nonsmooth nonconvex optimization.
arXiv Detail & Related papers (2023-07-10T11:56:04Z)
- Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to improve the complexity of solving coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z)
- Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex.
We propose a method that solves the problem by alternating between master-to-agents communication and local computations.
The obtained rate is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ rate of prior methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z)
- Parallel Quasi-concave set optimization: A new frontier that scales without needing submodularity [14.93584434176082]
We consider a class of quasi-concave set functions induced as a dual class to monotone linkage functions.
We show a potential for widespread applications via an example of diverse feature subset selection with exact global max-min guarantees.
arXiv Detail & Related papers (2021-08-19T15:50:41Z)
- Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z)
- Stochastic Multi-level Composition Optimization Algorithms with Level-Independent Convergence Rates [12.783783498844022]
We study smooth multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions.
We show that the first algorithm, which is a generalization of [GhaRuswan20] to the $T$-level case, can achieve a sample complexity of $\mathcal{O}(1/\epsilon^6)$.
This is the first time that an online algorithm designed for the (un)constrained multi-level setting obtains the same sample complexity as the smooth single-level setting under standard assumptions.
arXiv Detail & Related papers (2020-08-24T15:57:50Z)
- A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
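As a companion to the zeroth-order entries above (referenced from the first related paper), here is a minimal, generic two-point zeroth-order gradient estimator in NumPy. It sketches the standard random-direction finite-difference construction that ZO methods build on; it is not the variance-reduced estimator of that paper, and the smoothing radius, number of directions, and test function are toy assumptions.

```python
# Hypothetical illustration of a generic two-point zeroth-order (ZO) gradient
# estimate; smoothing radius, direction count, and test function are toy choices.
import numpy as np

rng = np.random.default_rng(1)

def zo_grad(func, x, mu_smooth=1e-4, num_dirs=20):
    """Monte Carlo average of two-point finite differences along random unit
    directions u: g = d * (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u."""
    d = x.size
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)       # uniform direction on the unit sphere
        g += (func(x + mu_smooth * u) - func(x - mu_smooth * u)) / (2 * mu_smooth) * d * u
    return g / num_dirs

f = lambda x: np.sum(x ** 2)         # toy smooth objective with known gradient 2x
x = rng.standard_normal(8)
print("ZO estimate  :", zo_grad(f, x)[:3])   # approximate, up to sampling noise
print("true gradient:", (2 * x)[:3])
```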