Randomized Stochastic Variance-Reduced Methods for Stochastic Bilevel
Optimization
- URL: http://arxiv.org/abs/2105.02266v1
- Date: Wed, 5 May 2021 18:28:42 GMT
- Title: Randomized Stochastic Variance-Reduced Methods for Stochastic Bilevel
Optimization
- Authors: Zhishuai Guo, Tianbao Yang
- Abstract summary: We consider non-convex stochastic bilevel optimization (SBO) problems that have many applications in machine learning.
This paper proposes fast randomized algorithms for such non-convex SBO problems.
- Score: 62.87181271021217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider non-convex stochastic bilevel optimization (SBO)
problems, which have many applications in machine learning. Although numerous
studies have proposed stochastic algorithms for solving these problems, they
are limited in two respects: (i) their sample complexities are high and do not
match the state-of-the-art result for non-convex stochastic optimization;
(ii) their algorithms are tailored to problems with only one lower-level
problem. When there are many lower-level problems, it can be prohibitive to
process all of them at each iteration. To address these limitations, this
paper proposes fast randomized stochastic algorithms for non-convex SBO
problems. First, we present a stochastic method for non-convex SBO with only
one lower-level problem and establish its sample complexity of
$O(1/\epsilon^3)$ for finding an $\epsilon$-stationary point under appropriate
conditions, matching the lower bound for stochastic smooth non-convex
optimization. Second, we present a randomized stochastic method for non-convex
SBO with $m>1$ lower-level problems that processes only one lower-level
problem at each iteration, and establish its sample complexity of no worse
than $O(m/\epsilon^3)$, which can be better than simply processing all $m$
lower-level problems at each iteration. To the best of our knowledge, this is
the first work to consider SBO with many lower-level problems and to establish
state-of-the-art sample complexity.
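To make the randomized scheme concrete, here is a minimal NumPy sketch of the core idea: at each iteration, sample one of the $m$ lower-level problems, take a stochastic gradient step on its lower-level variable, and drive the upper-level variable with a STORM-style variance-reduced hypergradient estimate. This is an illustrative sketch, not the authors' implementation: the quadratic toy instance, the names `rsvrb_sketch` and `hyper_grad`, and all step sizes are assumptions, and the identity lower-level Hessian lets the sketch skip the Hessian-inverse (e.g., Neumann-series) approximation that a general implementation would need.

```python
import numpy as np

# Toy instance (an assumption for illustration): m quadratic lower-level problems.
#   lower level i : y_i^*(x) = argmin_y 0.5 * ||y - A_i x||^2, so y_i^*(x) = A_i x
#   upper level   : F(x) = (1/m) * sum_i 0.5 * ||y_i^*(x) - b_i||^2
rng = np.random.default_rng(0)
m, dx, dy = 5, 4, 3
A = rng.normal(size=(m, dy, dx))
b = rng.normal(size=(m, dy))
sigma = 0.01  # noise level standing in for the stochastic oracles

def lower_grad(i, x, y, xi):
    """Noisy gradient of g_i(x, y) = 0.5 * ||y - A_i x||^2 with respect to y."""
    return y - A[i] @ x + sigma * xi

def hyper_grad(i, y, xi):
    """Noisy hypergradient of f_i at the tracked lower-level iterate y.
    Here grad_xy g_i = -A_i^T and grad_yy g_i = I, so the implicit-function
    formula collapses to A_i^T (y - b_i) and no Hessian inverse is needed."""
    return A[i].T @ (y - b[i] + sigma * xi)

def rsvrb_sketch(T=3000, eta_x=0.05, eta_y=0.5, alpha=0.1):
    """Process ONE randomly sampled lower-level problem per iteration and keep
    a STORM-style momentum estimate d of the average hypergradient."""
    x = np.zeros(dx)
    y = np.zeros((m, dy))   # one tracked iterate per lower-level problem
    d = np.zeros(dx)        # variance-reduced hypergradient estimate
    for t in range(T):
        i = int(rng.integers(m))   # sample a single lower-level problem
        xi = rng.normal(size=dy)   # shared sample for the correction term
        y_old = y[i].copy()
        y[i] = y[i] - eta_y * lower_grad(i, x, y[i], xi)  # lower-level step
        g_now = hyper_grad(i, y[i], xi)
        g_old = hyper_grad(i, y_old, xi)  # same sample, previous iterate
        d = g_now if t == 0 else g_now + (1.0 - alpha) * (d - g_old)
        x = x - eta_x * d                                 # upper-level step
    return x

x_hat = rsvrb_sketch()
grad_F = sum(A[i].T @ (A[i] @ x_hat - b[i]) for i in range(m)) / m
print("||grad F(x_hat)|| =", np.linalg.norm(grad_F))
```

Touching a single lower-level problem per step is what the $O(m/\epsilon^3)$ bound quantifies: the per-iteration cost drops by a factor of $m$ relative to processing all lower-level problems, while the variance-reduced estimator keeps the iteration count at the rate known to be optimal for smooth non-convex stochastic optimization.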
Related papers
- Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization [25.438298531555468]
Bilevel optimization is widely applied in many machine learning tasks such as hyperparameter learning, meta-learning, and reinforcement learning.
We propose an efficient HJFBiO (Hessian/Jacobian-free bilevel optimization) method with an optimal convergence rate of $O(1/T)$, where $T$ denotes the number of iterations, under some mild conditions.
We conduct numerical experiments on a bilevel PL game and a hyper-representation learning task to demonstrate the numerical convergence of the method.
arXiv Detail & Related papers (2024-07-25T07:25:06Z) - Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Tighter Analysis of Alternating Stochastic Gradient Method for
Stochastic Nested Problems [31.02472517086767]
This paper unifies SGD-type updates for nested problems into a single approach that we term the ALternating Stochastic gradient dEscenT (ALSET) method.
Under the new analysis, achieving an $\epsilon$-stationary point of the nested problem requires $\mathcal{O}(\epsilon^{-2})$ samples.
Applying our results to compositional, min-max and reinforcement learning problems either improves or matches the best-known sample complexity in the respective cases.
arXiv Detail & Related papers (2021-06-25T17:33:51Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm, the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm, for tackling stochastic bilevel problems.
MSTSA allows us to control the error in the iterations caused by inexact solutions of the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Distributed Stochastic Consensus Optimization with Momentum for
Nonconvex Nonsmooth Problems [45.88640334610308]
This paper presents a new distributed stochastic optimization algorithm for nonconvex nonsmooth problems.
We show that the proposed algorithm can achieve an $\epsilon$-stationary solution with an $\mathcal{O}(1/\epsilon)$ communication complexity.
Experiments are presented to illustrate the effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2020-11-10T13:12:21Z) - Stochastic Multi-level Composition Optimization Algorithms with
Level-Independent Convergence Rates [12.783783498844022]
We study smooth multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions.
We show that the first algorithm, which is a generalization of [GhaRuswan20] to the $T$-level case, can achieve a sample complexity of $\mathcal{O}(1/\epsilon^6)$.
This is the first time that such an online algorithm designed for the (un)constrained multi-level setting obtains the same sample complexity as the smooth single-level setting under standard assumptions.
arXiv Detail & Related papers (2020-08-24T15:57:50Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous convergence on both the primal and dual sides.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem; a minimal sketch of the two-timescale update pattern appears after this list.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved
Complexities [21.13934071954103]
We present a deterministic zeroth-order algorithm for minimax problems that are nonconvex in one variable and strongly concave in the other.
We show that under the SGC assumption, the complexities of the algorithms match those of existing algorithms.
Oracle complexity results are given for $\texttt{ZO-GDMSA}$, and numerical experiments are presented to support the theoretical results.
arXiv Detail & Related papers (2020-01-22T00:05:14Z)
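For contrast with the single-timescale entries above, here is the minimal two-timescale update pattern referenced in the TTSA entry: the lower-level variable is updated with a larger step $\beta_t$ while the upper-level variable uses a smaller step $\alpha_t$, with $\alpha_t/\beta_t \to 0$. The toy problem, constants, and decay exponents below are assumptions chosen for a stable demo, not the cited paper's exact schedule.

```python
import numpy as np

# Toy bilevel instance (an assumption): y^*(x) = A x and F(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(1)
d = 4
A = np.eye(d) + 0.1 * rng.normal(size=(d, d))  # kept well-conditioned for the demo
b = rng.normal(size=d)
sigma = 0.01  # noise level standing in for the stochastic oracles

x = np.zeros(d)
y = np.zeros(d)
for t in range(1, 5001):
    beta = 1.0 / t ** 0.4    # fast timescale: lower-level tracking step
    alpha = 0.5 / t ** 0.6   # slow timescale: upper-level step, alpha/beta -> 0
    xi = rng.normal(size=d)
    y = y - beta * (y - A @ x + sigma * xi)       # track y*(x) = A x
    x = x - alpha * (A.T @ (y - b + sigma * xi))  # hypergradient step at tracked y
print("||grad F(x)|| =", np.linalg.norm(A.T @ (A @ x - b)))
```

Because $\alpha_t$ shrinks faster than $\beta_t$, the lower-level iterate effectively equilibrates to $y^*(x)$ between upper-level moves; single-timescale methods such as MSTSA and ALSET instead shrink both steps at the same rate and compensate with momentum or a tighter coupled analysis.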