Tighter Analysis of Alternating Stochastic Gradient Method for
Stochastic Nested Problems
- URL: http://arxiv.org/abs/2106.13781v1
- Date: Fri, 25 Jun 2021 17:33:51 GMT
- Title: Tighter Analysis of Alternating Stochastic Gradient Method for
Stochastic Nested Problems
- Authors: Tianyi Chen, Yuejiao Sun, and Wotao Yin
- Abstract summary: This paper unifies SGD-type updates for nested problems into a single approach that we term the ALternating Stochastic gradient dEscenT (ALSET) method.
Under the new analysis, to achieve an $\epsilon$-stationary point of the nested problem, it requires ${\cal O}(\epsilon^{-2})$ samples.
Applying our results to compositional, min-max and reinforcement learning problems either improves or matches the best-known sample complexity in the respective cases.
- Score: 31.02472517086767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic nested optimization, including stochastic compositional, min-max
and bilevel optimization, is gaining popularity in many machine learning
applications. While the three problems share the nested structure, existing
works often treat them separately, and thus develop problem-specific algorithms
and their analyses. Among various exciting developments, simple SGD-type
updates (potentially on multiple variables) are still prevalent in solving this
class of nested problems, but they are believed to have slower convergence rate
compared to that of the non-nested problems. This paper unifies several
SGD-type updates for stochastic nested problems into a single SGD approach that
we term ALternating Stochastic gradient dEscenT (ALSET) method. By leveraging
the hidden smoothness of the problem, this paper presents a tighter analysis of
ALSET for stochastic nested problems. Under the new analysis, to achieve an
$\epsilon$-stationary point of the nested problem, it requires ${\cal
O}(\epsilon^{-2})$ samples. Under certain regularity conditions, applying our
results to stochastic compositional, min-max and reinforcement learning
problems either improves or matches the best-known sample complexity in the
respective cases. Our results explain why simple SGD-type algorithms in
stochastic nested problems all work very well in practice without the need for
further modifications.
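As a reading aid, here is a minimal sketch of the single-loop alternating SGD-type update pattern described in the abstract, instantiated on a toy stochastic min-max problem (one of the nested problem classes the analysis covers). The quadratic objective, Gaussian gradient noise, and the step sizes alpha and beta are illustrative assumptions, not details taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    d, sigma = 5, 0.1                    # dimension and gradient-noise level (assumed)
    A = rng.standard_normal((d, d)) / d  # coupling matrix of the toy saddle problem
    alpha, beta = 0.05, 0.1              # outer (x) and inner (y) step sizes (assumed)

    # Toy problem: min_x max_y 0.5*||x||^2 + x^T A y - 0.5*||y||^2, with additive
    # Gaussian noise standing in for stochastic gradient estimates.
    def stoch_grad_x(x, y):
        return x + A @ y + sigma * rng.standard_normal(d)

    def stoch_grad_y(x, y):
        return A.T @ x - y + sigma * rng.standard_normal(d)

    x, y = rng.standard_normal(d), rng.standard_normal(d)
    for k in range(2000):
        y = y + beta * stoch_grad_y(x, y)   # one SGD-type ascent step on the inner variable
        x = x - alpha * stoch_grad_x(x, y)  # one SGD-type descent step on the outer variable

    print("distance to the saddle point (0, 0):", np.linalg.norm(x), np.linalg.norm(y))

The point of the single-loop pattern is that each iteration draws only one stochastic gradient per variable, which is the regime in which the ${\cal O}(\epsilon^{-2})$ sample complexity above is stated.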
Related papers
- Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped stochastic methods for optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-(\alpha - 1)/\alpha})$ when the gradients have finite moments of order $\alpha$.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
- High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise [96.80184504268593]
Gradient clipping is one of the key algorithmic ingredients for deriving good high-probability guarantees.
However, clipping can spoil the convergence of popular methods for composite and distributed optimization.
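For readers unfamiliar with this ingredient, here is a minimal sketch of norm clipping applied to a stochastic gradient step; the threshold and step size are illustrative assumptions, not the specific clipped methods analyzed in these papers.

    import numpy as np

    def clip_by_norm(g, threshold):
        """Rescale the stochastic gradient g so its norm does not exceed threshold."""
        norm = np.linalg.norm(g)
        return g if norm <= threshold else g * (threshold / norm)

    def clipped_sgd_step(x, stoch_grad, step=0.01, threshold=1.0):
        # Illustrative clipped-SGD update: x_{k+1} = x_k - step * clip(g_k, threshold)
        return x - step * clip_by_norm(stoch_grad(x), threshold)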
arXiv Detail & Related papers (2023-10-03T07:49:17Z)
- Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z)
- Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
We propose a novel method named the Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of solving coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of the sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z)
- Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for stochastic methods solving monotone and structured non-monotone variational inequality problems (VIPs).
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
- Randomized Stochastic Variance-Reduced Methods for Stochastic Bilevel Optimization [62.87181271021217]
We consider nonconvex stochastic bilevel optimization (SBO) problems that have many applications in machine learning.
This paper proposes fast randomized algorithms for nonconvex SBO problems.
arXiv Detail & Related papers (2021-05-05T18:28:42Z)
- Stochastic Multi-level Composition Optimization Algorithms with Level-Independent Convergence Rates [12.783783498844022]
We study smooth multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions.
We show that the first algorithm, which is a generalization of \cite{GhaRuswan20} to the $T$-level case, can achieve a sample complexity of $\mathcal{O}(1/\epsilon^6)$.
This is the first time that such an online algorithm, designed for the (un)constrained multi-level setting, obtains the same sample complexity under standard assumptions.
arXiv Detail & Related papers (2020-08-24T15:57:50Z)
- Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of the stochastic proximal gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity bounds for constant and variable stepsize iteration policies, obtaining that, for minibatch size $N$, after $\mathcal{O}(\frac{1}{N\epsilon})$ iterations, $\epsilon$-suboptimality is attained in expected quadratic distance to the optimal solution.
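To make the algorithm class concrete, here is a minimal sketch of one minibatch stochastic proximal gradient step for an l1-regularized least-squares objective; the objective, sampling scheme, and step size are illustrative assumptions rather than the exact setting of the paper.

    import numpy as np

    def soft_threshold(v, tau):
        """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def minibatch_prox_sgd_step(x, A, b, lam, step, batch_size, rng):
        """One step on min_x (1/2m)||Ax - b||^2 + lam*||x||_1 using a random minibatch."""
        idx = rng.choice(A.shape[0], size=batch_size, replace=False)
        grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch_size   # minibatch gradient of the smooth part
        return soft_threshold(x - step * grad, step * lam)     # proximal step on the nonsmooth part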
arXiv Detail & Related papers (2020-03-30T10:43:56Z)
- Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities [21.13934071954103]
We present a deterministic algorithm for minimax problems that are nonconvex in one variable and strongly concave in the other.
We show that under the SGC assumption, the complexities of the algorithms match those of existing algorithms.
Results are presented in terms of the oracle complexity of \texttt{ZO-GDMSA}, and numerical experiments are presented to support the theoretical results.
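For context, here is a minimal sketch of the two-point zeroth-order gradient estimator that underlies this class of algorithms; the smoothing radius and Gaussian directions are standard choices assumed here for illustration, not details from the paper.

    import numpy as np

    def zo_gradient(f, x, mu=1e-3, num_dirs=10, rng=np.random.default_rng(0)):
        """Two-point zeroth-order estimate of grad f(x) using random Gaussian directions."""
        d = x.shape[0]
        g = np.zeros(d)
        for _ in range(num_dirs):
            u = rng.standard_normal(d)
            g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
        return g / num_dirs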
arXiv Detail & Related papers (2020-01-22T00:05:14Z)