Stability and Generalization of Stochastic Gradient Methods for Minimax
Problems
- URL: http://arxiv.org/abs/2105.03793v1
- Date: Sat, 8 May 2021 22:38:00 GMT
- Title: Stability and Generalization of Stochastic Gradient Methods for Minimax
Problems
- Authors: Yunwen Lei, Zhenhuan Yang, Tianbao Yang, Yiming Ying
- Abstract summary: Many machine learning problems can be formulated as minimax problems such as Generative Adversarial Networks (GANs)
We provide a comprehensive generalization analysis of examples from training gradient methods for minimax problems.
- Score: 71.60601421935844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many machine learning problems can be formulated as minimax problems such as
Generative Adversarial Networks (GANs), AUC maximization and robust estimation,
to mention but a few. A substantial amount of studies are devoted to studying
the convergence behavior of their stochastic gradient-type algorithms. In
contrast, there is relatively little work on their generalization, i.e., how
the learning models built from training examples would behave on test examples.
In this paper, we provide a comprehensive generalization analysis of stochastic
gradient methods for minimax problems under both convex-concave and
nonconvex-nonconcave cases through the lens of algorithmic stability. We
establish a quantitative connection between stability and several
generalization measures both in expectation and with high probability. For the
convex-concave setting, our stability analysis shows that stochastic gradient
descent ascent attains optimal generalization bounds for both smooth and
nonsmooth minimax problems. We also establish generalization bounds for both
weakly-convex-weakly-concave and gradient-dominated problems.
Related papers
- Towards Sharper Risk Bounds for Minimax Problems [23.380477456114118]
Minimax problems have achieved success in machine learning such as adversarial, robust optimization, reinforcement learning.
For theoretical analysis, current optimal excess risk bounds are composed by generalization error and present 1/n-rates in strongly-strongly-concave (SC-SC)
We analyze some popular algorithms such as empirical saddle point (GDA), gradient descent (DA) and gradient descent (SG)
We derive n times faster than results in minimax problems.
arXiv Detail & Related papers (2024-10-11T03:50:23Z) - Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Stability and Generalization for Markov Chain Stochastic Gradient
Methods [49.981789906200035]
We provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems.
We establish the optimal excess population risk bounds for both smooth and non-smooth cases.
We develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability.
arXiv Detail & Related papers (2022-09-16T15:42:51Z) - Stability and Generalization of Stochastic Optimization with Nonconvex
and Nonsmooth Problems [34.68590236021379]
This paper introduces systematic algorithmic stability measures and the gap between the quantitative gradients and the population.
We show how we apply these algorithms to achieve an implicit regular iteration step sizes and the adaptive gradient descent.
arXiv Detail & Related papers (2022-06-14T18:14:30Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - Instance-optimality in optimal value estimation: Adaptivity via
variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy on any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
arXiv Detail & Related papers (2021-06-28T00:38:54Z) - On Distributed Non-convex Optimization: Projected Subgradient Method For
Weakly Convex Problems in Networks [13.385373310554327]
The Moreau subgradient method converges linear sharpness problems in machine learning.
A distributed implementation of the subgradient method with a theoretical guarantee is proposed.
arXiv Detail & Related papers (2020-04-28T01:01:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.