Stability and Generalization of Stochastic Optimization with Nonconvex
and Nonsmooth Problems
- URL: http://arxiv.org/abs/2206.07082v3
- Date: Tue, 18 Jul 2023 02:00:40 GMT
- Title: Stability and Generalization of Stochastic Optimization with Nonconvex
and Nonsmooth Problems
- Authors: Yunwen Lei
- Abstract summary: This paper introduces novel algorithmic stability measures and quantitatively connects them to the gap between population gradients and empirical gradients.
We show how these results apply to stochastic gradient descent and its adaptive variant, where implicit regularization is achieved by tuning the step sizes and the number of iterations.
- Score: 34.68590236021379
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic optimization has found wide applications in minimizing objective
functions in machine learning, which motivates a lot of theoretical studies to
understand its practical success. Most existing studies focus on the
convergence of optimization errors, while the generalization analysis of
stochastic optimization lags far behind. This is especially the case for
nonconvex and nonsmooth problems often encountered in practice. In this paper,
we initiate a systematic stability and generalization analysis of stochastic
optimization on nonconvex and nonsmooth problems. We introduce novel
algorithmic stability measures and establish their quantitative connection to
the gap between population gradients and empirical gradients, which is then
further extended to study the gap between the Moreau envelope of the empirical
risk and that of the population risk. To our knowledge, these quantitative
connections between stability and generalization in terms of either gradients
or Moreau envelopes have not been studied in the literature. We introduce a
class of sampling-determined algorithms, for which we develop bounds for three
stability measures. Finally, we apply these results to derive error bounds for
stochastic gradient descent and its adaptive variant, showing how to achieve
implicit regularization by tuning the step sizes and the number of iterations.
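For reference, a minimal sketch of the standard Moreau envelope and the two gap
quantities the abstract refers to; the notation below (F for the population
risk, F_S for the empirical risk, lambda for the envelope parameter) is generic
and the paper's precise stability measures may differ.
\[
  f_{\lambda}(\mathbf{w}) \;=\; \min_{\mathbf{v}} \Big\{ f(\mathbf{v})
    + \tfrac{1}{2\lambda}\,\|\mathbf{v}-\mathbf{w}\|_2^2 \Big\},
  \qquad \lambda > 0 .
\]
With $F(\mathbf{w}) = \mathbb{E}_{z}[f(\mathbf{w};z)]$ the population risk and
$F_S(\mathbf{w}) = \tfrac{1}{n}\sum_{i=1}^{n} f(\mathbf{w};z_i)$ the empirical
risk, the gradient gap and the Moreau-envelope gap mentioned above take the form
\[
  \big\|\nabla F(\mathbf{w}) - \nabla F_S(\mathbf{w})\big\|_2
  \qquad\text{and}\qquad
  \big|F_{\lambda}(\mathbf{w}) - (F_S)_{\lambda}(\mathbf{w})\big| .
\]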
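A short Python sketch of the kind of step-size and iteration tuning the last
sentence of the abstract alludes to: plain SGD with a decaying step size and an
iteration budget tied to the sample size, given only as an illustration. The
callback grad_fn, the constant c, and the choice T ~ n are hypothetical
placeholders, not the paper's prescription.

    import numpy as np

    def sgd_early_stop(grad_fn, w0, n_samples, T, c=0.1, seed=0):
        """Plain SGD with decaying step sizes eta_t = c / sqrt(t + 1).

        grad_fn(w, i) is a hypothetical callback returning a (sub)gradient of
        the loss on training example i. The generalization effect comes from
        how T and the step sizes are chosen relative to the sample size n.
        """
        rng = np.random.default_rng(seed)
        w = np.array(w0, dtype=float)
        for t in range(T):
            i = rng.integers(n_samples)       # sample one training example
            eta = c / np.sqrt(t + 1)          # decaying step size
            w = w - eta * grad_fn(w, i)       # SGD update with a (sub)gradient
        return w

    # Stopping after roughly T ~ n iterations acts as implicit regularization:
    # the iterates remain stable with respect to perturbations of the sample.
    # w_hat = sgd_early_stop(grad_fn, w0=np.zeros(d), n_samples=n, T=n)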
Related papers
- Asymptotic and Non-Asymptotic Convergence Analysis of AdaGrad for Non-Convex Optimization via Novel Stopping Time-based Analysis [17.34603953600226]
Adaptive gradient methods have emerged as powerful tools in deep learning, dynamically adjusting the learning rate based on past gradients.
These methods have achieved significant success in various deep learning tasks; AdaGrad is the cornerstone of this work (the standard AdaGrad update is sketched after this list).
arXiv Detail & Related papers (2024-09-08T08:29:51Z) - Stability and Generalization of Stochastic Compositional Gradient
Descent Algorithms [61.59448949684493]
We provide a stability and generalization analysis of stochastic compositional gradient descent algorithms trained on a finite set of examples.
We establish the uniform stability results for two popular compositional gradient descent algorithms, namely SCGD and SCSC.
We derive dimension-independent excess risk bounds for SCGD and SCSC by trading off their stability results and optimization errors.
arXiv Detail & Related papers (2023-07-07T02:40:09Z) - Accelerated stochastic approximation with state-dependent noise [7.4648480208501455]
We consider a class of smooth convex optimization problems under general assumptions on the state-dependent noise in the stochastic gradient observations.
Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics.
We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate.
arXiv Detail & Related papers (2023-07-04T06:06:10Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - Stability and Generalization for Markov Chain Stochastic Gradient
Methods [49.981789906200035]
We provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems.
We establish the optimal excess population risk bounds for both smooth and non-smooth cases.
We develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability.
arXiv Detail & Related papers (2022-09-16T15:42:51Z) - Stability and Generalization of Stochastic Gradient Methods for Minimax
Problems [71.60601421935844]
Many machine learning problems can be formulated as minimax problems, such as Generative Adversarial Networks (GANs).
We provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems.
arXiv Detail & Related papers (2021-05-08T22:38:00Z) - Convergence Properties of Stochastic Hypergradients [38.64355126221992]
We study approximation schemes for the hypergradient, which are important when the lower-level problem is the empirical risk on a large dataset.
We provide numerical experiments to support our theoretical analysis and to show the advantage of using hypergradients in practice.
arXiv Detail & Related papers (2020-11-13T20:50:36Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first fast generalization bounds in the low-noise setting.
To our best knowledge, this gives the first stability and generalization analysis of SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave
Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.