Stability and Generalization of Stochastic Optimization with Nonconvex
and Nonsmooth Problems
- URL: http://arxiv.org/abs/2206.07082v3
- Date: Tue, 18 Jul 2023 02:00:40 GMT
- Title: Stability and Generalization of Stochastic Optimization with Nonconvex
and Nonsmooth Problems
- Authors: Yunwen Lei
- Abstract summary: This paper introduces novel algorithmic stability measures and quantitatively connects them to the gap between population gradients and empirical gradients.
We show how these results apply to stochastic gradient descent and its adaptive variant, where implicit regularization is achieved by tuning the step sizes and the number of iterations.
- Score: 34.68590236021379
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic optimization has found wide applications in minimizing objective
functions in machine learning, which motivates a lot of theoretical studies to
understand its practical success. Most existing studies focus on the
convergence of optimization errors, while the generalization analysis of
stochastic optimization lags far behind. This is especially the case for
nonconvex and nonsmooth problems often encountered in practice. In this paper,
we initiate a systematic stability and generalization analysis of stochastic
optimization on nonconvex and nonsmooth problems. We introduce novel
algorithmic stability measures and establish their quantitative connection to
the gap between population gradients and empirical gradients, which is then
further extended to study the gap between the Moreau envelope of the empirical
risk and that of the population risk. To our knowledge, these quantitative
connections between stability and generalization in terms of either gradients
or Moreau envelopes have not been studied in the literature. We introduce a
class of sampling-determined algorithms, for which we develop bounds for three
stability measures. Finally, we apply these results to derive error bounds for
stochastic gradient descent and its adaptive variant, showing how to achieve
implicit regularization by tuning the step sizes and the number of iterations.
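For reference, a minimal sketch of the standard Moreau envelope and the two gap
quantities the abstract refers to; the notation below (F for the population
risk, F_S for the empirical risk, lambda for the envelope parameter) is generic
and the paper's precise stability measures may differ.
\[
  f_{\lambda}(\mathbf{w}) \;=\; \min_{\mathbf{v}} \Big\{ f(\mathbf{v})
    + \tfrac{1}{2\lambda}\,\|\mathbf{v}-\mathbf{w}\|_2^2 \Big\},
  \qquad \lambda > 0 .
\]
With $F(\mathbf{w}) = \mathbb{E}_{z}[f(\mathbf{w};z)]$ the population risk and
$F_S(\mathbf{w}) = \tfrac{1}{n}\sum_{i=1}^{n} f(\mathbf{w};z_i)$ the empirical
risk, the gradient gap and the Moreau-envelope gap mentioned above take the form
\[
  \big\|\nabla F(\mathbf{w}) - \nabla F_S(\mathbf{w})\big\|_2
  \qquad\text{and}\qquad
  \big|F_{\lambda}(\mathbf{w}) - (F_S)_{\lambda}(\mathbf{w})\big| .
\]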
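A short Python sketch of the kind of step-size and iteration tuning the last
sentence of the abstract alludes to: plain SGD with a decaying step size and an
iteration budget tied to the sample size, given only as an illustration. The
callback grad_fn, the constant c, and the choice T ~ n are hypothetical
placeholders, not the paper's prescription.

    import numpy as np

    def sgd_early_stop(grad_fn, w0, n_samples, T, c=0.1, seed=0):
        """Plain SGD with decaying step sizes eta_t = c / sqrt(t + 1).

        grad_fn(w, i) is a hypothetical callback returning a (sub)gradient of
        the loss on training example i. The generalization effect comes from
        how T and the step sizes are chosen relative to the sample size n.
        """
        rng = np.random.default_rng(seed)
        w = np.array(w0, dtype=float)
        for t in range(T):
            i = rng.integers(n_samples)       # sample one training example
            eta = c / np.sqrt(t + 1)          # decaying step size
            w = w - eta * grad_fn(w, i)       # SGD update with a (sub)gradient
        return w

    # Stopping after roughly T ~ n iterations acts as implicit regularization:
    # the iterates remain stable with respect to perturbations of the sample.
    # w_hat = sgd_early_stop(grad_fn, w0=np.zeros(d), n_samples=n, T=n)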
Related papers
- Asymptotic and Non-Asymptotic Convergence Analysis of AdaGrad for Non-Convex Optimization via Novel Stopping Time-based Analysis [17.34603953600226]
Adaptive gradient methods have emerged as powerful tools in deep learning, dynamically adjusting the learning rate based on past gradients.
These methods have achieved significant success in various deep learning tasks; AdaGrad is the cornerstone of this work (the standard AdaGrad update is sketched after this list).
arXiv Detail & Related papers (2024-09-08T08:29:51Z) - Stability and Generalization of Stochastic Compositional Gradient
Descent Algorithms [61.59448949684493]
We provide a stability and generalization analysis of stochastic compositional gradient descent algorithms trained on a finite set of examples.
We establish the uniform stability results for two popular compositional gradient descent algorithms, namely SCGD and SCSC.
We derive dimension-independent excess risk bounds for SCGD and SCSC by trading off their stability results and optimization errors.
arXiv Detail & Related papers (2023-07-07T02:40:09Z) - Accelerated stochastic approximation with state-dependent noise [7.4648480208501455]
We consider a class of smooth convex optimization problems under general assumptions on the state-dependent noise in the stochastic gradient observations.
Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics.
We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate.
arXiv Detail & Related papers (2023-07-04T06:06:10Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - Stability and Generalization for Markov Chain Stochastic Gradient
Methods [49.981789906200035]
We provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems.
We establish the optimal excess population risk bounds for both smooth and non-smooth cases.
We develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability.
arXiv Detail & Related papers (2022-09-16T15:42:51Z) - Stability and Generalization of Stochastic Gradient Methods for Minimax
Problems [71.60601421935844]
Many machine learning problems can be formulated as minimax problems, such as Generative Adversarial Networks (GANs).
We provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems.
arXiv Detail & Related papers (2021-05-08T22:38:00Z) - Convergence Properties of Stochastic Hypergradients [38.64355126221992]
We study approximation schemes for the hypergradient, which are important when the lower-level problem is the empirical risk on a large dataset.
We provide numerical experiments to support our theoretical analysis and to show the advantage of using hypergradients in practice.
arXiv Detail & Related papers (2020-11-13T20:50:36Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first fast generalization bounds in the low-noise setting.
To our best knowledge, this gives the first stability and generalization analysis of SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave
Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.