Stability and Generalization of Stochastic Compositional Gradient
Descent Algorithms
- URL: http://arxiv.org/abs/2307.03357v2
- Date: Tue, 21 Nov 2023 22:05:37 GMT
- Title: Stability and Generalization of Stochastic Compositional Gradient
Descent Algorithms
- Authors: Ming Yang, Xiyuan Wei, Tianbao Yang, Yiming Ying
- Abstract summary: We provide the stability and generalization analysis of compositional descent algorithms built from training examples.
We establish the uniform stability results for two popular compositional gradient descent algorithms, namely SCGD and SCSC.
We derive-independent excess risk bounds for SCGD and SCSC by trade-offing their stability results and optimization errors.
- Score: 61.59448949684493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many machine learning tasks can be formulated as a stochastic compositional
optimization (SCO) problem such as reinforcement learning, AUC maximization,
and meta-learning, where the objective function involves a nested composition
associated with an expectation. While a significant number of studies have been
devoted to studying the convergence behavior of SCO algorithms, there is little
work on understanding their generalization, i.e., how these learning algorithms
built from training examples would behave on future test examples. In this
paper, we provide the stability and generalization analysis of stochastic
compositional gradient descent algorithms through the lens of algorithmic
stability in the framework of statistical learning theory. Firstly, we
introduce a stability concept called compositional uniform stability and
establish its quantitative relation with generalization for SCO problems. Then,
we establish the compositional uniform stability results for two popular
stochastic compositional gradient descent algorithms, namely SCGD and SCSC.
Finally, we derive dimension-independent excess risk bounds for SCGD and SCSC
by trading off their stability results and optimization errors. To the best of
our knowledge, these are the first-ever-known results on stability and
generalization analysis of stochastic compositional gradient descent
algorithms.
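For readers unfamiliar with the setting, the nested objective referred to above is typically written as follows; the notation ($f_v$, $g_\omega$, $w$) is a standard convention assumed here rather than quoted from the paper.

```latex
% Stochastic compositional optimization (SCO): two nested expectations, so a
% single sample does not give an unbiased estimate of the full gradient.
\[
\begin{aligned}
\min_{w \in \mathcal{W}} F(w) &= f\bigl(g(w)\bigr), \qquad
f(u) = \mathbb{E}_{v}\bigl[f_v(u)\bigr], \qquad
g(w) = \mathbb{E}_{\omega}\bigl[g_\omega(w)\bigr], \\
\nabla F(w) &= \bigl(\nabla g(w)\bigr)^{\top} \nabla f\bigl(g(w)\bigr).
\end{aligned}
\]
```

Because the inner expectation sits inside $f$, replacing $g(w)$ by a single sample $g_\omega(w)$ yields a biased gradient estimate; SCGD and SCSC work around this by maintaining a running estimate of $g(w)$ alongside the iterates.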
Related papers
- Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations [20.809499420384256]
STORM-based algorithms have been widely developed to solve one- to $K$-level ($K \geq 3$) stochastic optimization problems.
This paper provides a comprehensive analysis of three representative STORM-based algorithms.
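For context, the STORM (STOchastic Recursive Momentum) estimator underlying these algorithms keeps a recursive, momentum-corrected gradient estimate. The sketch below is a generic one-level version in Python and is only an illustration, not the $K$-level scheme analyzed in that paper; the function `grad_fn` and all parameter values are assumptions.

```python
import numpy as np

def storm(grad_fn, w0, lr=0.01, a=0.1, n_iters=100, seed=0):
    """Generic one-level STORM loop (illustrative sketch only).

    grad_fn(w, rng) should return a stochastic gradient of the objective at w,
    drawing its sample from the random generator rng.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    w_prev, d = None, None
    for _ in range(n_iters):
        sample_seed = int(rng.integers(1 << 30))
        g_new = grad_fn(w, np.random.default_rng(sample_seed))
        if d is None:
            d = g_new  # first iterate: plain stochastic gradient
        else:
            # Re-evaluate the SAME sample at the previous iterate, then apply the
            # recursive momentum correction d_t = g_t + (1 - a)(d_{t-1} - g_{t-1}).
            g_old = grad_fn(w_prev, np.random.default_rng(sample_seed))
            d = g_new + (1.0 - a) * (d - g_old)
        w_prev, w = w.copy(), w - lr * d
    return w
```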
arXiv Detail & Related papers (2024-07-07T07:07:04Z)
- Stability-based Generalization Analysis for Mixtures of Pointwise and Pairwise Learning [27.8712875561043]
Several algorithms for pointwise and pairwise learning (PPL) have been formulated by employing the hybrid error metric of "pointwise loss + pairwise loss".
In this paper, we try to fill this theoretical gap by investigating the generalization properties of PPL.
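The "pointwise loss + pairwise loss" metric mentioned above usually takes the form of a hybrid empirical risk such as the following; the relative weight $\lambda$ and the notation are illustrative assumptions, not taken from that paper.

```latex
% Hybrid empirical risk over a training sample S = {z_1, ..., z_n}:
% a pointwise term plus a pairwise term with illustrative weight \lambda >= 0.
\[
R_S(w) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(w; z_i\bigr)
\;+\; \frac{\lambda}{n(n-1)}\sum_{i \neq j} \tilde{\ell}\bigl(w; z_i, z_j\bigr).
\]
```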
arXiv Detail & Related papers (2023-02-20T13:25:23Z)
- Exploring the Algorithm-Dependent Generalization of AUPRC Optimization with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first trial in the algorithm-dependent generalization analysis of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z)
- Stability and Generalization for Markov Chain Stochastic Gradient Methods [49.981789906200035]
We provide a comprehensive generalization analysis of Markov chain stochastic gradient methods (MC-SGMs) for both minimization and minimax problems.
We establish the optimal excess population risk bounds for both smooth and non-smooth cases.
We develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability.
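The distinguishing feature of MC-SGMs is that the sample index at each step comes from a Markov chain over the data rather than from i.i.d. sampling. Below is a minimal Python sketch using a simple lazy-uniform chain over indices; the chain, `grad_fn`, and parameters are illustrative choices, not the paper's setting.

```python
import numpy as np

def mc_sgd(grad_fn, w0, n_data, lr=0.01, n_iters=1000, stay_prob=0.5, seed=0):
    """SGD driven by a Markov chain over example indices (illustrative sketch).

    grad_fn(w, i) returns the gradient of the loss on example i at w.
    The chain either stays at the current index or jumps to a uniformly random
    one; this lazy-uniform chain is just a simple ergodic example.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    i = int(rng.integers(n_data))        # initial state of the chain
    for _ in range(n_iters):
        if rng.random() > stay_prob:     # one Markov transition per update
            i = int(rng.integers(n_data))
        w = w - lr * grad_fn(w, i)       # gradient step on the current state
    return w
```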
arXiv Detail & Related papers (2022-09-16T15:42:51Z)
- Stability and Generalization of Stochastic Optimization with Nonconvex and Nonsmooth Problems [34.68590236021379]
This paper introduces systematic algorithmic stability measures and relates them to the gap between empirical and population gradients.
We show how these results yield implicit regularization through suitably chosen step sizes and apply to adaptive gradient descent.
arXiv Detail & Related papers (2022-06-14T18:14:30Z)
- Stability and Generalization for Randomized Coordinate Descent [19.687456295228156]
There is no prior work studying how models trained by randomized coordinate descent (RCD) generalize to test examples.
In this paper, we initiate the generalization analysis of RCD by leveraging the powerful tool of algorithmic stability.
Our analysis shows that RCD enjoys better stability as compared to gradient descent.
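As a reminder of the algorithm under analysis, randomized coordinate descent updates a single uniformly chosen coordinate per iteration using only that coordinate's partial derivative. The sketch below uses one common step size for all coordinates, which is a simplifying assumption; `partial_grad` is a hypothetical user-supplied function.

```python
import numpy as np

def rcd(partial_grad, w0, lr=0.1, n_iters=1000, seed=0):
    """Randomized coordinate descent (illustrative sketch).

    partial_grad(w, j) returns the j-th partial derivative of the objective at w;
    only that single coordinate of w is updated per iteration.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(n_iters):
        j = int(rng.integers(w.size))    # pick a coordinate uniformly at random
        w[j] -= lr * partial_grad(w, j)  # update only that coordinate
    return w
```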
arXiv Detail & Related papers (2021-08-17T02:52:50Z)
- Solving Stochastic Compositional Optimization is Nearly as Easy as Solving Stochastic Optimization [47.93365664380274]
This paper presents a new Stochastically Corrected Stochastic Compositional gradient method (SCSC).
SCSC runs in a single time scale with a single loop, uses a fixed batch size, and is guaranteed to converge at the same rate as the stochastic gradient descent (SGD) method for non-compositional optimization.
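To make the single-time-scale correction concrete, here is a sketch of the auxiliary tracking updates usually written for SCGD and SCSC; the exact form and parameter schedules should be taken from the papers themselves, and the function names here are only illustrative.

```python
import numpy as np

def scgd_track(u, g_curr, beta):
    """SCGD-style moving average of the inner function value (sketch).

    u_{t+1} = (1 - beta) * u_t + beta * g_omega(w_t); SCGD needs beta to decay
    on a slower time scale than the step size.
    """
    return (1.0 - beta) * u + beta * g_curr

def scsc_track(u, g_curr, g_prev, beta):
    """SCSC-style corrected tracking update (sketch).

    g_curr and g_prev evaluate the inner function with the SAME sample at the
    current and previous iterates; the extrapolation term (g_curr - g_prev)
    is what allows a single time scale.
    u_{t+1} = (1 - beta) * (u_t + g_omega(w_t) - g_omega(w_{t-1})) + beta * g_omega(w_t)
    """
    return (1.0 - beta) * (u + g_curr - g_prev) + beta * g_curr
```

In both methods the model step takes roughly the form $w \leftarrow w - \eta\,(\nabla g_\omega(w))^{\top}\nabla f_v(u)$; they differ mainly in how the auxiliary variable $u$ tracks $g(w)$.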
arXiv Detail & Related papers (2020-08-25T06:54:00Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
- Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first-ever-known stability and generalization analysis of SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)
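As a rough reminder of the notion named above, on-average model stability controls the average parameter perturbation over single-example replacements, along the lines of the definition sketched below; the exact norm, power, and constants are assumptions to be checked against that paper.

```latex
% A randomized algorithm A is on-average model stable with parameter \epsilon if,
% for neighboring samples S and S^{(i)} differing only in the i-th example,
\[
\frac{1}{n}\sum_{i=1}^{n}
\mathbb{E}\Bigl[\bigl\lVert A(S^{(i)}) - A(S) \bigr\rVert_2\Bigr] \;\le\; \epsilon .
\]
```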