Stability of SGD: Tightness Analysis and Improved Bounds
- URL: http://arxiv.org/abs/2102.05274v1
- Date: Wed, 10 Feb 2021 05:43:27 GMT
- Title: Stability of SGD: Tightness Analysis and Improved Bounds
- Authors: Yikai Zhang, Wenjia Zhang, Sammy Bald, Vamsi Pingali, Chao Chen,
Mayank Goswami
- Abstract summary: Gradient Descent (SGD) based methods have been widely used for training largescale machine learning models that generalize well in practice.
This paper addresses the question: is the stability analysis of [18] tight for smooth functions, and if not, for what kind of loss functions and data distributions can the analysis be improved?
- Score: 8.831597193643628
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Stochastic Gradient Descent (SGD) based methods have been widely used for
training large-scale machine learning models that also generalize well in
practice. Several explanations have been offered for this generalization
performance, a prominent one being algorithmic stability [18]. However, there
are no known examples of smooth loss functions for which the analysis can be
shown to be tight. Furthermore, apart from the properties of the loss function,
data distribution has also been shown to be an important factor in
generalization performance. This raises the question: is the stability analysis
of [18] tight for smooth functions, and if not, for what kind of loss functions
and data distributions can the stability analysis be improved? In this paper we
first settle open questions regarding tightness of bounds in the
data-independent setting: we show that for general datasets, the existing
analysis for convex and strongly-convex loss functions is tight, but it can be
improved for non-convex loss functions. Next, we give novel and improved
data-dependent bounds: we show stability upper bounds for a large class of
convex regularized loss functions, with negligible regularization parameters,
and improve existing data-dependent bounds in the non-convex setting. We hope
that our results will initiate further efforts to better understand the
data-dependent setting under non-convex loss functions, leading to an improved
understanding of the generalization abilities of deep networks.
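
The stability framework of [18] bounds the generalization gap of SGD by how little its output changes when a single training example is replaced. The sketch below is a minimal illustration of that idea, not code from the paper: it couples two SGD runs on neighboring datasets (same seed, same sampling order) for a smooth convex least-squares loss and reports their parameter divergence. All data, loss, and hyperparameter choices here are hypothetical.

```python
# Illustrative sketch of algorithmic stability for SGD (in the spirit of [18]).
# Train on two datasets that differ in a single example and measure ||w - w'||.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Neighboring dataset S': replace one example (index 0) with a fresh draw.
X2, y2 = X.copy(), y.copy()
X2[0] = rng.normal(size=d)
y2[0] = X2[0] @ rng.normal(size=d)

def sgd(Xs, ys, steps=2000, lr=0.01, seed=1):
    """Plain SGD on the smooth convex least-squares loss 0.5 * (x^T w - y)^2."""
    r = np.random.default_rng(seed)   # same seed => identical sampling order in both runs
    w = np.zeros(Xs.shape[1])
    for _ in range(steps):
        i = r.integers(len(ys))       # uniformly sampled example
        grad = (Xs[i] @ w - ys[i]) * Xs[i]
        w -= lr * grad
    return w

w, w2 = sgd(X, y), sgd(X2, y2)
# For an L-Lipschitz loss, uniform stability bounds the (expected) generalization
# gap by roughly L * ||w - w'||, so small divergence indicates a stable run.
print("parameter divergence ||w - w'|| =", np.linalg.norm(w - w2))
```

Repeating this experiment over many replaced indices and seeds gives an empirical proxy for the uniform-stability quantity that the data-independent and data-dependent bounds in the paper control analytically.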
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
We present a robust online optimization framework in which an adversary can introduce outliers by corrupting the loss functions in an arbitrary number of rounds k, unknown to the learner.
arXiv Detail & Related papers (2024-08-12T17:08:31Z) - A Precise Characterization of SGD Stability Using Loss Surface Geometry [8.942671556572073]
Stochastic Gradient Descent (SGD) stands as a cornerstone optimization algorithm with proven real-world empirical success but relatively limited theoretical understanding.
Recent research has illuminated a key factor contributing to its practical efficacy: the implicit regularization it instigates.
arXiv Detail & Related papers (2024-01-22T19:46:30Z) - Time-Independent Information-Theoretic Generalization Bounds for SGLD [4.73194777046253]
We provide novel information-theoretic generalization bounds for stochastic gradient Langevin dynamics (SGLD).
Our bounds are based on smoothness and dissipativity assumptions, and are non-exponential.
arXiv Detail & Related papers (2023-11-02T07:42:23Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
We give a general sample complexity analysis of "vanilla" PG, whose theoretical understanding remains limited.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random order online optimization model of Garber et al. (2020), where the loss functions may be chosen by an adversary but are then presented online in a uniformly random order.
We show that algorithms in this model achieve the optimal bounds and significantly improved stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - The Sobolev Regularization Effect of Stochastic Gradient Descent [8.193914488276468]
We show that flat minima regularize the gradient of the model function, which explains the good performance of flat minima.
We also consider high-order moments of the gradient noise, and show through a linear analysis around global minima that Stochastic Gradient Descent (SGD) tends to impose constraints on these moments.
arXiv Detail & Related papers (2021-05-27T21:49:21Z) - SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and
Interpolation [17.199023009789308]
Stochastic Gradient Descent (SGD) is routinely used for optimizing non-convex functions.
In this paper, we show convergence guarantees for SGD in the smooth non-convex setting.
We also provide theoretical guarantees under different step-size conditions.
arXiv Detail & Related papers (2020-06-18T07:05:56Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of stochastic gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.