Generalization Bounds for Stochastic Gradient Langevin Dynamics: A
Unified View via Information Leakage Analysis
- URL: http://arxiv.org/abs/2112.08439v1
- Date: Tue, 14 Dec 2021 06:45:52 GMT
- Title: Generalization Bounds for Stochastic Gradient Langevin Dynamics: A
Unified View via Information Leakage Analysis
- Authors: Bingzhe Wu, Zhicong Liang, Yatao Bian, ChaoChao Chen, Junzhou Huang,
Yuan Yao
- Abstract summary: We present a unified view from privacy leakage analysis to investigate the generalization bounds of SGLD.
We also conduct various numerical studies to empirically assess the information leakage issue of SGLD.
- Score: 49.402932368689775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, generalization bounds of the non-convex empirical risk minimization
paradigm using Stochastic Gradient Langevin Dynamics (SGLD) have been
extensively studied. Several theoretical frameworks have been presented to
study this problem from different perspectives, such as information theory and
stability. In this paper, we present a unified view from privacy leakage
analysis to investigate the generalization bounds of SGLD, along with a
theoretical framework for re-deriving previous results in a succinct manner.
Aside from theoretical findings, we conduct various numerical studies to
empirically assess the information leakage issue of SGLD. Additionally, our
theoretical and empirical results provide explanations for prior works that
study the membership privacy of SGLD.
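For context, SGLD, the algorithm whose generalization and information leakage the paper analyzes, perturbs each minibatch gradient step with isotropic Gaussian noise whose scale is tied to the step size and an inverse temperature. Below is a minimal sketch of one SGLD update, assuming a user-supplied minibatch gradient function; the names grad_fn, step_size, and inverse_temp are illustrative choices, not taken from the paper.

```python
# Minimal SGLD update sketch (illustrative; not the paper's code).
# theta <- theta - eta * grad(theta; batch) + sqrt(2 * eta / beta) * N(0, I)
import numpy as np

def sgld_step(theta, grad_fn, minibatch, step_size=1e-3, inverse_temp=1e4, rng=None):
    """Perform one Stochastic Gradient Langevin Dynamics step on parameters theta."""
    rng = np.random.default_rng() if rng is None else rng
    grad = grad_fn(theta, minibatch)              # stochastic gradient on the minibatch
    noise = rng.standard_normal(theta.shape)      # isotropic Gaussian injection
    return theta - step_size * grad + np.sqrt(2.0 * step_size / inverse_temp) * noise
```

The injected noise is central to both the information-theoretic and the privacy-leakage views of SGLD, since it limits how much the iterates can reveal about any single training example.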
Related papers
- On the Convergence of (Stochastic) Gradient Descent for Kolmogorov--Arnold Networks [56.78271181959529]
Kolmogorov--Arnold Networks (KANs) have gained significant attention in the deep learning community.
Empirical investigations demonstrate that KANs optimized via stochastic gradient descent (SGD) are capable of achieving near-zero training loss.
arXiv Detail & Related papers (2024-10-10T15:34:10Z) - Understanding Forgetting in Continual Learning with Linear Regression [21.8755265936716]
Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently.
We provide a general theoretical analysis of forgetting in the linear regression model via Gradient Descent.
We demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence, where tasks with larger eigenvalues in their population data covariance matrices are trained later, tends to result in increased forgetting.
arXiv Detail & Related papers (2024-05-27T18:33:37Z) - Understanding the Generalization Ability of Deep Learning Algorithms: A
Kernelized Renyi's Entropy Perspective [11.255943520955764]
We propose a novel information theoretical measure: kernelized Renyi's entropy.
We establish generalization error bounds for stochastic gradient descent and stochastic gradient Langevin dynamics (SGD/SGLD) learning algorithms under kernelized Renyi's entropy.
We show that our information-theoretical bounds depend on the statistics of the gradients, and are rigorously tighter than the current state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-05-02T01:17:15Z) - Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions [13.431453056203226]
Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several empirical observations.
This paper develops generalization bounds for heavy-tailed SGD with general loss functions.
These results shed more light on the empirical observations, thanks to the generality of the loss functions.
arXiv Detail & Related papers (2023-01-27T17:57:35Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Provable Generalization of Overparameterized Meta-learning Trained with
SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
arXiv Detail & Related papers (2022-06-18T07:22:57Z) - Optimizing Information-theoretical Generalization Bounds via Anisotropic
Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance (a schematic rendering of this relation is given after this list).
arXiv Detail & Related papers (2021-10-26T15:02:27Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
We provide a general sample complexity analysis of "vanilla" PG.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - Learning While Dissipating Information: Understanding the Generalization
Capability of SGLD [9.328633662865682]
We derive an algorithm-dependent generalization bound by analyzing stochastic gradient Langevin dynamics (SGLD).
Our analysis reveals an intricate trade-off between learning and information dissipation.
arXiv Detail & Related papers (2021-02-05T03:18:52Z)
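As a schematic rendering of the anisotropic-noise result quoted in the list above (the bound-optimal noise covariance being the square root of the expected gradient covariance), one might write the relation as follows; the trace budget and the proportionality form are illustrative assumptions, not taken from that paper's abstract.

```latex
% Schematic form of the quoted result; the trace budget C and the use of
% proportionality (rather than equality) are illustrative assumptions.
\[
  \Sigma^{*} \;\propto\; \Big( \mathbb{E}\big[\, \nabla \ell(\theta)\, \nabla \ell(\theta)^{\top} \big] \Big)^{1/2},
  \qquad \text{subject to } \operatorname{tr}(\Sigma) \le C .
\]
```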