Learning While Dissipating Information: Understanding the Generalization
Capability of SGLD
- URL: http://arxiv.org/abs/2102.02976v1
- Date: Fri, 5 Feb 2021 03:18:52 GMT
- Title: Learning While Dissipating Information: Understanding the Generalization
Capability of SGLD
- Authors: Hao Wang, Yizhe Huang, Rui Gao, Flavio P. Calmon
- Abstract summary: We derive an algorithm-dependent generalization bound by analyzing stochastic gradient Langevin dynamics (SGLD).
Our analysis reveals an intricate trade-off between learning and information dissipation.
- Score: 9.328633662865682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the generalization capability of learning algorithms is at the
heart of statistical learning theory. In this paper, we investigate the
generalization gap of stochastic gradient Langevin dynamics (SGLD), a widely
used optimizer for training deep neural networks (DNNs). We derive an
algorithm-dependent generalization bound by analyzing SGLD through an
information-theoretic lens. Our analysis reveals an intricate trade-off between
learning and information dissipation: SGLD learns from data by updating
parameters at each iteration while dissipating information from early training
stages. Our bound also involves the variance of gradients which captures a
particular kind of "sharpness" of the loss landscape. The main proof techniques
in this paper rely on strong data processing inequalities -- a fundamental
concept in information theory -- and Otto-Villani's HWI inequality. Finally, we
demonstrate our bound through numerical experiments, showing that it can
predict the behavior of the true generalization gap.
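For reference, each SGLD iteration takes a stochastic gradient step and adds isotropic Gaussian noise. A minimal sketch of one update is given below; the helper name grad_loss, the step size eta, and the inverse temperature beta are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def sgld_step(theta, minibatch, grad_loss, eta=1e-3, beta=1e4, rng=None):
        """One SGLD iteration: theta <- theta - eta * g + sqrt(2 * eta / beta) * xi.

        theta     : current parameter vector (np.ndarray)
        minibatch : data used to form the stochastic gradient
        grad_loss : callable (theta, minibatch) -> gradient estimate
        eta       : step size
        beta      : inverse temperature scaling the injected noise
        """
        rng = np.random.default_rng() if rng is None else rng
        g = grad_loss(theta, minibatch)        # stochastic gradient on the minibatch
        xi = rng.standard_normal(theta.shape)  # isotropic Gaussian noise
        return theta - eta * g + np.sqrt(2.0 * eta / beta) * xi

Intuitively, each iteration incorporates fresh gradient information, while the added noise tends to dissipate information retained from earlier training stages, which is the trade-off the abstract describes.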
Related papers
- On the Generalization Capability of Temporal Graph Learning Algorithms:
Theoretical Insights and a Simpler Method [59.52204415829695]
Temporal Graph Learning (TGL) has become a prevalent technique across diverse real-world applications.
This paper investigates the generalization ability of different TGL algorithms.
We propose a simplified TGL network, which enjoys a small generalization error, improved overall performance, and lower model complexity.
arXiv Detail & Related papers (2024-02-26T08:22:22Z) - Exploring Causal Learning through Graph Neural Networks: An In-depth
Review [12.936700685252145]
We introduce a novel taxonomy that encompasses various state-of-the-art GNN methods employed in studying causality.
GNNs are further categorized based on their applications in the causality domain.
This review also touches upon the application of causal learning across diverse sectors.
arXiv Detail & Related papers (2023-11-25T10:46:06Z) - Understanding the Generalization Ability of Deep Learning Algorithms: A
Kernelized Renyi's Entropy Perspective [11.255943520955764]
We propose a novel information theoretical measure: kernelized Renyi's entropy.
We establish the generalization error bounds for gradient/Langevin descent (SGD/SGLD) learning algorithms under kernelized Renyi's entropy.
We show that our information-theoretical bounds depend on the statistics of the gradients, and are rigorously tighter than the current state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-05-02T01:17:15Z) - Learning Trajectories are Generalization Indicators [44.53518627207067]
This paper explores the connection between learning trajectories of Deep Neural Networks (DNNs) and their generalization capabilities.
We present a novel perspective for analyzing generalization error by investigating the contribution of each update step to the change in generalization error.
Our approach can also track changes in generalization error when adjustments are made to learning rates and label noise levels.
arXiv Detail & Related papers (2023-04-25T05:08:57Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Understanding the Generalization of Adam in Learning Neural Networks
with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z) - Bounding Information Leakage in Machine Learning [26.64770573405079]
This paper investigates fundamental bounds on information leakage.
We identify and bound the success rate of the worst-case membership inference attack.
We derive bounds on the mutual information between the sensitive attributes and model parameters.
arXiv Detail & Related papers (2021-05-09T08:49:14Z) - Direction Matters: On the Implicit Bias of Stochastic Gradient Descent
with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z) - Fast Learning of Graph Neural Networks with Guaranteed Generalizability:
One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z) - On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and gradients thereof, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that, compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds (a one-line Jensen sketch follows this list).
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
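As a side note on the feature-averaging entry above: with a convex loss, the risk reduction follows from Jensen's inequality. A one-line sketch, under the assumption of a convex loss \ell, a group G of input transformations, and a predictor f (our paraphrase, not the paper's exact statement):

    \[
      \ell\!\left(\mathbb{E}_{g \sim G}\big[f(gx)\big],\, y\right)
      \;\le\;
      \mathbb{E}_{g \sim G}\!\left[\ell\big(f(gx),\, y\big)\right]
    \]

Taking expectations over (x, y) shows that the feature-averaged predictor's risk is at most the average risk of the individual transformed predictors.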
This list is automatically generated from the titles and abstracts of the papers on this site.