Learning Trajectories are Generalization Indicators
- URL: http://arxiv.org/abs/2304.12579v4
- Date: Tue, 31 Oct 2023 07:30:34 GMT
- Title: Learning Trajectories are Generalization Indicators
- Authors: Jingwen Fu, Zhizheng Zhang, Dacheng Yin, Yan Lu, Nanning Zheng
- Abstract summary: This paper explores the connection between learning trajectories of Deep Neural Networks (DNNs) and their generalization capabilities.
We present a novel perspective for analyzing generalization error by investigating the contribution of each update step to the change in generalization error.
Our approach can also track changes in generalization error when adjustments are made to learning rates and label noise levels.
- Score: 44.53518627207067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores the connection between learning trajectories of Deep
Neural Networks (DNNs) and their generalization capabilities when optimized
using (stochastic) gradient descent algorithms. Instead of concentrating solely
on the generalization error of the DNN post-training, we present a novel
perspective for analyzing generalization error by investigating the
contribution of each update step to the change in generalization error. This
perspective allows for a more direct comprehension of how the learning
trajectory influences generalization error. Building upon this analysis, we
propose a new generalization bound that incorporates more extensive trajectory
information. Our proposed generalization bound depends on the complexity of the
learning trajectory and the ratio between the bias and diversity of the training
set. Experimental findings reveal that our method effectively captures the
generalization error throughout the training process. Furthermore, our approach
can also track changes in generalization error when adjustments are made to
learning rates and label noise levels. These results demonstrate that learning
trajectory information is a valuable indicator of a model's generalization
capabilities.
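The abstract's central idea, attributing changes in generalization error to individual update steps, can be sketched numerically. The toy below is an illustration, not the paper's bound: the linear model, synthetic data, and learning rate are all made-up stand-ins. It records the train/test gap after every gradient step and differences it to get each step's contribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression; the model, data, and learning rate are
# illustrative stand-ins, not the paper's setup.
d, n_train, n_test = 5, 40, 400
w_star = rng.normal(size=d)
X_tr = rng.normal(size=(n_train, d))
y_tr = X_tr @ w_star + 0.5 * rng.normal(size=n_train)
X_te = rng.normal(size=(n_test, d))
y_te = X_te @ w_star + 0.5 * rng.normal(size=n_test)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

w = np.zeros(d)
lr = 0.05
gap = []  # generalization gap (test loss minus train loss) after each step
for step in range(200):
    g = 2.0 * X_tr.T @ (X_tr @ w - y_tr) / n_train  # full-batch MSE gradient
    w -= lr * g
    gap.append(mse(w, X_te, y_te) - mse(w, X_tr, y_tr))

# Per-step contribution of each update to the change in the gap:
# the quantity the abstract's per-step perspective tracks.
contributions = np.diff(gap)
```

Summing `contributions` recovers the total change in the gap over training, which is the decomposition the per-step viewpoint exploits.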
Related papers
- On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z) - Graph Out-of-Distribution Generalization via Causal Intervention [69.70137479660113]
We introduce a conceptually simple yet principled approach for training robust graph neural networks (GNNs) under node-level distribution shifts.
Our method resorts to a new learning objective derived from causal inference that coordinates an environment estimator and a mixture-of-expert GNN predictor.
Our model can effectively enhance generalization under various types of distribution shifts and yields up to a 27.4% accuracy improvement over state-of-the-art methods on graph OOD generalization benchmarks.
arXiv Detail & Related papers (2024-02-18T07:49:22Z) - Class-wise Generalization Error: an Information-Theoretic Analysis [22.877440350595222]
We study the class-generalization error, which quantifies the generalization performance of each individual class.
We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior.
arXiv Detail & Related papers (2024-01-05T17:05:14Z) - FedGen: Generalizable Federated Learning for Sequential Data [8.784435748969806]
In many real-world distributed settings, spurious correlations exist due to biases and data sampling issues.
We present a generalizable federated learning framework called FedGen, which allows clients to identify and distinguish between spurious and invariant features.
We show that FedGen results in models that achieve significantly better generalization and can outperform the accuracy of current federated learning approaches by over 24%.
arXiv Detail & Related papers (2022-11-03T15:48:14Z) - Provable Generalization of Overparameterized Meta-learning Trained with SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
arXiv Detail & Related papers (2022-06-18T07:22:57Z) - Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
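The convex-case claim can be checked on a toy problem: for a convex quadratic with an L2 (weight decay) term, plain gradient descent and a hand-rolled Adam should approach the same closed-form minimizer. This is a sketch under assumed dimensions and hyperparameters, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Convex objective f(w) = 0.5*||Aw - b||^2 + 0.5*lam*||w||^2 (toy instance;
# dimensions, lam, and learning rates below are illustrative assumptions).
A = rng.normal(size=(20, 4))
b = rng.normal(size=20)
lam = 0.1

def grad(w):
    # Gradient of the objective, with the L2 (weight decay) term in the loss.
    return A.T @ (A @ w - b) + lam * w

# Unique minimizer in closed form: (A^T A + lam*I)^{-1} A^T b.
w_closed = np.linalg.solve(A.T @ A + lam * np.eye(4), A.T @ b)

# Plain gradient descent.
w_gd = np.zeros(4)
for _ in range(5000):
    w_gd -= 0.01 * grad(w_gd)

# Adam (Kingma & Ba) run on the same objective.
w_adam = np.zeros(4)
m, v = np.zeros(4), np.zeros(4)
beta1, beta2, eps, lr = 0.9, 0.999, 1e-8, 0.01
for t in range(1, 5001):
    g = grad(w_adam)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w_adam -= lr * m_hat / (np.sqrt(v_hat) + eps)
```

Both iterates end up near `w_closed`, consistent with the convex-case statement; the paper's point is that with non-convex losses this agreement can fail.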
arXiv Detail & Related papers (2021-08-25T17:58:21Z) - Double Descent and Other Interpolation Phenomena in GANs [2.7007335372861974]
We study the generalization error as a function of latent space dimension in generative adversarial networks (GANs).
We develop a novel pseudo-supervised learning approach for GANs where the training utilizes pairs of fabricated (noise) inputs in conjunction with real output samples.
While our analysis focuses mostly on linear models, we also derive important insights for improving the generalization of nonlinear, multilayer GANs.
arXiv Detail & Related papers (2021-06-07T23:07:57Z) - Learning While Dissipating Information: Understanding the Generalization Capability of SGLD [9.328633662865682]
We derive an algorithm-dependent generalization bound by analyzing stochastic gradient Langevin dynamics (SGLD).
Our analysis reveals an intricate trade-off between learning and information dissipation.
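For context, a minimal SGLD update on a toy convex loss looks like the following. The loss, step size, and inverse temperature are illustrative choices, not values from the paper; the update rule itself follows the standard Welling & Teh formulation (gradient step plus injected Gaussian noise).

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy strongly convex loss L(w) = 0.5*||w - w0||^2; w0, eta, and beta are
# illustrative choices, not values from the paper.
w0 = np.array([1.0, -2.0])

def grad(w):
    return w - w0

eta, beta = 0.05, 100.0  # step size and inverse temperature
w = np.zeros(2)
samples = []
for _ in range(2000):
    # SGLD update: gradient step plus Gaussian noise with variance 2*eta/beta.
    w = w - eta * grad(w) + np.sqrt(2 * eta / beta) * rng.normal(size=2)
    samples.append(w.copy())

# After burn-in the chain approximately samples from exp(-beta * L(w)),
# i.e. a Gaussian centered at w0 with standard deviation 1/sqrt(beta).
mean_w = np.mean(samples[500:], axis=0)
```

The injected noise is what makes information-theoretic analyses of SGLD tractable: it bounds how much the iterates can encode about any individual training example.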
arXiv Detail & Related papers (2021-02-05T03:18:52Z) - The Role of Mutual Information in Variational Classifiers [47.10478919049443]
We study the generalization error of classifiers relying on encodings trained on the cross-entropy loss.
We derive bounds to the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information.
arXiv Detail & Related papers (2020-10-22T12:27:57Z) - On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
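The feature-averaging claim for convex losses is an instance of Jensen's inequality, which a tiny numeric check illustrates. The predictions and target below are synthetic stand-ins for a model's outputs on augmented views of one input.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical predictions for 10 augmented views of one input; the model
# and augmentations are stand-ins, the loss is the convex squared error.
preds = rng.normal(loc=0.5, scale=1.0, size=10)
target = 0.0

loss_of_average = (preds.mean() - target) ** 2      # average features, then loss
average_of_losses = np.mean((preds - target) ** 2)  # loss per view, then average
```

By Jensen's inequality, `loss_of_average <= average_of_losses` for any convex loss, which is the mechanism behind feature averaging reducing generalization error relative to averaging per-augmentation losses.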
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.