On the generalization of learning algorithms that do not converge
- URL: http://arxiv.org/abs/2208.07951v1
- Date: Tue, 16 Aug 2022 21:22:34 GMT
- Title: On the generalization of learning algorithms that do not converge
- Authors: Nisha Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie
Jegelka
- Abstract summary: Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with stochastic gradient descent often oscillate indefinitely.
- Score: 54.122745736433856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization analyses of deep learning typically assume that the training
converges to a fixed point. But, recent results indicate that in practice, the
weights of deep neural networks optimized with stochastic gradient descent
often oscillate indefinitely. To reduce this discrepancy between theory and
practice, this paper focuses on the generalization of neural networks whose
training dynamics do not necessarily converge to fixed points. Our main
contribution is to propose a notion of statistical algorithmic stability (SAS)
that extends classical algorithmic stability to non-convergent algorithms and
to study its connection to generalization. This ergodic-theoretic approach
leads to new insights when compared to the traditional optimization and
learning theory perspectives. We prove that the stability of the
time-asymptotic behavior of a learning algorithm relates to its generalization
and empirically demonstrate how loss dynamics can provide clues to
generalization performance. Our findings provide evidence that networks that
"train stably generalize better" even when the training continues indefinitely
and the weights do not converge.
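As a rough sketch of the core notion (assumed notation; the paper's statistical algorithmic stability is defined more generally, in terms of the statistics of the training dynamics rather than a single trajectory), classical uniform stability compares the losses of two models trained on datasets S and S' that differ in one example, whereas a non-convergent analogue compares the long-run time averages of the loss along the two weight trajectories:

```latex
% Illustrative time-averaged analogue of uniform stability for a non-convergent
% weight trajectory \theta_t(S); S \simeq S' denotes training sets differing in
% one example and \ell is the per-example loss. Notation assumed, not the paper's.
\sup_{S \simeq S'} \, \sup_{z} \,
\left| \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \ell\big(\theta_t(S), z\big)
     - \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \ell\big(\theta_t(S'), z\big) \right|
\;\le\; \varepsilon
```

When a bound of this kind holds, the time-averaged generalization gap can be controlled in the spirit of the classical stability-implies-generalization argument, which is the connection the paper makes precise for non-convergent dynamics.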
Related papers
- Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization [22.577751005038543]
Federated Learning (FL) is a distributed learning approach that trains neural networks across multiple devices.
FL often faces challenges due to data heterogeneity, leading to inconsistent local optima among clients.
We introduce the first generalization dynamics analysis framework in federated optimization.
arXiv Detail & Related papers (2024-11-25T11:43:22Z)
- Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
- Learning Non-Vacuous Generalization Bounds from Optimization [8.294831479902658]
We present a simple yet non-vacuous generalization bound from the optimization perspective.
We achieve this goal by leveraging that the hypothesis set accessed by gradient algorithms is essentially fractal-like.
Numerical studies demonstrate that our approach is able to yield plausible generalization guarantees for modern neural networks.
arXiv Detail & Related papers (2022-06-09T08:59:46Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
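For background, the updates referred to above are, in their simplest form, the standard semi-gradient TD(0) rule sketched below; this is textbook material assumed as representative of the temporal difference algorithms studied, not the paper's exact experimental setup.

```latex
% Semi-gradient TD(0) update for a parametric value estimate V_\theta
% (standard background, not specific to this paper):
\theta \;\leftarrow\; \theta \;+\; \alpha \,
\big( r_t + \gamma\, V_\theta(s_{t+1}) - V_\theta(s_t) \big)\,
\nabla_\theta V_\theta(s_t)
```

The bootstrapped target, which reuses the network's own prediction at the next state, is what distinguishes these training dynamics from supervised regression on Monte Carlo returns.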
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Strong overall error analysis for the training of artificial neural networks via random initializations [3.198144010381572]
We show that the depth of the neural network only needs to increase much more slowly in order to obtain the same rate of approximation.
Results hold in the case of an arbitrary optimization algorithm with i.i.d. random initializations.
arXiv Detail & Related papers (2020-12-15T17:34:16Z)
- Learning Fast Approximations of Sparse Nonlinear Regression [50.00693981886832]
In this work, we bridge the gap by introducing the Nonlinear Learned Iterative Shrinkage Thresholding Algorithm (NLISTA).
Experiments on synthetic data corroborate our theoretical results and show our method outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-26T11:31:08Z)
- Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint [35.5156045701898]
We provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task.
Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning.
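That viewpoint can be sketched as follows (assumed notation; the first-order term is dropped because the expansion is taken at a stationary point of the task loss):

```latex
% Illustrative second-order Taylor expansion of task k's loss around its solution \theta_k^*;
% regularization-based continual learning replaces L_k by such a quadratic surrogate.
L_k(\theta) \;\approx\; L_k(\theta_k^*)
\;+\; \tfrac{1}{2}\, (\theta - \theta_k^*)^{\top} H_k \, (\theta - \theta_k^*),
\qquad H_k = \nabla^2 L_k(\theta_k^*)
```

In practice the Hessian is replaced by a cheaper surrogate (for example a diagonal or Fisher-based approximation), which yields the familiar quadratic penalty on parameter changes.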
arXiv Detail & Related papers (2020-06-19T06:08:40Z)
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
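As a minimal sketch of the kind of constraint described in the last entry above, assuming a single flattened weight vector and a plain Euclidean ball (the paper's own algorithm and choice of distance may differ), the fine-tuned weights can be projected back into the sphere after each update:

```python
import numpy as np

def project_to_ball(theta: np.ndarray, theta_pretrained: np.ndarray, radius: float) -> np.ndarray:
    """Project weights into an L2 ball of the given radius centred on the
    pre-trained weights (illustrative sketch, not the paper's exact algorithm)."""
    delta = theta - theta_pretrained
    norm = np.linalg.norm(delta)
    if norm <= radius:
        return theta                                   # already inside the sphere
    return theta_pretrained + delta * (radius / norm)  # project onto the boundary

# Hypothetical usage inside a fine-tuning loop:
#   theta = theta - learning_rate * grad               # ordinary gradient step
#   theta = project_to_ball(theta, theta_pretrained, radius=1.0)
```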
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.