What training reveals about neural network complexity
- URL: http://arxiv.org/abs/2106.04186v1
- Date: Tue, 8 Jun 2021 08:58:00 GMT
- Title: What training reveals about neural network complexity
- Authors: Andreas Loukas, Marinos Poiitis, Stefanie Jegelka
- Abstract summary: This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced by how fast its weights change during training.
Our results support the hypothesis that good training behavior can be a useful bias towards good generalization.
- Score: 80.87515604428346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work explores the hypothesis that the complexity of the function a deep
neural network (NN) is learning can be deduced by how fast its weights change
during training. Our analysis provides evidence for this supposition by
relating the network's distribution of Lipschitz constants (i.e., the norm of
the gradient at different regions of the input space) during different training
intervals with the behavior of the stochastic training procedure. We first
observe that the average Lipschitz constant close to the training data affects
various aspects of the parameter trajectory, with more complex networks having
a longer trajectory, bigger variance, and often veering further from their
initialization. We then show that NNs whose biases are trained more steadily
have bounded complexity even in regions of the input space that are far from
any training point. Finally, we find that steady training with Dropout implies
a training- and data-dependent generalization bound that grows
poly-logarithmically with the number of parameters. Overall, our results
support the hypothesis that good training behavior can be a useful bias towards
good generalization.
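
The quantities this abstract works with are straightforward to instrument. Below is a minimal PyTorch sketch (not the authors' code; the model, data, and hyperparameters are illustrative placeholders) that estimates the local Lipschitz constant near training points as the norm of the network's input gradient, and tracks the length of the SGD parameter trajectory and its distance from initialization, i.e., the statistics the abstract relates to function complexity.

```python
# Minimal sketch, assuming a scalar-output regression model (placeholders throughout).
import torch
import torch.nn as nn

def input_grad_norms(model, x):
    """Per-sample norm of the gradient of the output w.r.t. the input,
    a local estimate of the Lipschitz constant around each point."""
    x = x.clone().requires_grad_(True)
    out = model(x).sum()                     # samples are independent, so the sum's
    (grad,) = torch.autograd.grad(out, x)    # input gradient is per-sample
    return grad.flatten(1).norm(dim=1)

def flat_params(model):
    return torch.cat([p.detach().flatten() for p in model.parameters()])

# toy setup (not from the paper)
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
x_train = torch.randn(256, 10)
y_train = torch.randn(256, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

theta_0 = flat_params(model)
prev, traj_len = theta_0, 0.0
for step in range(200):
    loss = nn.functional.mse_loss(model(x_train), y_train)
    opt.zero_grad(); loss.backward(); opt.step()
    cur = flat_params(model)
    traj_len += (cur - prev).norm().item()   # length of the parameter trajectory
    prev = cur

lip = input_grad_norms(model, x_train)
print(f"mean local Lipschitz estimate near training data: {lip.mean():.3f}")
print(f"trajectory length: {traj_len:.2f}, "
      f"distance from init: {(prev - theta_0).norm().item():.2f}")
```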
Related papers
- Bifurcations and loss jumps in RNN training [7.937801286897863]
We introduce a novel algorithm for detecting all fixed points and k-cycles in ReLU-based RNNs, along with their existence and stability regions.
Our algorithm provides exact results and returns fixed points and cycles up to high orders with surprisingly good scaling behavior; a brute-force illustration of the underlying piecewise-linear structure appears after this list.
arXiv Detail & Related papers (2023-10-26T16:49:44Z)
- Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs [5.7166378791349315]
We prove that, for the fundamental regression task of learning a single neuron, training a one-hidden-layer ReLU network converges to zero loss.
We also show and characterise a surprising distinction in this setting between interpolator networks of minimal rank and those of minimal Euclidean norm.
arXiv Detail & Related papers (2023-06-10T16:36:22Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and only later in training exploit higher-order statistics.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning; a toy sketch of how such a bias can be probed appears after this list.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- On the (Non-)Robustness of Two-Layer Neural Networks in Different Learning Regimes [27.156666384752548]
Neural networks are highly sensitive to adversarial examples.
We study robustness and generalization in different learning regimes.
We show how linearized lazy training regimes can worsen robustness.
arXiv Detail & Related papers (2022-03-22T16:40:52Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- More data or more parameters? Investigating the effect of data structure on generalization [17.249712222764085]
Properties of the data impact the test error as a function of the number of training examples and the number of parameters.
We show that noise in the labels and strong anisotropy of the input data have similar effects on the test error.
arXiv Detail & Related papers (2021-03-09T16:08:41Z)
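
For the "Bifurcations and loss jumps in RNN training" entry above, here is a minimal brute-force sketch (not the paper's algorithm, which scales far better) of why fixed points of a ReLU RNN are tractable: within each activation pattern the update is affine, so candidate fixed points solve a linear system and only need a consistency check. All names and sizes here are illustrative.

```python
# Brute-force sketch for a tiny autonomous ReLU RNN h' = relu(W h + b):
# enumerate activation patterns; in each region solve (I - D W) h = D b.
import itertools
import numpy as np

def relu_rnn_fixed_points(W, b, tol=1e-9):
    n = len(b)
    fixed_points = []
    for pattern in itertools.product([0.0, 1.0], repeat=n):
        D = np.diag(pattern)                    # active units in this region
        A = np.eye(n) - D @ W
        try:
            h = np.linalg.solve(A, D @ b)       # candidate fixed point
        except np.linalg.LinAlgError:
            continue                            # singular region map, skip
        pre = W @ h + b
        # accept only if the candidate actually lies in the region `pattern`
        if all((pre[i] > -tol) == bool(pattern[i]) for i in range(n)):
            fixed_points.append(h)
    return fixed_points

rng = np.random.default_rng(0)
W = rng.normal(scale=1.5, size=(4, 4))
b = rng.normal(size=4)
for h in relu_rnn_fixed_points(W, b):
    print(np.round(h, 3))
```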
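
And for the "Neural networks trained with SGD learn distributions of increasing complexity" entry, a toy sketch (not the paper's experiments) of how the distributional simplicity bias can be probed: track how often the network's predictions agree with a classifier that uses only first-order statistics (nearest class mean) as training proceeds. The data construction, architecture, and hyperparameters are all illustrative, and how cleanly the effect shows up depends on them.

```python
# Illustrative only: two classes whose means differ along one coordinate but
# whose covariances differ everywhere, so class means give a decent but
# imperfect classifier and higher-order statistics add accuracy.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n = 30, 2000
m = torch.zeros(d); m[0] = 1.0
x = torch.cat([torch.randn(n // 2, d) * torch.linspace(0.5, 1.5, d) + m,
               torch.randn(n // 2, d) * torch.linspace(1.5, 0.5, d) - m])
y = torch.cat([torch.zeros(n // 2), torch.ones(n // 2)]).long()

mu = torch.stack([x[y == c].mean(0) for c in (0, 1)])   # class means
mean_pred = torch.cdist(x, mu).argmin(dim=1)            # first-order classifier

net = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, 2))
opt = torch.optim.SGD(net.parameters(), lr=0.05)
for step in range(1, 1001):
    loss = nn.functional.cross_entropy(net(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 200 == 0:
        with torch.no_grad():
            pred = net(x).argmax(dim=1)
        agree = (pred == mean_pred).float().mean()
        acc = (pred == y).float().mean()
        # under the DSB hypothesis, agreement is highest early in training
        print(f"step {step:4d}  agreement with mean classifier {agree:.2f}  accuracy {acc:.2f}")
```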