Network Degeneracy as an Indicator of Training Performance: Comparing
Finite and Infinite Width Angle Predictions
- URL: http://arxiv.org/abs/2306.01513v1
- Date: Fri, 2 Jun 2023 13:02:52 GMT
- Title: Network Degeneracy as an Indicator of Training Performance: Comparing
Finite and Infinite Width Angle Predictions
- Authors: Cameron Jakub, Mihai Nica
- Abstract summary: We show that as networks grow deeper, they become increasingly susceptible to degeneracy.
We use a simple algorithm that can accurately predict the level of degeneracy for any given fully connected ReLU network architecture.
- Score: 3.04585143845864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks are powerful functions with widespread use, but the
theoretical behaviour of these functions is not fully understood. Creating deep
neural networks by stacking many layers has achieved exceptional performance in
many applications and contributed to the recent explosion of these methods.
Previous works have shown that depth can exponentially increase the
expressibility of the network. However, as networks grow deeper, they become
increasingly susceptible to degeneracy. We observe this degeneracy in the
sense that on initialization, inputs tend to become more and more correlated as
they travel through the layers of the network. If a network has too many
layers, it tends to approximate a (random) constant function, making it
effectively incapable of distinguishing between inputs. This appears to hinder
training and cause the network to perform poorly, as we investigate empirically
in this paper. We use a simple algorithm that can accurately
predict the level of degeneracy for any given fully connected ReLU network
architecture, and demonstrate how the predicted degeneracy relates to training
dynamics of the network. We also compare this prediction to predictions derived
using infinite width networks.
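The degeneracy described in the abstract can be observed directly with a Monte Carlo experiment: pass two inputs through the same randomly initialized fully connected ReLU network and track the angle between their hidden representations layer by layer. This is a minimal illustrative sketch (width, depth, and He initialization are illustrative choices, not the paper's algorithm):

```python
import numpy as np

def angle(u, v):
    """Angle in radians between two vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

rng = np.random.default_rng(0)
width, depth = 512, 50
x = rng.standard_normal(width)
y = rng.standard_normal(width)

angles = [angle(x, y)]
for _ in range(depth):
    # He-initialized weights keep activations well-scaled under ReLU.
    W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
    x = np.maximum(W @ x, 0.0)
    y = np.maximum(W @ y, 0.0)
    angles.append(angle(x, y))

print(f"angle at layer 0:  {angles[0]:.3f} rad")
print(f"angle at layer {depth}: {angles[-1]:.3f} rad")
```

Two independent random inputs start nearly orthogonal (angle close to pi/2), and the angle shrinks toward zero with depth: the two inputs become nearly indistinguishable to the network, which is the degeneracy the paper's algorithm predicts.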
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Feature-Learning Networks Are Consistent Across Widths At Realistic Scales [72.27228085606147]
We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets.
Early in training, wide neural networks trained on online data have not only identical loss curves but also agree in their point-wise test predictions throughout training.
We observe, however, that ensembles of narrower networks perform worse than a single wide network.
arXiv Detail & Related papers (2023-05-28T17:09:32Z)
- Depth Degeneracy in Neural Networks: Vanishing Angles in Fully Connected ReLU Networks on Initialization [5.678271181959529]
We study the evolution of the angle between two inputs to a ReLU neural network as a function of the number of layers.
We validate our theoretical results with Monte Carlo experiments and show that our results accurately approximate finite network behaviour.
We also empirically investigate how the depth degeneracy phenomenon can negatively impact training of real networks.
arXiv Detail & Related papers (2023-02-20T01:30:27Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - The Principles of Deep Learning Theory [19.33681537640272]
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
We explain how these effectively-deep networks learn nontrivial representations from training.
We show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks.
arXiv Detail & Related papers (2021-06-18T15:00:00Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Predify: Augmenting deep neural networks with brain-inspired predictive
coding dynamics [0.5284812806199193]
We take inspiration from a popular framework in neuroscience: 'predictive coding'.
We show that implementing this strategy into two popular networks, VGG16 and EfficientNetB0, improves their robustness against various corruptions.
arXiv Detail & Related papers (2021-06-04T22:48:13Z) - The Connection Between Approximation, Depth Separation and Learnability
in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z) - Prior knowledge distillation based on financial time series [0.8756822885568589]
We propose to use neural networks to represent indicators and train a large network constructed of smaller networks as feature layers.
In numerical experiments, we find that our algorithm is faster and more accurate than traditional methods on real financial datasets.
arXiv Detail & Related papers (2020-06-16T15:26:06Z) - Why should we add early exits to neural networks? [16.793040797308105]
Deep neural networks are generally designed as a stack of differentiable layers, in which a prediction is obtained only after running the full stack.
Some contributions have proposed techniques to endow networks with early exits, allowing predictions to be obtained at intermediate points of the stack.
These multi-output networks have a number of advantages, including: (i) significant reductions of the inference time, (ii) reduced tendency to overfitting and vanishing gradients, and (iii) capability of being distributed over multi-tier platforms.
arXiv Detail & Related papers (2020-04-27T13:53:16Z)
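The early-exit mechanism summarized above can be sketched in a few lines: a backbone of layers with a classification head after each one, where inference stops at the first head whose confidence clears a threshold. This is a minimal NumPy illustration with random, untrained weights; the layer sizes, threshold, and head design are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
width, n_classes, n_layers = 32, 4, 3

# Random, untrained weights stand in for a trained backbone and its exit heads.
layers = [rng.standard_normal((width, width)) / np.sqrt(width)
          for _ in range(n_layers)]
heads = [rng.standard_normal((width, n_classes)) / np.sqrt(width)
         for _ in range(n_layers)]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_with_early_exit(x, threshold=0.6):
    """Return (predicted class, exit index): stop at the first exit head
    whose top softmax probability clears the confidence threshold."""
    h = x
    for i in range(n_layers):
        h = np.maximum(h @ layers[i], 0.0)   # one backbone ReLU layer
        p = softmax(h @ heads[i])            # this layer's exit head
        if p.max() >= threshold or i == n_layers - 1:
            return int(p.argmax()), i

label, exit_at = predict_with_early_exit(rng.standard_normal(width))
print(f"predicted class {label} at exit {exit_at}")
```

Easy inputs exit at a shallow head and skip the remaining layers, which is the source of the inference-time savings; the final head always fires, so every input receives a prediction.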
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.