Disentangling Trainability and Generalization in Deep Neural Networks
- URL: http://arxiv.org/abs/1912.13053v2
- Date: Mon, 13 Jul 2020 04:55:53 GMT
- Title: Disentangling Trainability and Generalization in Deep Neural Networks
- Authors: Lechao Xiao, Jeffrey Pennington, Samuel S. Schoenholz
- Abstract summary: We analyze the spectrum of the Neural Tangent Kernel (NTK) for trainability and generalization across a range of networks.
We find that CNNs without global average pooling behave almost identically to FCNs, but that CNNs with pooling have markedly different and often better generalization performance.
- Score: 45.15453323967438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A longstanding goal in the theory of deep learning is to characterize the
conditions under which a given neural network architecture will be trainable,
and if so, how well it might generalize to unseen data. In this work, we
provide such a characterization in the limit of very wide and very deep
networks, for which the analysis simplifies considerably. For wide networks,
the trajectory under gradient descent is governed by the Neural Tangent Kernel
(NTK), and for deep networks the NTK itself maintains only weak data
dependence. By analyzing the spectrum of the NTK, we formulate necessary
conditions for trainability and generalization across a range of architectures,
including Fully Connected Networks (FCNs) and Convolutional Neural Networks
(CNNs). We identify large regions of hyperparameter space for which networks
can memorize the training set but completely fail to generalize. We find that
CNNs without global average pooling behave almost identically to FCNs, but that
CNNs with pooling have markedly different and often better generalization
performance. These theoretical results are corroborated experimentally on
CIFAR10 for a variety of network architectures and we include a colab notebook
that reproduces the essential results of the paper.
Related papers
- Deep Neural Networks via Complex Network Theory: a Perspective [3.1023851130450684]
Deep Neural Networks (DNNs) can be represented as graphs whose links and vertices iteratively process data and solve tasks sub-optimally. Complex Network Theory (CNT), merging statistical physics with graph theory, provides a method for interpreting neural networks by analysing their weights and neuron structures.
In this work, we extend the existing CNT metrics with measures that sample from the DNNs' training distribution, shifting from a purely topological analysis to one that connects with the interpretability of deep learning.
arXiv Detail & Related papers (2024-04-17T08:42:42Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Model-Agnostic Reachability Analysis on Deep Neural Networks [25.54542656637704]
We develop a model-agnostic verification framework, called DeepAgn.
It can be applied to FNNs, Recurrent Neural Networks (RNNs), or a mixture of both.
It does not require access to the network's internal structures, such as layers and parameters.
arXiv Detail & Related papers (2023-04-03T09:01:59Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that these neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study [55.12108376616355]
The study of the NTK has been devoted to typical neural network architectures, but it is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks [1.0869257688521987]
Complex Network Theory (CNT) represents Deep Neural Networks (DNNs) as directed weighted graphs to study them as dynamical systems.
We introduce metrics for nodes/neurons and layers, namely Nodes Strength and Layers Fluctuation.
Our framework distills trends in the learning dynamics and separates low-accuracy from high-accuracy networks.
arXiv Detail & Related papers (2021-10-06T10:03:32Z)
- Improving Neural Network with Uniform Sparse Connectivity [0.0]
We propose the novel uniform sparse network (USN) with even and sparse connectivity within each layer.
USN consistently and substantially outperforms the state-of-the-art sparse network models in prediction accuracy, speed and robustness.
arXiv Detail & Related papers (2020-11-29T19:00:05Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- On the Empirical Neural Tangent Kernel of Standard Finite-Width Convolutional Neural Network Architectures [3.4698840925433765]
It remains an open question how well NTK theory models standard neural network architectures of widths common in practice.
We study this question empirically for two well-known convolutional neural network architectures, namely AlexNet and LeNet.
For wider versions of these networks, where the number of channels and the widths of the fully-connected layers are increased, the deviation from the infinite-width NTK predictions decreases (a sketch of this comparison follows the list).
arXiv Detail & Related papers (2020-06-24T11:40:36Z)
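The last entry above asks how closely the empirical NTK of a standard finite-width network tracks its infinite-width limit. A minimal sketch of that comparison with Neural Tangents follows; the tiny two-convolution network, input size, and widths are illustrative assumptions, not the AlexNet/LeNet configurations of the cited paper.

```python
# Minimal sketch, assuming `jax` and `neural-tangents` are installed.
# The architecture and sizes are placeholders, not AlexNet/LeNet.
import jax.numpy as jnp
import neural_tangents as nt
from jax import random
from neural_tangents import stax

init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Conv(32, (3, 3), padding='SAME'), stax.Relu(),
    stax.Conv(32, (3, 3), padding='SAME'), stax.Relu(),
    stax.Flatten(),
    stax.Dense(1),
)

key = random.PRNGKey(0)
x = random.normal(key, (8, 8, 8, 3))  # batch of 8 tiny NHWC "images"

# Empirical NTK at one random finite-width initialization.
_, params = init_fn(key, x.shape)
empirical_ntk = nt.empirical_ntk_fn(apply_fn)(x, None, params)

# Analytic infinite-width NTK of the same architecture.
infinite_ntk = kernel_fn(x, None, 'ntk')

# Relative gap between the two kernels; the cited study reports that this kind
# of deviation shrinks as channel counts and fully-connected widths grow.
rel_err = jnp.linalg.norm(empirical_ntk - infinite_ntk) / jnp.linalg.norm(infinite_ntk)
print('relative deviation:', float(rel_err))
```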
This list is automatically generated from the titles and abstracts of the papers in this site.