What can linearized neural networks actually say about generalization?
- URL: http://arxiv.org/abs/2106.06770v1
- Date: Sat, 12 Jun 2021 13:05:11 GMT
- Title: What can linearized neural networks actually say about generalization?
- Authors: Guillermo Ortiz-Jiménez, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard
- Abstract summary: In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
- Score: 67.83999394554621
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For certain infinitely-wide neural networks, the neural tangent kernel (NTK)
theory fully characterizes generalization. However, for the networks used in
practice, the empirical NTK represents only a rough first-order approximation
of these architectures. Still, a growing body of work keeps leveraging this
approximation to successfully analyze important deep learning phenomena and
derive algorithms for new applications. In our work, we provide strong
empirical evidence to determine the practical validity of such approximation by
conducting a systematic comparison of the behaviour of different neural
networks and their linear approximations on different tasks. We show that the
linear approximations can indeed rank the learning complexity of certain tasks
for neural networks, albeit with important nuances. Specifically, we discover
that, in contrast to what was previously observed, neural networks do not
always perform better than their kernel approximations, and reveal that their
performance gap heavily depends on architecture, number of samples and training
task. In fact, we show that during training, deep networks increase the
alignment of their empirical NTK with the target task, which explains why
linear approximations at the end of training can better explain the dynamics of
deep networks. Overall, our work provides concrete examples of novel deep
learning phenomena which can inspire future theoretical research, as well as
provides a new perspective on the use of the NTK approximation in deep
learning.
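To make the objects compared in the abstract concrete, here is a minimal sketch (not the authors' code; the toy network, data, and metric are illustrative assumptions) of the empirical NTK as a Jacobian product, the first-order linearization of a network around its initial parameters, and a kernel-target alignment score of the kind that increases during training.
```python
# A minimal sketch, assuming a toy one-hidden-layer ReLU network; not the
# paper's experimental setup. It shows (i) the empirical NTK, (ii) the
# first-order linearization of the network around its initialization, and
# (iii) a kernel-target alignment score.
import jax
import jax.numpy as jnp

def init_params(key, d_in, d_hidden):
    k1, k2 = jax.random.split(key)
    return {"w1": jax.random.normal(k1, (d_in, d_hidden)) / jnp.sqrt(d_in),
            "w2": jax.random.normal(k2, (d_hidden, 1)) / jnp.sqrt(d_hidden)}

def forward(params, x):
    # scalar-output network: f(x) = relu(x W1) W2
    return jax.nn.relu(x @ params["w1"]) @ params["w2"]

def empirical_ntk(params, x1, x2):
    # Theta(x1, x2) = J(x1) J(x2)^T, with J the Jacobian of f w.r.t. the parameters
    def per_example_grad(x):
        g = jax.grad(lambda p: forward(p, x[None, :])[0, 0])(params)
        return jnp.concatenate([leaf.ravel() for leaf in jax.tree_util.tree_leaves(g)])
    return jax.vmap(per_example_grad)(x1) @ jax.vmap(per_example_grad)(x2).T

def linearized_forward(params0, params, x):
    # first-order Taylor expansion: f_lin(x) = f(params0, x) + J(x) (params - params0)
    delta = jax.tree_util.tree_map(lambda a, b: a - b, params, params0)
    f0, df = jax.jvp(lambda p: forward(p, x), (params0,), (delta,))
    return f0 + df

def kernel_target_alignment(K, y):
    # A(K, yy^T) = <K, yy^T>_F / (||K||_F * ||yy^T||_F)
    yy = jnp.outer(y, y)
    return jnp.sum(K * yy) / (jnp.linalg.norm(K) * jnp.linalg.norm(yy))

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 8))
y = jnp.sign(x[:, 0])                       # toy binary target
params0 = init_params(key, 8, 64)
K = empirical_ntk(params0, x, x)
print("kernel-target alignment at initialization:", kernel_target_alignment(K, y))

# near the initialization, the network and its linearization stay close
params = jax.tree_util.tree_map(lambda p: p + 0.01 * jax.random.normal(key, p.shape), params0)
gap = jnp.max(jnp.abs(forward(params, x) - linearized_forward(params0, params, x)))
print("max |f - f_lin| after a small parameter perturbation:", gap)
```
Tracking the alignment score at successive training checkpoints is one way to observe the effect described above, where training increases the alignment of the empirical NTK with the target task.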
Related papers
- Operator Learning Meets Numerical Analysis: Improving Neural Networks
through Iterative Methods [2.226971382808806]
We develop a theoretical framework grounded in iterative methods for operator equations.
We demonstrate that popular architectures, such as diffusion models and AlphaFold, inherently employ iterative operator learning.
Our work aims to enhance the understanding of deep learning by merging insights from numerical analysis.
arXiv Detail & Related papers (2023-10-02T20:25:36Z) - When Deep Learning Meets Polyhedral Theory: A Survey [6.899761345257773]
In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks.
Meanwhile, the structure of neural networks converged back to simpler piecewise linear functions.
arXiv Detail & Related papers (2023-04-29T11:46:53Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study of the NTK has been devoted to typical neural network architectures, but it is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Limitations of the NTK for Understanding Generalization in Deep Learning [13.44676002603497]
We study NTKs through the lens of scaling laws, and demonstrate that they fall short of explaining important aspects of neural network generalization.
We show that even if the empirical NTK is allowed to be pre-trained on a constant number of samples, the kernel scaling does not catch up to the neural network scaling.
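As a rough illustration of the scaling-law lens mentioned in this related paper, the sketch below fits a power-law exponent to test error versus sample size; the two error curves are synthetic, generated from assumed exponents purely to exercise the fit, and are not results from the cited work.
```python
# A minimal sketch of fitting error ≈ c * n^(-alpha) in log-log space.
# The curves below are synthetic and only illustrate how kernel scaling and
# network scaling could be compared.
import jax.numpy as jnp

def scaling_exponent(n_samples, test_errors):
    log_n, log_e = jnp.log(n_samples), jnp.log(test_errors)
    slope = jnp.sum((log_n - log_n.mean()) * (log_e - log_e.mean())) \
            / jnp.sum((log_n - log_n.mean()) ** 2)
    return -slope   # alpha

n = jnp.array([1e3, 2e3, 4e3, 8e3, 1.6e4])
net_err = 3.0 * n ** -0.35     # assumed network error curve
ntk_err = 2.0 * n ** -0.20     # assumed kernel error curve
print("network exponent:", scaling_exponent(n, net_err))
print("kernel  exponent:", scaling_exponent(n, ntk_err))
```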
arXiv Detail & Related papers (2022-06-20T21:23:28Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of neural networks measures the information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
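A minimal sketch (with a toy randomly initialized ReLU MLP and a tolerance chosen only for illustration, not the paper's setup) of tracking the numerical rank of intermediate feature matrices across layers, the kind of quantity this related paper studies.
```python
# A minimal sketch: numerical rank of feature matrices across the layers of a
# toy randomly initialized ReLU MLP. Depth, width, and tolerance are
# illustrative assumptions.
import jax
import jax.numpy as jnp

def numerical_rank(features, tol=1e-3):
    # number of singular values above tol times the largest one
    s = jnp.linalg.svd(features, compute_uv=False)
    return int(jnp.sum(s > tol * s[0]))

key = jax.random.PRNGKey(0)
h = jax.random.normal(key, (256, 64))       # a batch of 256 toy inputs
for layer in range(6):
    key, sub = jax.random.split(key)
    w = jax.random.normal(sub, (h.shape[1], 64)) / jnp.sqrt(h.shape[1])
    h = jax.nn.relu(h @ w)
    print(f"layer {layer + 1}: numerical rank of features = {numerical_rank(h)}")
```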
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Quasi-orthogonality and intrinsic dimensions as measures of learning and
generalisation [55.80128181112308]
We show that the dimensionality and quasi-orthogonality of a neural network's feature space may jointly serve as discriminants of the network's performance.
Our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces.
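As a small illustration, the sketch below (toy random features and a mean absolute cosine-similarity measure chosen here for simplicity, not necessarily the paper's exact metric) estimates how close to orthogonal the vectors of a randomly initialized feature space are.
```python
# A minimal sketch: mean absolute cosine similarity between distinct feature
# vectors of a randomly initialized ReLU layer; values near zero suggest a
# quasi-orthogonal feature space. The metric and setup are illustrative
# assumptions.
import jax
import jax.numpy as jnp

def mean_abs_cosine(features):
    f = features / jnp.linalg.norm(features, axis=1, keepdims=True)
    gram = f @ f.T
    n = gram.shape[0]
    return jnp.sum(jnp.abs(gram - jnp.eye(n))) / (n * (n - 1))

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (64, 128))                        # 64 toy inputs
w = jax.random.normal(jax.random.PRNGKey(1), (128, 512)) / jnp.sqrt(128)
features = jax.nn.relu(x @ w)                                # random feature space
print("mean |cosine similarity| between feature vectors:", mean_abs_cosine(features))
```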
arXiv Detail & Related papers (2022-03-30T21:47:32Z) - Information Flow in Deep Neural Networks [0.6922389632860545]
There is no comprehensive theoretical understanding of how deep neural networks work or are structured.
Deep networks are often seen as black boxes with unclear interpretations and reliability.
This work aims to apply principles and techniques from information theory to deep learning models to increase our theoretical understanding and design better algorithms.
arXiv Detail & Related papers (2022-02-10T23:32:26Z) - With Greater Distance Comes Worse Performance: On the Perspective of
Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z) - How Neural Networks Extrapolate: From Feedforward to Graph Neural
Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)