The activity-weight duality in feed forward neural networks: The
geometric determinants of generalization
- URL: http://arxiv.org/abs/2203.10736v3
- Date: Fri, 21 Jul 2023 03:39:05 GMT
- Title: The activity-weight duality in feed forward neural networks: The
geometric determinants of generalization
- Authors: Yu Feng and Yuhai Tu
- Abstract summary: We find an exact duality between changes in the activities of a given layer of neurons and changes in the weights that connect that layer to the next, for any densely connected layer in a feed forward neural network.
These insights can be used to guide the development of algorithms for finding more generalizable solutions in overparametrized neural networks.
- Score: 7.372592187197655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the fundamental problems in machine learning is generalization. In
neural network models with a large number of weights (parameters), many
solutions can be found to fit the training data equally well. The key question
is which solution can describe testing data not in the training set. Here, we
report the discovery of an exact duality (equivalence) between changes in
activities in a given layer of neurons and changes in weights that connect to
the next layer of neurons in a densely connected layer in any feed forward
neural network. The activity-weight (A-W) duality allows us to map variations
in inputs (data) to variations of the corresponding dual weights. By using this
mapping, we show that the generalization loss can be decomposed into a sum of
contributions from different eigen-directions of the Hessian matrix of the loss
function at the solution in weight space. The contribution from a given
eigen-direction is the product of two geometric factors (determinants): the
sharpness of the loss landscape and the standard deviation of the dual weights,
which is found to scale with the weight norm of the solution. Our results
provide a unified framework, which we use to reveal how different
regularization schemes (weight decay, stochastic gradient descent with
different batch sizes and learning rates, dropout), training data size, and
labeling noise affect generalization performance by controlling one or
both of these two geometric determinants of generalization. These insights can
be used to guide the development of algorithms for finding more generalizable
solutions in overparametrized neural networks.
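To make the abstract's two ingredients concrete, the sketch below is a minimal numerical illustration (not the paper's code): it first checks the layer-level activity-weight duality for a randomly generated dense layer, using a rank-one dual weight change as one illustrative construction (an assumption; the paper's exact mapping may differ), and then evaluates the quadratic decomposition of the generalization gap as a sum over Hessian eigen-directions of sharpness times dual-weight variance, on synthetic quantities.
```python
# Minimal, self-contained sketch; all sizes, matrices, and the rank-one dual
# construction below are illustrative assumptions, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

# --- 1. Activity-weight (A-W) duality for one densely connected layer -----------
n_in, n_out = 8, 4
W = rng.normal(size=(n_out, n_in))        # weights connecting this layer to the next
a = rng.normal(size=n_in)                 # activities of the current layer
da = 0.1 * rng.normal(size=n_in)          # change in activities (e.g. a different input)

# One rank-one choice of dual weight change satisfying (W + dW) @ a == W @ (a + da)
dW = np.outer(W @ da, a) / (a @ a)
assert np.allclose((W + dW) @ a, W @ (a + da))   # activity change == dual weight change

# --- 2. Quadratic decomposition of the generalization gap -----------------------
# Assume gap approx. 0.5 * sum_i lambda_i * sigma_i^2, with lambda_i the Hessian
# eigenvalues (sharpness) and sigma_i^2 the variance of the dual weights along
# eigen-direction i. The Hessian and dual-weight samples are synthetic here.
P, N = 20, 500
A = rng.normal(size=(P, P))
H = A @ A.T / P                            # random positive semi-definite "Hessian"
dual_w = 0.05 * rng.normal(size=(N, P))    # dual-weight fluctuations from test inputs

lam, V = np.linalg.eigh(H)                                   # sharpness per direction
sigma2 = ((dual_w - dual_w.mean(axis=0)) @ V).var(axis=0)    # dual-weight variance
gap_per_direction = 0.5 * lam * sigma2                       # product of the two factors
print("estimated generalization gap:", gap_per_direction.sum())
```
In this toy setting the only quantities entering the estimate are the Hessian eigenvalues and the spread of the dual weights along each eigen-direction, mirroring the two geometric determinants identified in the abstract: flattening the landscape or shrinking the dual-weight fluctuations (e.g. through a smaller weight norm) reduces every term in the sum.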
Related papers
- Network scaling and scale-driven loss balancing for intelligent poroelastography [2.665036498336221]
A deep learning framework is developed for multiscale characterization of poroelastic media from full waveform data.
Two major challenges impede direct application of existing state-of-the-art techniques for this purpose.
We propose the idea of network scaling, where the neural property maps are constructed by unit shape functions composed into a scaling layer.
arXiv Detail & Related papers (2024-10-27T23:06:29Z)
- Hessian Eigenvectors and Principal Component Analysis of Neural Network Weight Matrices [0.0]
This study delves into the intricate dynamics of trained deep neural networks and their relationships with network parameters.
We unveil a correlation between Hessian eigenvectors and network weights.
This relationship, hinging on the magnitude of eigenvalues, allows us to discern parameter directions within the network.
arXiv Detail & Related papers (2023-11-01T11:38:31Z)
- Expand-and-Cluster: Parameter Recovery of Neural Networks [9.497862562614666]
We show that the incoming weight vector of each neuron is identifiable up to sign or scaling, depending on the activation function.
Our novel method 'Expand-and-Cluster' can identify weights of a target network for all commonly used activation functions.
arXiv Detail & Related papers (2023-04-25T13:14:20Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve the separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Conditional physics informed neural networks [85.48030573849712]
We introduce conditional PINNs (physics informed neural networks) for estimating the solution of classes of eigenvalue problems.
We show that a single deep neural network can learn the solution of partial differential equations for an entire class of problems.
arXiv Detail & Related papers (2021-04-06T18:29:14Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Properties of the geometry of solutions and capacity of multi-layer neural networks with Rectified Linear Units activations [2.3018169548556977]
We study the effects of Rectified Linear Units on the capacity and on the geometrical landscape of the solution space in two-layer neural networks.
We find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases.
Possibly more important, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure.
arXiv Detail & Related papers (2019-07-17T15:23:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.