The activity-weight duality in feed forward neural networks: The
geometric determinants of generalization
- URL: http://arxiv.org/abs/2203.10736v3
- Date: Fri, 21 Jul 2023 03:39:05 GMT
- Title: The activity-weight duality in feed forward neural networks: The
geometric determinants of generalization
- Authors: Yu Feng and Yuhai Tu
- Abstract summary: We find an exact duality between changes in the activities of a given layer of neurons and changes in the weights that connect that layer to the next, for any densely connected layer in a feed forward neural network.
These insights can be used to guide the development of algorithms for finding more generalizable solutions in overparametrized neural networks.
- Score: 7.372592187197655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the fundamental problems in machine learning is generalization. In
neural network models with a large number of weights (parameters), many
solutions can be found to fit the training data equally well. The key question
is which solution can describe testing data not in the training set. Here, we
report the discovery of an exact duality (equivalence) between changes in
activities in a given layer of neurons and changes in weights that connect to
the next layer of neurons in a densely connected layer in any feed forward
neural network. The activity-weight (A-W) duality allows us to map variations
in inputs (data) to variations of the corresponding dual weights. By using this
mapping, we show that the generalization loss can be decomposed into a sum of
contributions from different eigen-directions of the Hessian matrix of the loss
function at the solution in weight space. The contribution from a given
eigen-direction is the product of two geometric factors (determinants): the
sharpness of the loss landscape and the standard deviation of the dual weights,
which is found to scale with the weight norm of the solution. Our results
provide a unified framework, which we use to reveal how different
regularization schemes (weight decay, stochastic gradient descent with
different batch sizes and learning rates, dropout), training data size, and
labeling noise affect generalization performance by controlling one or
both of these two geometric determinants of generalization. These insights can
be used to guide the development of algorithms for finding more generalizable
solutions in overparametrized neural networks.
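To make the abstract's two ingredients concrete, the sketch below is a minimal numerical illustration (not the paper's code): it first checks the layer-level activity-weight duality for a randomly generated dense layer, using a rank-one dual weight change as one illustrative construction (an assumption; the paper's exact mapping may differ), and then evaluates the quadratic decomposition of the generalization gap as a sum over Hessian eigen-directions of sharpness times dual-weight variance, on synthetic quantities.
```python
# Minimal, self-contained sketch; all sizes, matrices, and the rank-one dual
# construction below are illustrative assumptions, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

# --- 1. Activity-weight (A-W) duality for one densely connected layer -----------
n_in, n_out = 8, 4
W = rng.normal(size=(n_out, n_in))        # weights connecting this layer to the next
a = rng.normal(size=n_in)                 # activities of the current layer
da = 0.1 * rng.normal(size=n_in)          # change in activities (e.g. a different input)

# One rank-one choice of dual weight change satisfying (W + dW) @ a == W @ (a + da)
dW = np.outer(W @ da, a) / (a @ a)
assert np.allclose((W + dW) @ a, W @ (a + da))   # activity change == dual weight change

# --- 2. Quadratic decomposition of the generalization gap -----------------------
# Assume gap approx. 0.5 * sum_i lambda_i * sigma_i^2, with lambda_i the Hessian
# eigenvalues (sharpness) and sigma_i^2 the variance of the dual weights along
# eigen-direction i. The Hessian and dual-weight samples are synthetic here.
P, N = 20, 500
A = rng.normal(size=(P, P))
H = A @ A.T / P                            # random positive semi-definite "Hessian"
dual_w = 0.05 * rng.normal(size=(N, P))    # dual-weight fluctuations from test inputs

lam, V = np.linalg.eigh(H)                                   # sharpness per direction
sigma2 = ((dual_w - dual_w.mean(axis=0)) @ V).var(axis=0)    # dual-weight variance
gap_per_direction = 0.5 * lam * sigma2                       # product of the two factors
print("estimated generalization gap:", gap_per_direction.sum())
```
In this toy setting the only quantities entering the estimate are the Hessian eigenvalues and the spread of the dual weights along each eigen-direction, mirroring the two geometric determinants identified in the abstract: flattening the landscape or shrinking the dual-weight fluctuations (e.g. through a smaller weight norm) reduces every term in the sum.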
Related papers
- Network scaling and scale-driven loss balancing for intelligent poroelastography [2.665036498336221]
A deep learning framework is developed for multiscale characterization of poroelastic media from full waveform data.
Two major challenges impede direct application of existing state-of-the-art techniques for this purpose.
We propose the idea of network scaling, where the neural property maps are constructed by unit shape functions composed into a scaling layer.
arXiv Detail & Related papers (2024-10-27T23:06:29Z)
- Hessian Eigenvectors and Principal Component Analysis of Neural Network Weight Matrices [0.0]
This study delves into the intricate dynamics of trained deep neural networks and their relationships with network parameters.
We unveil a correlation between Hessian eigenvectors and network weights.
This relationship, hinging on the magnitude of eigenvalues, allows us to discern parameter directions within the network.
arXiv Detail & Related papers (2023-11-01T11:38:31Z)
- Expand-and-Cluster: Parameter Recovery of Neural Networks [9.497862562614666]
We show that the incoming weight vector of each neuron is identifiable up to sign or scaling, depending on the activation function.
Our novel method 'Expand-and-Cluster' can identify weights of a target network for all commonly used activation functions.
arXiv Detail & Related papers (2023-04-25T13:14:20Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve the separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Conditional physics informed neural networks [85.48030573849712]
We introduce conditional PINNs (physics informed neural networks) for estimating the solution of classes of eigenvalue problems.
We show that a single deep neural network can learn the solution of partial differential equations for an entire class of problems.
arXiv Detail & Related papers (2021-04-06T18:29:14Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Properties of the geometry of solutions and capacity of multi-layer neural networks with Rectified Linear Units activations [2.3018169548556977]
We study the effects of Rectified Linear Units on the capacity and on the geometrical landscape of the solution space in two-layer neural networks.
We find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases.
Possibly more important, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure.
arXiv Detail & Related papers (2019-07-17T15:23:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.