Statistical Mechanics of Deep Linear Neural Networks: The
Back-Propagating Renormalization Group
- URL: http://arxiv.org/abs/2012.04030v1
- Date: Mon, 7 Dec 2020 20:08:31 GMT
- Title: Statistical Mechanics of Deep Linear Neural Networks: The
Back-Propagating Renormalization Group
- Authors: Qianyi Li, Haim Sompolinsky
- Abstract summary: We study the statistical mechanics of learning in Deep Linear Neural Networks (DLNNs) in which the input-output function of an individual unit is linear.
We solve exactly the network properties following supervised learning using an equilibrium Gibbs distribution in the weight space.
Our numerical simulations reveal that despite the nonlinearity, the predictions of our theory are largely shared by ReLU networks with modest depth.
- Score: 4.56877715768796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of deep learning in many real-world tasks has triggered an effort
to theoretically understand the power and limitations of deep learning in
training and generalization of complex tasks, so far with limited progress. In
this work, we study the statistical mechanics of learning in Deep Linear Neural
Networks (DLNNs) in which the input-output function of an individual unit is
linear. Despite the linearity of the units, learning in DLNNs is highly
nonlinear, hence studying its properties reveals some of the essential features
of nonlinear Deep Neural Networks (DNNs). We solve exactly the network
properties following supervised learning using an equilibrium Gibbs
distribution in the weight space. To do this, we introduce the Back-Propagating
Renormalization Group (BPRG) which allows for the incremental integration of
the network weights layer by layer from the network output layer and
progressing backward. This procedure allows us to evaluate important network
properties such as its generalization error, the role of network width and
depth, the impact of the size of the training set, and the effects of weight
regularization and learning stochasticity. Furthermore, by performing partial
integration of layers, BPRG allows us to compute the emergent properties of the
neural representations across the different hidden layers. We have proposed a
heuristic extension of the BPRG to nonlinear DNNs with rectified linear units
(ReLU). Surprisingly, our numerical simulations reveal that despite the
nonlinearity, the predictions of our theory are largely shared by ReLU networks
with modest depth, in a wide regime of parameters. Our work is the first exact
statistical mechanical study of learning in a family of Deep Neural Networks,
and the first development of the Renormalization Group approach to the weight
space of these systems.
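As a concrete picture of the setting analyzed here (not of the BPRG calculation itself), the sketch below builds a small deep linear network and samples its weights with Langevin dynamics, whose stationary law (in the small step-size limit) is a Gibbs distribution over weight space with the training error plus an L2 penalty as the energy; the temperature stands in for learning stochasticity and the penalty for weight regularization. All sizes, rates, and variable names are illustrative choices, not values from the paper.

```python
# Minimal sketch (not the paper's BPRG calculation): sample the weights of a
# deep *linear* network from a Gibbs distribution over weight space with
# Langevin dynamics.  Energy = squared training error/(2N) + (lam/2)*||W||^2,
# so for small lr the stationary distribution is P(W) ∝ exp(-Energy(W)/T).
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised task: linear teacher with noisy labels.
N, d_in, d_out = 50, 20, 1
X = rng.standard_normal((N, d_in))
w_teacher = rng.standard_normal((d_in, d_out))
Y = X @ w_teacher + 0.1 * rng.standard_normal((N, d_out))

# Deep linear network: f(x) = x W_1 ... W_L (widths are arbitrary choices).
widths = [d_in, 30, 30, d_out]
Ws = [rng.standard_normal((widths[l], widths[l + 1])) / np.sqrt(widths[l])
      for l in range(len(widths) - 1)]

lam, T, lr, steps = 1e-2, 1e-3, 1e-3, 5000   # regularization, temperature, step size

def forward(Ws, X):
    H = X
    for W in Ws:
        H = H @ W
    return H

for t in range(steps):
    err = forward(Ws, X) - Y                      # (N, d_out)
    grads = []
    for l in range(len(Ws)):
        left = X
        for W in Ws[:l]:
            left = left @ W                       # activations entering layer l
        right = np.eye(widths[l + 1])
        for W in Ws[l + 1:]:
            right = right @ W                     # map from layer l+1 output to network output
        grads.append(left.T @ err @ right.T / N + lam * Ws[l])
    # Langevin update: gradient step plus Gaussian noise of variance 2*lr*T.
    for W, g in zip(Ws, grads):
        W -= lr * g
        W += np.sqrt(2 * lr * T) * rng.standard_normal(W.shape)

print("train MSE:", float(np.mean((forward(Ws, X) - Y) ** 2)))
```

Integrating the weights of such a Gibbs ensemble out layer by layer, starting from the output layer and moving backward, is what the paper's BPRG procedure does analytically.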
Related papers
- Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks [5.851101657703105]
We take a first step towards theoretically characterizing the conditioning of the Gauss-Newton (GN) matrix in neural networks.
We establish tight bounds on the condition number of the GN in deep linear networks of arbitrary depth and width.
We expand the analysis to further architectural components, such as residual connections and convolutional layers.
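As a numerical companion to this entry (my own construction, not the paper's analysis), the sketch below assembles the Gauss-Newton matrix of a two-layer linear network from per-sample Jacobians and reports the condition number of its nonzero spectrum; all dimensions and tolerances are arbitrary choices.

```python
# Minimal sketch: build the Gauss-Newton matrix G = sum_n J_n^T J_n of a
# two-layer *linear* network f(x) = W2 @ W1 @ x and inspect its conditioning.
# (Illustrative only; the referenced paper derives analytical bounds.)
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out, N = 10, 20, 5, 100
W1 = rng.standard_normal((d_hid, d_in)) / np.sqrt(d_in)
W2 = rng.standard_normal((d_out, d_hid)) / np.sqrt(d_hid)
X = rng.standard_normal((N, d_in))

P = W1.size + W2.size                 # total number of parameters
G = np.zeros((P, P))
for x in X:
    # Per-sample Jacobian of f(x) w.r.t. (vec(W1), vec(W2)), column-major vec:
    #   f = W2 W1 x = (x^T ⊗ W2) vec(W1) = ((W1 x)^T ⊗ I) vec(W2)
    J1 = np.kron(x[None, :], W2)                      # (d_out, d_hid*d_in)
    J2 = np.kron((W1 @ x)[None, :], np.eye(d_out))    # (d_out, d_out*d_hid)
    J = np.hstack([J1, J2])                           # (d_out, P)
    G += J.T @ J

# The GN matrix is rank-deficient here, so look at the condition number of
# the nonzero part of its spectrum.
eig = np.linalg.eigvalsh(G)
nonzero = eig[eig > 1e-10 * eig.max()]
print("rank:", len(nonzero), "of", P)
print("condition number on nonzero spectrum:", nonzero.max() / nonzero.min())
```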
arXiv Detail & Related papers (2024-11-04T14:56:48Z)
- Low-Rank Learning by Design: the Role of Network Architecture and Activation Linearity in Gradient Rank Collapse [14.817633094318253]
We study how architectural choices and the structure of the data affect gradient rank bounds in deep neural networks (DNNs).
Our theoretical analysis provides these bounds for training fully-connected, recurrent, and convolutional neural networks.
We also demonstrate, both theoretically and empirically, how design choices like activation function linearity, bottleneck layer introduction, convolutional stride, and sequence truncation influence these bounds.
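A toy way to see the flavor of such bounds (my own example, not the paper's general result): for a two-layer network with MSE loss, the first-layer gradient factors through the output weights, so a linear activation caps its rank at the output dimension, while a ReLU mask can lift that cap. The sketch below computes both gradients in closed form and prints their numerical ranks; sizes and tolerances are arbitrary.

```python
# Toy illustration (my construction, not the paper's bounds): the gradient of
# the MSE loss w.r.t. the first-layer weights of a 2-layer net is
#   dL/dW1 = X^T [ (E @ W2^T) * phi'(X @ W1) ] / N,   E = phi(X @ W1) @ W2 - Y.
# With a linear activation the elementwise factor disappears and the rank is
# capped by the output dimension; a nonlinearity can lift that cap.
import numpy as np

rng = np.random.default_rng(0)
N, d_in, d_hid, d_out = 200, 64, 64, 3
X = rng.standard_normal((N, d_in))
Y = rng.standard_normal((N, d_out))
W1 = rng.standard_normal((d_in, d_hid)) / np.sqrt(d_in)
W2 = rng.standard_normal((d_hid, d_out)) / np.sqrt(d_hid)

def grad_W1(phi, dphi):
    Z = X @ W1
    E = phi(Z) @ W2 - Y
    return X.T @ ((E @ W2.T) * dphi(Z)) / N

def num_rank(M, tol=1e-8):
    s = np.linalg.svd(M, compute_uv=False)
    return int((s > tol * s[0]).sum())

g_linear = grad_W1(lambda z: z, lambda z: np.ones_like(z))
g_relu   = grad_W1(lambda z: np.maximum(z, 0.0), lambda z: (z > 0).astype(float))

print("rank of dL/dW1, linear activation:", num_rank(g_linear), "(<= d_out =", d_out, ")")
print("rank of dL/dW1, ReLU activation:  ", num_rank(g_relu))
```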
arXiv Detail & Related papers (2024-02-09T19:28:02Z)
- Understanding Deep Neural Networks via Linear Separability of Hidden Layers [68.23950220548417]
We first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two point sets.
We demonstrate that there is a synchronicity between the linear separability degree of hidden layer outputs and the network training performance.
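The MD-LSM measure itself is specific to the paper; as a rough, clearly labeled stand-in, the sketch below scores the linear separability of two classes at each layer of a random ReLU network with a least-squares linear probe. The data, widths, and probe are all illustrative choices.

```python
# Rough stand-in for the idea (NOT the paper's MD-LSM measure): score how
# linearly separable two classes are at each layer of a random ReLU network,
# using a least-squares linear probe as the separability proxy.
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes in the input space.
N, d_in = 300, 10
X = np.vstack([rng.standard_normal((N, d_in)) + 1.5,
               rng.standard_normal((N, d_in)) - 1.5])
y = np.concatenate([np.ones(N), -np.ones(N)])

def probe_accuracy(H, y):
    """Fit w by least squares on [H, 1] and report sign-agreement accuracy."""
    A = np.hstack([H, np.ones((H.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean(np.sign(A @ w) == y))

# Random (untrained) ReLU network; report probe accuracy layer by layer.
widths = [d_in, 50, 50, 50]
H = X
print("input layer probe accuracy:", probe_accuracy(H, y))
for l in range(len(widths) - 1):
    W = rng.standard_normal((widths[l], widths[l + 1])) / np.sqrt(widths[l])
    H = np.maximum(H @ W, 0.0)
    print(f"hidden layer {l + 1} probe accuracy:", probe_accuracy(H, y))
```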
arXiv Detail & Related papers (2023-07-26T05:29:29Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Globally Gated Deep Linear Networks [3.04585143845864]
We introduce Globally Gated Deep Linear Networks (GGDLNs) where gating units are shared among all processing units in each layer.
We derive exact equations for the generalization properties in these networks in the finite-width thermodynamic limit.
Our work is the first exact theoretical solution of learning in a family of nonlinear networks with finite width.
arXiv Detail & Related papers (2022-10-31T16:21:56Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks [1.0869257688521987]
Complex Network Theory (CNT) represents Deep Neural Networks (DNNs) as directed weighted graphs to study them as dynamical systems.
We introduce metrics for nodes/neurons and layers, namely Nodes Strength and Layers Fluctuation.
Our framework distills trends in the learning dynamics and separates low from high accurate networks.
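For intuition only: node strength is the standard weighted-graph quantity (the summed weights of the edges incident to a node), and the sketch below computes it for the hidden neurons of a small fully connected network; the paper's precise Nodes Strength and Layers Fluctuation definitions may differ, so treat this as an assumed proxy.

```python
# Sketch of the Complex Network Theory view: treat a fully connected DNN as a
# weighted directed graph and compute each neuron's strength (here: summed
# incoming plus outgoing weights).  The paper's exact "Nodes Strength" /
# "Layers Fluctuation" definitions may differ; this is the standard
# graph-theoretic quantity as an illustration.
import numpy as np

rng = np.random.default_rng(0)
widths = [8, 16, 16, 4]                       # toy fully connected architecture
Ws = [rng.standard_normal((widths[l], widths[l + 1])) / np.sqrt(widths[l])
      for l in range(len(widths) - 1)]

for l in range(1, len(widths) - 1):           # hidden layers only
    incoming = Ws[l - 1].sum(axis=0)          # (widths[l],) summed input weights
    outgoing = Ws[l].sum(axis=1)              # (widths[l],) summed output weights
    strength = incoming + outgoing            # node strength per neuron
    print(f"layer {l}: mean strength {strength.mean():+.3f}, "
          f"std {strength.std():.3f}")
```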
arXiv Detail & Related papers (2021-10-06T10:03:32Z)
- A Weight Initialization Based on the Linear Product Structure for Neural Networks [0.0]
We study neural networks from a nonlinear point of view and propose a novel weight initialization strategy that is based on the linear product structure (LPS) of neural networks.
The proposed strategy is derived from the approximation of activation functions by using theories of numerical algebra to guarantee finding all the local minima.
arXiv Detail & Related papers (2021-09-01T00:18:59Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)