Empirical Phase Diagram for Three-layer Neural Networks with Infinite
Width
- URL: http://arxiv.org/abs/2205.12101v1
- Date: Tue, 24 May 2022 14:27:31 GMT
- Title: Empirical Phase Diagram for Three-layer Neural Networks with Infinite
Width
- Authors: Hanxu Zhou, Qixuan Zhou, Zhenyuan Jin, Tao Luo, Yaoyu Zhang, Zhi-Qin
John Xu
- Abstract summary: We take a step towards drawing a phase diagram for three-layer ReLU NNs with infinite width.
For both synthetic datasets and real datasets, we find that the dynamics of each layer could be divided into a linear regime and a condensed regime.
In the condensed regime, we also observe the condensation of weights in isolated orientations with low complexity.
- Score: 5.206156813130247
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Substantial work indicates that the dynamics of neural networks (NNs)
is closely related to the initialization of their parameters. Inspired by the
phase diagram for two-layer ReLU NNs with infinite width (Luo et al., 2021), we
take a step towards drawing a phase diagram for three-layer ReLU NNs with
infinite width. First, we derive a normalized gradient flow for three-layer ReLU
NNs and obtain two key independent quantities that distinguish different
dynamical regimes for common initialization methods. With carefully designed
experiments and a large computational cost, for both synthetic and real
datasets, we find that the dynamics of each layer can also be divided into a
linear regime and a condensed regime, separated by a critical regime. The
criterion is the relative change of the input weights (the input weight of a
hidden neuron consists of the weight from its input layer to that neuron
together with its bias term) during training as the width approaches infinity,
which tends to $0$, $+\infty$, and $O(1)$ in the three regimes, respectively.
In addition, we demonstrate that different layers can lie in different dynamical
regimes during the training of a single deep NN. In the condensed regime, we
also observe the condensation of weights in isolated orientations with low
complexity. Based on these three-layer experiments, our phase diagram suggests a
complicated dynamical landscape for deep NNs, consisting of three possible
regimes together with their mixtures, and provides guidance for studying deep
NNs under different initialization regimes, revealing the possibility of
completely different dynamics emerging in different layers of a single deep NN.
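The regime criterion above is straightforward to probe empirically. Below is a
minimal sketch (PyTorch, not the authors' code; the architecture, toy data, and
hyperparameters are illustrative assumptions) that trains a small three-layer
ReLU network and reports, per parameter tensor, the relative change
$\|\theta(t)-\theta(0)\| / \|\theta(0)\|$ of each layer's weights and biases.
Tracking this quantity while scaling the hidden width up is how the abstract's
$0$ / $O(1)$ / $+\infty$ distinction would be observed in practice.

```python
# Minimal sketch (PyTorch), not the authors' code: train a three-layer ReLU net
# and measure, per parameter tensor, ||theta(t) - theta(0)|| / ||theta(0)||.
# In the paper's terms this quantity tends to 0 (linear regime), O(1) (critical
# regime) or +infinity (condensed regime) as the hidden width goes to infinity.
import torch
import torch.nn as nn

torch.manual_seed(0)

width = 1024                               # hidden width; regimes are defined as width -> infinity
net = nn.Sequential(
    nn.Linear(1, width), nn.ReLU(),        # first hidden layer
    nn.Linear(width, width), nn.ReLU(),    # second hidden layer
    nn.Linear(width, 1),                   # output layer
)

# Snapshot the parameters at initialization (weights and biases of every layer).
init = {name: p.detach().clone() for name, p in net.named_parameters()}

# Toy 1-D regression problem (illustrative only).
x = torch.linspace(-1.0, 1.0, 64).unsqueeze(1)
y = torch.sin(torch.pi * x)

opt = torch.optim.SGD(net.parameters(), lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()

# Relative change of each layer's weights and biases after training.
for name, p in net.named_parameters():
    rel = ((p.detach() - init[name]).norm() / init[name].norm()).item()
    print(f"{name}: relative change = {rel:.3e}")
```

Repeating this measurement over a range of widths (e.g., powers of two) and
initialization scales is what would trace out a per-layer phase diagram of the
kind the abstract describes.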
Related papers
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs)
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards the practical use of machine-learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Systematic construction of continuous-time neural networks for linear dynamical systems [0.0]
We discuss a systematic approach to constructing neural architectures for modeling a subclass of dynamical systems.
We use a variant of continuous-time neural networks in which the output of each neuron evolves continuously as the solution of a first-order or second-order Ordinary Differential Equation (ODE).
Instead of deriving the network architecture and parameters from data, we propose a gradient-free algorithm to compute sparse architecture and network parameters directly from the given LTI system.
arXiv Detail & Related papers (2024-03-24T16:16:41Z) - WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure and jointly trained with gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
arXiv Detail & Related papers (2023-01-03T20:57:22Z) - A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer
Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z) - Training Integrable Parameterizations of Deep Neural Networks in the
Infinite-Width Limit [0.0]
Large-width dynamics has emerged as a fruitful viewpoint and led to practical insights on real-world deep networks.
For two-layer neural networks, it has been understood that the nature of the trained model radically changes depending on the scale of the initial random weights.
We propose various methods to avoid this trivial behavior and analyze in detail the resulting dynamics.
arXiv Detail & Related papers (2021-10-29T07:53:35Z) - Hybrid-Layers Neural Network Architectures for Modeling the
Self-Interference in Full-Duplex Systems [23.55330151898652]
Full-duplex (FD) systems provide simultaneous transmission and reception of information over the same frequency resources.
This article proposes two novel hybrid-layers neural network (NN) architectures to cancel the self-interference (SI) with low complexity.
The proposed NNs exploit, in a novel manner, a combination of hidden layers (e.g., dense) in order to model the SI with lower computational complexity than the state-of-the-art NN-based cancelers.
arXiv Detail & Related papers (2021-10-18T14:18:56Z) - Phase diagram for two-layer ReLU neural networks at infinite-width limit [6.380166265263755]
We draw the phase diagram for the two-layer ReLU neural network at the infinite-width limit.
We identify three regimes in the phase diagram, i.e., linear regime, critical regime and condensed regime.
In the linear regime, the NN training dynamics is approximately linear, similar to that of a random feature model, with an exponential loss decay.
In the condensed regime, we demonstrate through experiments that active neurons are condensed at several discrete orientations.
arXiv Detail & Related papers (2020-07-15T06:04:35Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)