Edge of chaos as a guiding principle for modern neural network training
- URL: http://arxiv.org/abs/2107.09437v1
- Date: Tue, 20 Jul 2021 12:17:55 GMT
- Title: Edge of chaos as a guiding principle for modern neural network training
- Authors: Lin Zhang, Ling Feng, Kan Chen and Choy Heng Lai
- Abstract summary: We study the role of various hyperparameters in modern neural network training algorithms in terms of the order-chaos phase diagram.
In particular, we study a fully analytical feedforward neural network trained on the widely adopted Fashion-MNIST dataset.
- Score: 19.419382003562976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of deep neural networks in real-world problems has prompted many
attempts to explain their training dynamics and generalization performance, but
more guiding principles for the training of neural networks are still needed.
Motivated by the edge of chaos principle behind the optimal performance of
neural networks, we study the role of various hyperparameters in modern neural
network training algorithms in terms of the order-chaos phase diagram. In
particular, we study a fully analytical feedforward neural network trained on
the widely adopted Fashion-MNIST dataset, and study the dynamics associated
with the hyperparameters in back-propagation during the training process. We
find that, for the basic algorithm of stochastic gradient descent with momentum
and hyperparameter values in the range around those commonly used, clear scaling
relations with respect to the training time hold in the ordered phase of the
phase diagram, and the model's optimal generalization power at the edge of chaos
is similar across different combinations of training parameters. In the chaotic
phase, this scaling no longer holds. The scaling allows us to choose training
parameters that achieve faster training without sacrificing performance. In
addition, we find that the commonly used regularization method of weight decay
effectively pushes the model towards the ordered phase, improving performance.
Leveraging this fact and the scaling relations in the other hyperparameters, we
derive a principled guideline for hyperparameter selection, such that the model
achieves optimal performance by saturating at the edge of chaos. Although
demonstrated on this simple neural network model and training algorithm, our
work improves the understanding of neural network training dynamics and can
potentially be extended to guiding principles for more complex model
architectures and algorithms.
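As an informal illustration of the three hyperparameters the abstract ties to the order-chaos phase diagram, the sketch below trains a small tanh feedforward network with SGD, momentum, and weight decay, and tracks a crude order/chaos proxy: the largest singular value of the input-output Jacobian (perturbation gain near 1 is a common heuristic for the edge of chaos). The synthetic data, network size, and Jacobian-gain proxy are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch (not the paper's code): SGD + momentum + weight decay on a
# tanh MLP, with a heuristic order/chaos proxy based on the input-output
# Jacobian. Synthetic data stands in for Fashion-MNIST so it runs offline.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for Fashion-MNIST: 28*28 = 784 inputs, 10 classes.
X = torch.randn(2048, 784)
y = torch.randint(0, 10, (2048,))

model = nn.Sequential(
    nn.Linear(784, 256), nn.Tanh(),
    nn.Linear(256, 256), nn.Tanh(),
    nn.Linear(256, 10),
)

# The hyperparameters the abstract relates through the phase diagram.
opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9,
                      weight_decay=5e-4)
loss_fn = nn.CrossEntropyLoss()


def jacobian_gain(net, x):
    """Largest singular value of the input-output Jacobian at a single input.
    Gains well below 1 suggest a contracting (ordered) map, well above 1 an
    expanding (chaotic) one; this is a heuristic proxy, not the paper's metric."""
    J = torch.autograd.functional.jacobian(net, x.unsqueeze(0))
    return torch.linalg.svdvals(J.reshape(10, 784)).max().item()


for epoch in range(5):
    perm = torch.randperm(len(X))
    for i in range(0, len(X), 128):
        idx = perm[i:i + 128]
        opt.zero_grad()
        loss = loss_fn(model(X[idx]), y[idx])
        loss.backward()
        opt.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}, "
          f"Jacobian gain {jacobian_gain(model, X[0]):.3f}")
```

In the abstract's terms, weight decay is the knob that pulls the model toward the ordered phase, so sweeping it together with the learning rate and momentum while watching the gain is one simple way to probe the phase diagram empirically.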
Related papers
- Peer-to-Peer Learning Dynamics of Wide Neural Networks [10.179711440042123]
We provide an explicit, non-asymptotic characterization of the learning dynamics of wide neural networks trained using popular DGD algorithms.
We validate our analytical results by accurately predicting the error for classification tasks.
arXiv Detail & Related papers (2024-09-23T17:57:58Z)
- Adaptive Class Emergence Training: Enhancing Neural Network Stability and Generalization through Progressive Target Evolution [0.0]
We propose a novel training methodology for neural networks in classification problems.
We evolve the target outputs from a null vector to one-hot encoded vectors throughout the training process.
This gradual transition allows the network to adapt more smoothly to the increasing complexity of the classification task (a minimal sketch of this target schedule appears after the list below).
arXiv Detail & Related papers (2024-09-04T03:25:48Z)
- Dynamical stability and chaos in artificial neural network trajectories along training [3.379574469735166]
We study the dynamical properties of this process by analyzing, through this lens, the network trajectories of a shallow neural network along training.
We find hints of regular and chaotic behavior depending on the learning rate regime.
This work also contributes to the cross-fertilization of ideas between dynamical systems theory, network theory and machine learning.
arXiv Detail & Related papers (2024-04-08T17:33:11Z)
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change when the networks are trained with better hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- NeuralFastLAS: Fast Logic-Based Learning from Raw Data [54.938128496934695]
Symbolic rule learners generate interpretable solutions; however, they require the input to be encoded symbolically.
Neuro-symbolic approaches overcome this issue by mapping raw data to latent symbolic concepts using a neural network.
We introduce NeuralFastLAS, a scalable and fast end-to-end approach that trains a neural network jointly with a symbolic learner.
arXiv Detail & Related papers (2023-10-08T12:33:42Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective in solving forward and inverse differential equation problems.
However, PINNs can get trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process (a toy comparison of explicit and implicit updates appears after the list below).
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- Identifying Equivalent Training Dynamics [3.793387630509845]
We develop a framework for identifying conjugate and non-conjugate training dynamics.
By leveraging advances in Koopman operator theory, we demonstrate that comparing Koopman eigenvalues can correctly identify a known equivalence between online mirror descent and online gradient descent.
We then utilize our approach to: (a) identify non-conjugate training dynamics between shallow and wide fully connected neural networks; (b) characterize the early phase of training dynamics in convolutional neural networks; (c) uncover non-conjugate training dynamics in Transformers that do and do not undergo grokking.
arXiv Detail & Related papers (2023-02-17T22:15:20Z)
- The Underlying Correlated Dynamics in Neural Training [6.385006149689549]
Training of neural networks is a computationally intensive task.
We propose a model based on the correlation of the parameters' dynamics, which dramatically reduces the dimensionality.
This representation enhances the understanding of the underlying training dynamics and can pave the way for designing better acceleration techniques.
arXiv Detail & Related papers (2022-12-18T08:34:11Z)
- Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via Polyak-Łojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
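For the "Adaptive Class Emergence Training" entry above, the sketch below illustrates the described target schedule, assuming a simple linear ramp from the zero vector to one-hot targets and a sigmoid + MSE objective; both choices are illustrative assumptions rather than the paper's exact recipe.

```python
# Sketch of progressive target evolution: targets ramp from the null vector
# to full one-hot encodings over training. The linear ramp and the
# sigmoid + MSE objective are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(1024, 20)                       # toy features
y = torch.randint(0, 5, (1024,))                # toy labels, 5 classes
one_hot = F.one_hot(y, num_classes=5).float()

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

epochs = 20
for epoch in range(epochs):
    # alpha grows from 0 to 1, so the targets evolve from the null vector
    # toward the full one-hot vectors as training progresses.
    alpha = min(1.0, epoch / (epochs / 2))
    target = alpha * one_hot
    opt.zero_grad()
    loss = F.mse_loss(torch.sigmoid(model(X)), target)
    loss.backward()
    opt.step()
    print(f"epoch {epoch:2d}: alpha={alpha:.2f}, loss={loss.item():.4f}")
```

Sigmoid outputs (rather than softmax) are used here so that the early all-zero targets are actually representable; the length of the ramp is arbitrary.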
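For the ISGD entry above, the toy comparison below contrasts an explicit and an implicit gradient step on a stiff one-dimensional quadratic, where the implicit update has a closed form. It only illustrates why implicit updates can stabilize training and is unrelated to the paper's actual PINN setup.

```python
# Toy illustration of implicit vs. explicit gradient descent on the stiff
# quadratic L(theta) = 0.5 * a * theta**2 with large curvature a.
# Explicit:  theta <- theta - lr * a * theta       (diverges when lr * a > 2)
# Implicit:  theta <- theta - lr * a * theta_new   =>  theta_new = theta / (1 + lr * a)
a, lr, steps = 100.0, 0.05, 10
explicit = implicit = 1.0
for k in range(steps):
    explicit = explicit - lr * a * explicit   # |1 - lr*a| = 4 > 1: blows up
    implicit = implicit / (1.0 + lr * a)      # closed-form implicit step: stable
    print(f"step {k:2d}: explicit={explicit:12.3e}  implicit={implicit:12.3e}")
```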
This list is automatically generated from the titles and abstracts of the papers on this site.