The Underlying Correlated Dynamics in Neural Training
- URL: http://arxiv.org/abs/2212.09040v1
- Date: Sun, 18 Dec 2022 08:34:11 GMT
- Title: The Underlying Correlated Dynamics in Neural Training
- Authors: Rotem Turjeman, Tom Berkov, Ido Cohen, Guy Gilboa
- Abstract summary: Training of neural networks is a computationally intensive task.
We propose a model based on the correlation of the parameters' dynamics, which dramatically reduces the dimensionality.
This representation enhances the understanding of the underlying training dynamics and can pave the way for designing better acceleration techniques.
- Score: 6.385006149689549
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training of neural networks is a computationally intensive task. The
significance of understanding and modeling the training dynamics is growing as
increasingly larger networks are being trained. We propose in this work a model
based on the correlation of the parameters' dynamics, which dramatically
reduces the dimensionality. We refer to our algorithm as \emph{correlation mode
decomposition} (CMD). It splits the parameter space into groups of parameters
(modes) which behave in a highly correlated manner through the epochs.
We achieve a remarkable dimensionality reduction with this approach, where
networks like ResNet-18, transformers and GANs, containing millions of
parameters, can be modeled well using just a few modes. We observe that the
typical time profile of each mode is spread throughout the network, across all
layers.
Moreover, our model induces regularization which yields better generalization
capacity on the test set. This representation enhances the understanding of the
underlying training dynamics and can pave the way for designing better
acceleration techniques.
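Below is a minimal, illustrative sketch of how a correlation-based mode decomposition in this spirit could be set up. It assumes parameter values have been snapshotted once per epoch into a matrix `W` of shape `(n_params, n_epochs)`; the k-means clustering and the per-parameter affine fit are assumptions made for the example, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def correlation_mode_decomposition(W, n_modes=4):
    """CMD-style sketch: group parameters whose epoch-wise dynamics are
    highly correlated, then model each parameter as an affine function of
    its mode's reference time profile.

    W : array of shape (n_params, n_epochs); row i is parameter i's
        trajectory sampled once per epoch.
    """
    # Normalize each trajectory so clustering is driven by temporal shape,
    # which makes Euclidean distance behave like a correlation measure.
    mu = W.mean(axis=1, keepdims=True)
    sigma = W.std(axis=1, keepdims=True) + 1e-12
    Z = (W - mu) / sigma

    # Cluster parameters into a few modes (illustrative clustering choice).
    labels = KMeans(n_clusters=n_modes, n_init=10, random_state=0).fit_predict(Z)

    a = np.zeros(W.shape[0])
    b = np.zeros(W.shape[0])
    refs = np.zeros((n_modes, W.shape[1]))
    for m in range(n_modes):
        idx = np.where(labels == m)[0]
        refs[m] = W[idx].mean(axis=0)            # reference time profile of mode m
        A = np.vstack([refs[m], np.ones(W.shape[1])]).T
        for i in idx:                            # least-squares affine fit per parameter
            a[i], b[i] = np.linalg.lstsq(A, W[i], rcond=None)[0]
    return labels, a, b, refs
```

With `labels`, `(a, b)` and `refs` in hand, parameter i at epoch t is approximated by `a[i] * refs[labels[i], t] + b[i]`, which is the sense in which a handful of modes can stand in for millions of individual trajectories.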
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical use of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Enhancing Neural Training via a Correlated Dynamics Model [2.9302545029880394]
Correlation Mode Decomposition (CMD) is an algorithm that clusters the parameter space into groups that display synchronized behavior across epochs.
We introduce an efficient CMD variant, designed to run concurrently with training.
Our experiments indicate that CMD surpasses the state-of-the-art method for compactly modeled dynamics on image classification.
arXiv Detail & Related papers (2023-12-20T18:22:49Z) - Analyzing and Improving the Training Dynamics of Diffusion Models [36.37845647984578]
We identify and rectify several causes for uneven and ineffective training in the popular ADM diffusion model architecture.
We find that systematic application of this philosophy eliminates the observed drifts and imbalances, resulting in considerably better networks at equal computational complexity.
arXiv Detail & Related papers (2023-12-05T11:55:47Z) - Latent State Models of Training Dynamics [51.88132043461152]
We train models with different random seeds and compute a variety of metrics throughout training.
We then fit a hidden Markov model (HMM) over the resulting sequences of metrics.
We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence (a brief illustrative sketch of this metric-sequence approach follows this list).
arXiv Detail & Related papers (2023-08-18T13:20:08Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z) - Edge of chaos as a guiding principle for modern neural network training [19.419382003562976]
We study the role of various hyperparameters in modern neural network training algorithms in terms of the order-chaos phase diagram.
In particular, we study a fully analytical feedforward neural network trained on the widely adopted Fashion-MNIST dataset.
arXiv Detail & Related papers (2021-07-20T12:17:55Z) - Continuous-in-Depth Neural Networks [107.47887213490134]
We first show that ResNets fail to be meaningful dynamical integrators in this richer sense.
We then demonstrate that neural network models can learn to represent continuous dynamical systems.
We introduce ContinuousNet as a continuous-in-depth generalization of ResNet architectures.
arXiv Detail & Related papers (2020-08-05T22:54:09Z) - Deep learning of contagion dynamics on complex networks [0.0]
We propose a complementary approach based on deep learning to build effective models of contagion dynamics on networks.
By allowing simulations on arbitrary network structures, our approach makes it possible to explore the properties of the learned dynamics beyond the training data.
Our results demonstrate how deep learning offers a new and complementary perspective to build effective models of contagion dynamics on networks.
arXiv Detail & Related papers (2020-06-09T17:18:34Z)
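As a companion to the latent-state entry above, here is a minimal sketch of fitting an HMM over per-epoch training metrics. It uses the third-party hmmlearn package, and the metric columns and placeholder data are assumptions made for the example, not the cited authors' code.

```python
import numpy as np
from hmmlearn import hmm

# Assume `runs` holds one array per random seed, each of shape
# (n_epochs, n_metrics), e.g. columns = [train_loss, val_loss, grad_norm].
rng = np.random.default_rng(0)
runs = [rng.normal(size=(50, 3)).cumsum(axis=0) for _ in range(5)]  # placeholder data

X = np.concatenate(runs, axis=0)      # stack all metric sequences
lengths = [len(r) for r in runs]      # tell hmmlearn where each run ends

model = hmm.GaussianHMM(n_components=4, covariance_type="diag",
                        n_iter=200, random_state=0)
model.fit(X, lengths)

states = model.predict(runs[0])       # latent training "phases" for one run
print(states)                         # state changes mark candidate phase transitions
```

Each decoded state sequence can then be inspected for persistent states that coincide with stalled loss, i.e. the "detour" states described in that entry.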