The Underlying Correlated Dynamics in Neural Training
- URL: http://arxiv.org/abs/2212.09040v1
- Date: Sun, 18 Dec 2022 08:34:11 GMT
- Title: The Underlying Correlated Dynamics in Neural Training
- Authors: Rotem Turjeman, Tom Berkov, Ido Cohen, Guy Gilboa
- Abstract summary: Training of neural networks is a computationally intensive task.
We propose a model based on the correlation of the parameters' dynamics, which dramatically reduces the dimensionality.
This representation enhances the understanding of the underlying training dynamics and can pave the way for designing better acceleration techniques.
- Score: 6.385006149689549
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training of neural networks is a computationally intensive task. The
significance of understanding and modeling the training dynamics is growing as
increasingly larger networks are being trained. We propose in this work a model
based on the correlation of the parameters' dynamics, which dramatically
reduces the dimensionality. We refer to our algorithm as \emph{correlation mode
decomposition} (CMD). It splits the parameter space into groups of parameters
(modes) which behave in a highly correlated manner through the epochs.
We achieve a remarkable dimensionality reduction with this approach, where
networks like ResNet-18, transformers and GANs, containing millions of
parameters, can be modeled well using just a few modes. We observe that the
typical time profile of each mode is spread throughout the network, across all
layers.
Moreover, our model induces regularization which yields better generalization
capacity on the test set. This representation enhances the understanding of the
underlying training dynamics and can pave the way for designing better
acceleration techniques.
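Below is a minimal, illustrative sketch of how a correlation-based mode decomposition in this spirit could be set up. It assumes parameter values have been snapshotted once per epoch into a matrix `W` of shape `(n_params, n_epochs)`; the k-means clustering and the per-parameter affine fit are assumptions made for the example, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def correlation_mode_decomposition(W, n_modes=4):
    """CMD-style sketch: group parameters whose epoch-wise dynamics are
    highly correlated, then model each parameter as an affine function of
    its mode's reference time profile.

    W : array of shape (n_params, n_epochs); row i is parameter i's
        trajectory sampled once per epoch.
    """
    # Normalize each trajectory so clustering is driven by temporal shape,
    # which makes Euclidean distance behave like a correlation measure.
    mu = W.mean(axis=1, keepdims=True)
    sigma = W.std(axis=1, keepdims=True) + 1e-12
    Z = (W - mu) / sigma

    # Cluster parameters into a few modes (illustrative clustering choice).
    labels = KMeans(n_clusters=n_modes, n_init=10, random_state=0).fit_predict(Z)

    a = np.zeros(W.shape[0])
    b = np.zeros(W.shape[0])
    refs = np.zeros((n_modes, W.shape[1]))
    for m in range(n_modes):
        idx = np.where(labels == m)[0]
        refs[m] = W[idx].mean(axis=0)            # reference time profile of mode m
        A = np.vstack([refs[m], np.ones(W.shape[1])]).T
        for i in idx:                            # least-squares affine fit per parameter
            a[i], b[i] = np.linalg.lstsq(A, W[i], rcond=None)[0]
    return labels, a, b, refs
```

With `labels`, `(a, b)` and `refs` in hand, parameter i at epoch t is approximated by `a[i] * refs[labels[i], t] + b[i]`, which is the sense in which a handful of modes can stand in for millions of individual trajectories.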
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical use of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Enhancing Neural Training via a Correlated Dynamics Model [2.9302545029880394]
Correlation Mode Decomposition (CMD) is an algorithm that clusters the parameter space into groups that display synchronized behavior across epochs.
We introduce an efficient CMD variant, designed to run concurrently with training.
Our experiments indicate that CMD surpasses the state-of-the-art method for compactly modeled dynamics on image classification.
arXiv Detail & Related papers (2023-12-20T18:22:49Z) - Analyzing and Improving the Training Dynamics of Diffusion Models [36.37845647984578]
We identify and rectify several causes for uneven and ineffective training in the popular ADM diffusion model architecture.
We find that systematic application of this philosophy eliminates the observed drifts and imbalances, resulting in considerably better networks at equal computational complexity.
arXiv Detail & Related papers (2023-12-05T11:55:47Z) - Latent State Models of Training Dynamics [51.88132043461152]
We train models with different random seeds and compute a variety of metrics throughout training.
We then fit a hidden Markov model (HMM) over the resulting sequences of metrics.
We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence (a brief illustrative sketch of this metric-sequence approach follows this list).
arXiv Detail & Related papers (2023-08-18T13:20:08Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z) - Edge of chaos as a guiding principle for modern neural network training [19.419382003562976]
We study the role of various hyperparameters in modern neural network training algorithms in terms of the order-chaos phase diagram.
In particular, we study a fully analytical feedforward neural network trained on the widely adopted Fashion-MNIST dataset.
arXiv Detail & Related papers (2021-07-20T12:17:55Z) - Continuous-in-Depth Neural Networks [107.47887213490134]
We first show that ResNets fail to be meaningful dynamical integrators in this richer sense.
We then demonstrate that neural network models can learn to represent continuous dynamical systems.
We introduce ContinuousNet as a continuous-in-depth generalization of ResNet architectures.
arXiv Detail & Related papers (2020-08-05T22:54:09Z) - Deep learning of contagion dynamics on complex networks [0.0]
We propose a complementary approach based on deep learning to build effective models of contagion dynamics on networks.
By allowing simulations on arbitrary network structures, our approach makes it possible to explore the properties of the learned dynamics beyond the training data.
Our results demonstrate how deep learning offers a new and complementary perspective to build effective models of contagion dynamics on networks.
arXiv Detail & Related papers (2020-06-09T17:18:34Z)
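As a companion to the latent-state entry above, here is a minimal sketch of fitting an HMM over per-epoch training metrics. It uses the third-party hmmlearn package, and the metric columns and placeholder data are assumptions made for the example, not the cited authors' code.

```python
import numpy as np
from hmmlearn import hmm

# Assume `runs` holds one array per random seed, each of shape
# (n_epochs, n_metrics), e.g. columns = [train_loss, val_loss, grad_norm].
rng = np.random.default_rng(0)
runs = [rng.normal(size=(50, 3)).cumsum(axis=0) for _ in range(5)]  # placeholder data

X = np.concatenate(runs, axis=0)      # stack all metric sequences
lengths = [len(r) for r in runs]      # tell hmmlearn where each run ends

model = hmm.GaussianHMM(n_components=4, covariance_type="diag",
                        n_iter=200, random_state=0)
model.fit(X, lengths)

states = model.predict(runs[0])       # latent training "phases" for one run
print(states)                         # state changes mark candidate phase transitions
```

Each decoded state sequence can then be inspected for persistent states that coincide with stalled loss, i.e. the "detour" states described in that entry.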