GradMax: Growing Neural Networks using Gradient Information
- URL: http://arxiv.org/abs/2201.05125v1
- Date: Thu, 13 Jan 2022 18:30:18 GMT
- Title: GradMax: Growing Neural Networks using Gradient Information
- Authors: Utku Evci, Max Vladymyrov, Thomas Unterthiner, Bart van Merriënboer,
Fabian Pedregosa
- Abstract summary: We present a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics.
We call this technique Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness in a variety of vision tasks and architectures.
- Score: 22.986063120002353
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The architecture and the parameters of neural networks are often optimized
independently, which requires costly retraining of the parameters whenever the
architecture is modified. In this work we instead focus on growing the
architecture without requiring costly retraining. We present a method that adds
new neurons during training without impacting what is already learned, while
improving the training dynamics. We achieve the latter by maximizing the
gradients of the new weights and finding the optimal initialization efficiently by
means of the singular value decomposition (SVD). We call this technique
Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness in
a variety of vision tasks and architectures.
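The abstract only outlines the mechanism, so the following is a minimal NumPy sketch of what an SVD-based, function-preserving growth step could look like. The choice of matrix being decomposed, the scaling constant, and all variable names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def grow_layer_gradmax_style(W_in, W_out, h_prev, g_next, k=1, scale=1e-3):
    """Hedged sketch: add k neurons to a hidden layer without changing the
    network's current output, picking incoming weights so that the gradient
    of the new (zero) outgoing weights is large.

    W_in:   (n_hidden, n_in)  existing incoming weights of the grown layer
    W_out:  (n_out, n_hidden) existing outgoing weights of the grown layer
    h_prev: (batch, n_in)     activations feeding the grown layer
    g_next: (batch, n_out)    gradients w.r.t. the next layer's pre-activations
    """
    # Cross-correlation between upstream gradients and incoming activations;
    # its top singular vectors are assumed here to give gradient-maximizing
    # incoming directions (this mirrors, but is not taken from, the paper).
    M = g_next.T @ h_prev                        # (n_out, n_in)
    _, _, Vt = np.linalg.svd(M, full_matrices=False)

    new_in = scale * Vt[:k]                      # (k, n_in) incoming weights
    new_out = np.zeros((W_out.shape[0], k))      # zero outgoing weights keep
                                                 # the current function unchanged
    return np.vstack([W_in, new_in]), np.hstack([W_out, new_out])
```

Because the new neurons' outgoing weights start at zero, the forward pass is unchanged at the moment of growth; the large gradients on those weights are what lets training pick the new neurons up quickly.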
Related papers
- Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally [2.645067871482715]
In machine learning tasks, one searches for an optimal function within a certain functional space.
This choice forces the evolution of the function during training to stay within what is expressible with the chosen architecture.
We show that information about desirable architectural changes, due to expressivity bottlenecks, can be extracted from backpropagation.
arXiv Detail & Related papers (2024-05-30T08:23:56Z)
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change once the networks are trained with these better, architecture-aware hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- Composable Function-preserving Expansions for Transformer Architectures [2.579908688646812]
Training state-of-the-art neural networks requires a high cost in terms of compute and time.
We propose six composable transformations to incrementally increase the size of transformer-based neural networks.
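As a concrete illustration of the function-preserving idea, the sketch below (a generic width expansion, not necessarily one of the paper's six transformations) widens the MLP block of a transformer layer without changing its output:

```python
import torch

def widen_mlp_preserving(fc1: torch.nn.Linear, fc2: torch.nn.Linear, extra: int):
    """Widen the hidden dimension of an MLP block by `extra` units while
    leaving the block's function exactly unchanged (illustrative sketch)."""
    d_in, d_hidden, d_out = fc1.in_features, fc1.out_features, fc2.out_features
    new_fc1 = torch.nn.Linear(d_in, d_hidden + extra)
    new_fc2 = torch.nn.Linear(d_hidden + extra, d_out)
    with torch.no_grad():
        # Copy the existing parameters into the enlarged layers.
        new_fc1.weight[:d_hidden] = fc1.weight
        new_fc1.bias[:d_hidden] = fc1.bias
        new_fc2.weight[:, :d_hidden] = fc2.weight
        new_fc2.bias.copy_(fc2.bias)
        # New hidden units get arbitrary fan-in but zero fan-out, so they
        # contribute nothing to the output until training updates them.
        torch.nn.init.normal_(new_fc1.weight[d_hidden:], std=0.02)
        new_fc1.bias[d_hidden:] = 0.0
        new_fc2.weight[:, d_hidden:] = 0.0
    return new_fc1, new_fc2
```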
arXiv Detail & Related papers (2023-08-11T12:27:22Z)
- Multiplicative update rules for accelerating deep learning training and increasing robustness [69.90473612073767]
We propose an optimization framework that fits to a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule.
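For contrast, here is a minimal sketch of an additive step next to one common multiplicative (exponentiated-gradient-style) step; it is only assumed to resemble the rules the proposed framework supports:

```python
import numpy as np

def additive_update(w, grad, lr=0.1):
    # Standard SGD: shift each weight by a step along the negative gradient.
    return w - lr * grad

def multiplicative_update(w, grad, lr=0.1):
    # Exponentiated-gradient-style rule: rescale each weight instead of
    # shifting it; the sign of every weight is preserved.
    return w * np.exp(-lr * grad * np.sign(w))
```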
arXiv Detail & Related papers (2023-07-14T06:44:43Z)
- Neuroevolution of Recurrent Architectures on Control Tasks [3.04585143845864]
We implement a massively parallel evolutionary algorithm and run experiments on all 19 OpenAI Gym state-based reinforcement learning control tasks.
We find that dynamic agents match or exceed the performance of gradient-based agents while utilizing orders of magnitude fewer parameters.
arXiv Detail & Related papers (2023-04-03T16:29:18Z)
- Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
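The summary does not say how the spatial scalings are derived, so the sketch below only illustrates the general mechanism: a backward hook that rescales a convolution's weight gradient with an assumed per-position mask.

```python
import torch

def attach_spatial_gradient_scale(conv: torch.nn.Conv2d, scale: torch.Tensor):
    """Illustrative sketch: multiply the conv weight gradient by a (kH, kW)
    mask during backprop, shifting learning focus across kernel positions."""
    assert scale.shape == conv.weight.shape[2:]  # mask over kernel positions

    def rescale(grad):
        # grad has shape (C_out, C_in, kH, kW); the mask broadcasts over
        # the two channel dimensions.
        return grad * scale

    conv.weight.register_hook(rescale)
```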
arXiv Detail & Related papers (2023-03-05T17:57:33Z)
- Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under a norm constraint.
Generalized from the sample-wise analysis to the real batch setting, the resulting Neural Initialization Optimization (NIO) algorithm automatically searches for a better initialization at negligible cost.
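A rough sketch of a GradCosine-like score is given below, assuming it is the mean pairwise cosine similarity between per-sample gradients at initialization; the exact definition and the handling of the norm constraint are not specified in this summary.

```python
import torch

def grad_cosine(model, loss_fn, samples, targets):
    """Mean pairwise cosine similarity between per-sample gradients
    (illustrative; higher values suggest the samples agree on a descent direction)."""
    grads = []
    for x, y in zip(samples, targets):
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        grads.append(g.clone())
    G = torch.nn.functional.normalize(torch.stack(grads), dim=1)
    sim = G @ G.T                                  # all pairwise cosines
    n = sim.shape[0]
    return (sim.sum() - n) / (n * (n - 1))         # exclude the diagonal
```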
arXiv Detail & Related papers (2022-10-12T06:49:16Z)
- Dynamically Grown Generative Adversarial Networks [111.43128389995341]
We propose a method to dynamically grow a GAN during training, automatically optimizing the network architecture and its parameters together.
The method embeds architecture search techniques as an interleaving step with gradient-based training to periodically seek the optimal architecture-growing strategy for the generator and discriminator.
arXiv Detail & Related papers (2021-06-16T01:25:51Z)
- GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
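A minimal sketch of that heuristic follows, under the assumption that one positive scale per layer is tuned so that a single simulated SGD step gives the lowest loss on a batch; the variable names and the unconstrained meta-optimization are simplifications, not GradInit's actual procedure.

```python
import torch
import torch.nn.functional as F

def gradinit_like_scales(layers, loss_fn, x, y, inner_lr=0.1, steps=200, meta_lr=1e-2):
    """Learn one positive scale per linear layer so that the loss after one
    simulated SGD step is as small as possible (illustrative sketch)."""
    log_s = torch.zeros(len(layers), requires_grad=True)   # scales start at 1
    meta_opt = torch.optim.Adam([log_s], lr=meta_lr)
    Ws = [l.weight.detach() for l in layers]
    bs = [l.bias.detach() for l in layers]

    def forward(weights):
        h = x
        for i, (W, b) in enumerate(zip(weights, bs)):
            h = F.linear(h, W, b)
            if i < len(weights) - 1:
                h = F.relu(h)
        return h

    for _ in range(steps):
        scaled = [s * W for s, W in zip(log_s.exp(), Ws)]
        loss0 = loss_fn(forward(scaled), y)
        grads = torch.autograd.grad(loss0, scaled, create_graph=True)
        stepped = [W - inner_lr * g for W, g in zip(scaled, grads)]
        loss_after_step = loss_fn(forward(stepped), y)  # quantity being minimized
        meta_opt.zero_grad()
        loss_after_step.backward()
        meta_opt.step()
    return log_s.exp().detach()    # multiply each layer's weights by these scales
```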
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.