MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters
- URL: http://arxiv.org/abs/2311.04251v1
- Date: Tue, 7 Nov 2023 11:37:08 GMT
- Title: MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters
- Authors: Chau Pham, Piotr Teterwak, Soren Nelson, Bryan A. Plummer
- Abstract summary: Most deep neural networks are trained under fixed network architectures and require retraining when the architecture changes.
To avoid this, one can grow from a small network by adding random weights over time to gradually achieve the target network size.
This naive approach falls short in practice as it brings too much noise to the growing process.
- Score: 19.358670728803336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most deep neural networks are trained under fixed network architectures and
require retraining when the architecture changes. If expanding the network's
size is needed, it is necessary to retrain from scratch, which is expensive. To
avoid this, one can grow from a small network by adding random weights over
time to gradually achieve the target network size. However, this naive approach
falls short in practice as it brings too much noise to the growing process.
Prior work tackled this issue by leveraging the already learned weights and
training data for generating new weights through conducting a computationally
expensive analysis step. In this paper, we introduce MixtureGrowth, a new
approach to growing networks that circumvents the initialization overhead in
prior work. Before growing, each layer in our model is generated with a linear
combination of parameter templates. Newly grown layer weights are generated by
using a new linear combination of existing templates for a layer. On one hand,
these templates are already trained for the task, providing a strong
initialization. On the other, the new coefficients provide flexibility for the
added layer weights to learn something new. We show that our approach boosts
top-1 accuracy over the state-of-the-art by 2-2.5% on CIFAR-100 and ImageNet
datasets, while using fewer FLOPs to reach performance comparable to a larger
network trained from scratch. Code is available at
https://github.com/chaudatascience/mixturegrowth.
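The template idea in the abstract can be illustrated with a minimal sketch. It assumes a layer's weight matrix is a weighted sum of shared template matrices, and that growth appends a new block built from the same templates with fresh coefficients. The names `mix` and `grow_rows`, the shapes, the coefficient values, and the choice to grow by stacking rows are all hypothetical illustrations; they are not taken from the authors' code, which may arrange the new weights differently.

```python
import random


def mix(templates, coeffs):
    """Return sum_k coeffs[k] * templates[k] as a new matrix (list of rows)."""
    rows, cols = len(templates[0]), len(templates[0][0])
    return [
        [sum(c * t[i][j] for c, t in zip(coeffs, templates)) for j in range(cols)]
        for i in range(rows)
    ]


def grow_rows(old_block, new_block):
    """Hypothetical growth step: append new output units (rows) below the old ones."""
    return [row[:] for row in old_block] + [row[:] for row in new_block]


rng = random.Random(0)
num_templates, rows, cols = 3, 2, 4

# Shared templates; random values stand in for parameters already trained on the task.
templates = [
    [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
    for _ in range(num_templates)
]

# Coefficients of the existing (small) layer; assumed to be learned already.
alpha_old = [0.5, 0.3, 0.2]
# Fresh coefficients for the newly added weights (assumed random initialization).
alpha_new = [rng.uniform(-1, 1) for _ in range(num_templates)]

w_old = mix(templates, alpha_old)   # weights before growing
w_new = mix(templates, alpha_new)   # new weights reuse the same templates
w_grown = grow_rows(w_old, w_new)   # grown layer: old rows are kept unchanged

print(len(w_grown), len(w_grown[0]))  # 4 4
```

In this sketch the old rows are preserved exactly, so the grown layer starts from what the small layer already computed, while the new rows differ only through `alpha_new`.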
Related papers
- Neural Metamorphosis [72.88137795439407]
This paper introduces a new learning paradigm termed Neural Metamorphosis (NeuMeta), which aims to build self-morphable neural networks.
NeuMeta directly learns the continuous weight manifold of neural networks.
It sustains full-size performance even at a 75% compression rate.
arXiv Detail & Related papers (2024-10-10T14:49:58Z)
- Data Augmentations in Deep Weight Spaces [89.45272760013928]
We introduce a novel augmentation scheme based on the Mixup method.
We evaluate the performance of these techniques on existing benchmarks as well as new benchmarks we generate.
arXiv Detail & Related papers (2023-11-15T10:43:13Z)
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning makes significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
- Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
- GROWN: GRow Only When Necessary for Continual Learning [39.56829374809613]
Catastrophic forgetting is a notorious issue in deep learning, referring to the fact that Deep Neural Networks (DNN) could forget the knowledge about earlier tasks when learning new tasks.
To address this issue, continual learning has been developed to learn new tasks sequentially and perform knowledge transfer from the old tasks to the new ones without forgetting.
GROWN is a novel end-to-end continual learning framework to dynamically grow the model only when necessary.
arXiv Detail & Related papers (2021-10-03T02:31:04Z)
- Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using the same path of the network, DG-Net aggregates features dynamically in each node, which allows the network to have more representation ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
- HALO: Learning to Prune Neural Networks with Shrinkage [5.283963846188862]
Deep neural networks achieve state-of-the-art performance in a variety of tasks by extracting a rich set of features from unstructured data.
Modern techniques for inducing sparsity and reducing model size are (1) network pruning, (2) training with a sparsity inducing penalty, and (3) training a binary mask jointly with the weights of the network.
We present a novel penalty called Hierarchical Adaptive Lasso which learns to adaptively sparsify weights of a given network via trainable parameters.
arXiv Detail & Related papers (2020-08-24T04:08:48Z)
- Training highly effective connectivities within neural networks with randomly initialized, fixed weights [4.56877715768796]
We introduce a novel way of training a network by flipping the signs of the weights.
We obtain good results even with weights of constant magnitude, or when weights are drawn from highly asymmetric distributions.
arXiv Detail & Related papers (2020-06-30T09:41:18Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.