Recurrent Parameter Generators
- URL: http://arxiv.org/abs/2107.07110v1
- Date: Thu, 15 Jul 2021 04:23:59 GMT
- Title: Recurrent Parameter Generators
- Authors: Jiayun Wang, Yubei Chen, Stella X. Yu, Brian Cheung, Yann LeCun
- Abstract summary: We present a generic method for recurrently using the same parameters for many different convolution layers to build a deep network.
We demonstrate how to build a one-layer neural network that achieves performance comparable to traditional CNN models.
- Score: 42.159272098922685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a generic method for recurrently using the same parameters for
many different convolution layers to build a deep network. Specifically, for a
network, we create a recurrent parameter generator (RPG), from which the
parameters of each convolution layer are generated. Though using recurrent
models to build a deep convolutional neural network (CNN) is not entirely new,
our method achieves significant performance gains compared to existing work. We
demonstrate how to build a one-layer neural network that achieves performance
comparable to traditional CNN models on various applications and datasets. This
method allows us to build an arbitrarily complex neural network with any number
of parameters. For example, we build a
ResNet34 with model parameters reduced by more than $400$ times, which still
achieves $41.6\%$ ImageNet top-1 accuracy. Furthermore, we demonstrate the RPG
can be applied at different scales, such as layers, blocks, or even
sub-networks. Specifically, we use the RPG to build a ResNet18 network with the
number of weights equivalent to one convolutional layer of a conventional
ResNet and show this model can achieve $67.2\%$ ImageNet top-1 accuracy. The
proposed method can be viewed as an inverse approach to model compression.
Rather than removing the unused parameters from a large model, it aims to
squeeze more information into a small number of parameters. Extensive
experimental results are provided to demonstrate the power of the proposed
recurrent parameter generator.
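To make the idea concrete, below is a minimal PyTorch sketch of the core mechanism described in the abstract: a single shared, trainable parameter bank from which every convolution layer's weights are generated. It assumes each layer draws its weights from the bank through a fixed random index pattern and sign flips; the names RecurrentParameterGenerator, RPGConv2d, and generate are illustrative, not the authors' implementation, and the paper's exact generation rule may differ.

```python
# A minimal sketch, not the authors' code: one shared trainable parameter
# bank, from which every convolution layer's weights are generated through
# a fixed (non-trainable) random index pattern and sign flips.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentParameterGenerator(nn.Module):
    """Holds the only trainable parameters: a single shared bank."""

    def __init__(self, bank_size: int):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(bank_size) * 0.01)
        self.bank_size = bank_size

    def generate(self, shape, seed: int) -> torch.Tensor:
        """Build a weight tensor of the given shape from the shared bank."""
        n = math.prod(shape)
        g = torch.Generator().manual_seed(seed)
        # Per-layer fixed random indices and signs, so different layers reuse
        # the same underlying parameters in different, decorrelated orders.
        idx = torch.randint(0, self.bank_size, (n,), generator=g)
        sign = torch.randint(0, 2, (n,), generator=g).float() * 2.0 - 1.0
        return (self.bank[idx] * sign).view(shape)


class RPGConv2d(nn.Module):
    """A convolution whose weights come from a shared generator."""

    def __init__(self, rpg, in_ch, out_ch, k, seed, stride=1, padding=1):
        super().__init__()
        self.rpg, self.seed = rpg, seed
        self.shape = (out_ch, in_ch, k, k)
        self.stride, self.padding = stride, padding

    def forward(self, x):
        w = self.rpg.generate(self.shape, self.seed)  # generated on the fly
        return F.conv2d(x, w, stride=self.stride, padding=self.padding)


# Usage: many layers, one parameter bank; gradients flow back to the bank.
rpg = RecurrentParameterGenerator(bank_size=50_000)
net = nn.Sequential(
    RPGConv2d(rpg, 3, 64, 3, seed=0),
    nn.ReLU(),
    RPGConv2d(rpg, 64, 64, 3, seed=1),
)
out = net(torch.randn(1, 3, 32, 32))
```

In this sketch the total number of trainable parameters is fixed by the bank size, independent of network depth, which mirrors the abstract's point that an arbitrarily complex network can be built with any number of parameters.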
Related papers
- Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer).
This study is the first to investigate the representative potential of fixed random weights with limited unique values by learning masks (a minimal sketch of this masked-random-weights idea appears after this list).
It leads to a new paradigm for model compression that reduces model size.
arXiv Detail & Related papers (2022-10-13T03:39:03Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Hidden-Fold Networks: Random Recurrent Residuals Using Sparse Supermasks [1.0814638303152528]
Deep neural networks (DNNs) are so over-parametrized that recent research has found them to contain a subnetwork with high accuracy.
This paper proposes blending these lines of research into a highly compressed yet accurate model: Hidden-Fold Networks (HFNs).
It achieves equivalent performance to ResNet50 on CIFAR100 while occupying 38.5x less memory, and similar performance to ResNet34 on ImageNet with a memory size 26.8x smaller.
arXiv Detail & Related papers (2021-11-24T08:24:31Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks [16.518667634574026]
We search for the neuron (filter) configuration of a fixed network architecture that maximizes accuracy.
We parameterize the change of the neuron (filter) number of each layer with respect to the change in parameters, allowing us to efficiently scale an architecture across arbitrary sizes.
arXiv Detail & Related papers (2020-06-23T08:14:02Z)
- Multigrid-in-Channels Architectures for Wide Convolutional Neural Networks [6.929025509877642]
We present a multigrid approach that combats the quadratic growth of the number of parameters with respect to the number of channels in standard convolutional neural networks (CNNs).
Our examples from supervised image classification show that applying this strategy to residual networks and MobileNetV2 considerably reduces the number of parameters without negatively affecting accuracy.
arXiv Detail & Related papers (2020-06-11T20:28:36Z)
- Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
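As referenced in the Parameter-Efficient Masking Networks entry above, here is a minimal PyTorch sketch of the masked-random-weights idea shared by that paper and Hidden-Fold Networks: the weights stay fixed and random, and only a binary mask over them is learned, here via a top-k score threshold with a straight-through estimator. The class name MaskedLinear, the keep_ratio parameter, and the thresholding rule are illustrative assumptions, not either paper's exact method.

```python
# A minimal sketch, not either paper's exact method: weights are fixed and
# random; only real-valued scores are trained, and the top-scoring fraction
# of weights is kept via a hard mask with a straight-through estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    def __init__(self, in_features, out_features, keep_ratio=0.5):
        super().__init__()
        # Fixed random weights: never updated during training.
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features), requires_grad=False
        )
        # Trainable scores; their largest entries define the binary mask.
        self.scores = nn.Parameter(torch.randn(out_features, in_features))
        self.keep_ratio = keep_ratio

    def forward(self, x):
        n = self.scores.numel()
        k = max(1, int(n * self.keep_ratio))
        # Threshold = k-th largest score, i.e. (n - k + 1)-th smallest.
        threshold = self.scores.flatten().kthvalue(n - k + 1).values
        hard_mask = (self.scores >= threshold).float()
        # Straight-through estimator: forward uses the hard 0/1 mask,
        # backward routes gradients to the continuous scores.
        mask = hard_mask + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)


# Usage: only the mask scores receive gradients; the weights stay random.
layer = MaskedLinear(16, 8)
loss = layer(torch.randn(4, 16)).sum()
loss.backward()
print(layer.scores.grad is not None, layer.weight.grad is None)  # True True
```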