Recurrent Parameter Generators
- URL: http://arxiv.org/abs/2107.07110v1
- Date: Thu, 15 Jul 2021 04:23:59 GMT
- Title: Recurrent Parameter Generators
- Authors: Jiayun Wang, Yubei Chen, Stella X. Yu, Brian Cheung, Yann LeCun
- Abstract summary: We present a generic method for recurrently using the same parameters for many different convolution layers to build a deep network.
We demonstrate how to build a one-layer neural network that achieves performance comparable to traditional CNN models.
- Score: 42.159272098922685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a generic method for recurrently using the same parameters for
many different convolution layers to build a deep network. Specifically, for a
network, we create a recurrent parameter generator (RPG), from which the
parameters of each convolution layer are generated. Though using recurrent
models to build a deep convolutional neural network (CNN) is not entirely new,
our method achieves a significant performance gain compared to existing work.
We demonstrate how to build a one-layer neural network that achieves
performance comparable to traditional CNN models on various applications and
datasets. Such a method allows us to build an arbitrarily complex neural
network with any number of parameters. For example, we build a
ResNet34 with model parameters reduced by more than $400$ times, which still
achieves $41.6\%$ ImageNet top-1 accuracy. Furthermore, we demonstrate the RPG
can be applied at different scales, such as layers, blocks, or even
sub-networks. Specifically, we use the RPG to build a ResNet18 network with the
number of weights equivalent to one convolutional layer of a conventional
ResNet and show this model can achieve $67.2\%$ ImageNet top-1 accuracy. The
proposed method can be viewed as an inverse approach to model compression.
Rather than removing unused parameters from a large model, it aims to
squeeze more information into a small number of parameters. Extensive
experimental results are provided to demonstrate the power of the proposed
recurrent parameter generator.
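The core construction can be sketched in a few lines: a single learnable parameter bank is shared across all convolution layers, and each layer's weight tensor is generated from that bank on the fly, so depth no longer multiplies the parameter count. The sketch below is a minimal PyTorch-style illustration under assumed details; the class names `RecurrentParameterGenerator` and `RPGConv2d`, the wraparound slicing, the per-layer offsets, and the fixed random permutations are illustrative choices, not necessarily the paper's exact generation rule.

```python
# Minimal sketch of the recurrent-parameter-generator idea: every convolution
# draws its weights from one shared parameter bank instead of owning its own.
# The wraparound slicing and per-layer permutations below are illustrative
# assumptions, not the authors' exact construction.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentParameterGenerator(nn.Module):
    """A single learnable parameter bank shared by many conv layers."""

    def __init__(self, bank_size: int):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(bank_size) * 0.01)

    def generate(self, offset: int, shape, perm):
        n = int(torch.tensor(shape).prod())
        # Take n entries starting at `offset`, wrapping around the bank (a
        # "ring"), then apply a fixed per-layer permutation so that layers
        # sharing the same bank are not forced to be identical.
        idx = (offset + torch.arange(n, device=self.bank.device)) % self.bank.numel()
        return self.bank[idx][perm].view(*shape)


class RPGConv2d(nn.Module):
    """Conv layer whose weight tensor is produced by the shared generator."""

    def __init__(self, gen, offset, in_ch, out_ch, k, stride=1, padding=0):
        super().__init__()
        self.gen, self.offset = gen, offset
        self.shape = (out_ch, in_ch, k, k)
        self.stride, self.padding = stride, padding
        # Fixed (non-learnable) permutation, distinct per layer.
        self.register_buffer("perm", torch.randperm(out_ch * in_ch * k * k))

    def forward(self, x):
        w = self.gen.generate(self.offset, self.shape, self.perm)
        return F.conv2d(x, w, stride=self.stride, padding=self.padding)


# Two conv layers, but only `bank_size` underlying parameters in total.
gen = RecurrentParameterGenerator(bank_size=4096)
conv1 = RPGConv2d(gen, offset=0, in_ch=3, out_ch=16, k=3, padding=1)
conv2 = RPGConv2d(gen, offset=1000, in_ch=16, out_ch=16, k=3, padding=1)
y = conv2(conv1(torch.randn(1, 3, 32, 32)))
print(y.shape, sum(p.numel() for p in gen.parameters()))  # shared 4096 params
```

In a full network one would still keep per-layer batch-norm and the final classifier as ordinary parameters; only the convolution weights are drawn from the shared bank, which is how the total parameter count can stay close to that of a single layer, as the abstract's ResNet18 example suggests.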
Related papers
- Recurrent Diffusion for Large-Scale Parameter Generation [52.98888368644455]
We introduce Recurrent Diffusion for Large-Scale Parameter Generation (RPG), a novel framework that generates full neural network parameters, up to hundreds of millions of them, on a single GPU.
RPG serves as a critical advance in AI generating AI, potentially enabling efficient weight generation at scales previously deemed infeasible.
arXiv Detail & Related papers (2025-01-20T16:46:26Z)
- Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer).
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
This leads to a new paradigm for model compression that reduces model size; a minimal sketch of this masking idea is given after the related-papers list.
arXiv Detail & Related papers (2022-10-13T03:39:03Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, showing better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks [16.518667634574026]
We search for the neuron (filter) configuration of a fixed network architecture that maximizes accuracy.
We parameterize the change of the neuron (filter) number of each layer with respect to the change in parameters, allowing us to efficiently scale an architecture across arbitrary sizes.
arXiv Detail & Related papers (2020-06-23T08:14:02Z)
- Multigrid-in-Channels Architectures for Wide Convolutional Neural Networks [6.929025509877642]
We present a multigrid approach that combats the quadratic growth of the number of parameters with respect to the number of channels in standard convolutional neural networks (CNNs).
Our examples from supervised image classification show that applying this strategy to residual networks and MobileNetV2 considerably reduces the number of parameters without negatively affecting accuracy.
arXiv Detail & Related papers (2020-06-11T20:28:36Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
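For contrast with the masking-based related work above, which keeps random weights frozen and trains only a mask over them, the following is a minimal sketch of that idea. It assumes a sign-threshold mask trained with a straight-through estimator; the class name `MaskedLinear` and the thresholding rule are illustrative assumptions, not that paper's exact method.

```python
# Minimal sketch of the "fixed random weights + learned mask" idea: the
# real-valued weights stay frozen at random initialization and only a binary
# mask per weight is trained. The straight-through estimator and sign
# threshold below are illustrative assumptions.
import torch
import torch.nn as nn


class MaskedLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Fixed random weights: registered as a buffer, never updated.
        self.register_buffer("weight", torch.randn(out_features, in_features) * 0.1)
        # Learnable real-valued scores; their sign decides the binary mask.
        self.scores = nn.Parameter(torch.randn(out_features, in_features) * 0.01)

    def forward(self, x):
        hard = (self.scores > 0).float()
        # Straight-through estimator: hard mask forward, gradient to `scores` backward.
        mask = hard + self.scores - self.scores.detach()
        return x @ (self.weight * mask).t()


layer = MaskedLinear(8, 4)
out = layer(torch.randn(2, 8))
out.sum().backward()
print(layer.scores.grad is not None, layer.weight.requires_grad)  # True False
```

Only `scores` receives gradients, so the trainable state reduces to one mask value per fixed weight, which is the sense in which such masking acts as model compression.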