Recurrent Parameter Generators
- URL: http://arxiv.org/abs/2107.07110v1
- Date: Thu, 15 Jul 2021 04:23:59 GMT
- Title: Recurrent Parameter Generators
- Authors: Jiayun Wang, Yubei Chen, Stella X. Yu, Brian Cheung, Yann LeCun
- Abstract summary: We present a generic method for recurrently using the same parameters for many different convolution layers to build a deep network.
We demonstrate how to build a one-layer neural network that achieves performance comparable to traditional CNN models.
- Score: 42.159272098922685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a generic method for recurrently using the same parameters for
many different convolution layers to build a deep network. Specifically, for a
network, we create a recurrent parameter generator (RPG), from which the
parameters of each convolution layer are generated. Though using recurrent
models to build a deep convolutional neural network (CNN) is not entirely new,
our method achieves a significant performance gain compared to existing work.
We demonstrate how to build a one-layer neural network that achieves
performance comparable to traditional CNN models on various applications and
datasets. Such a method allows us to build an arbitrarily complex neural
network with any number of parameters. For example, we build a
ResNet34 with model parameters reduced by more than $400$ times, which still
achieves $41.6\%$ ImageNet top-1 accuracy. Furthermore, we demonstrate the RPG
can be applied at different scales, such as layers, blocks, or even
sub-networks. Specifically, we use the RPG to build a ResNet18 network with the
number of weights equivalent to one convolutional layer of a conventional
ResNet and show this model can achieve $67.2\%$ ImageNet top-1 accuracy. The
proposed method can be viewed as an inverse approach to model compression.
Rather than removing unused parameters from a large model, it aims to
squeeze more information into a small number of parameters. Extensive
experimental results are provided to demonstrate the power of the proposed
recurrent parameter generator.
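The core construction can be sketched in a few lines: a single learnable parameter bank is shared across all convolution layers, and each layer's weight tensor is generated from that bank on the fly, so depth no longer multiplies the parameter count. The sketch below is a minimal PyTorch-style illustration under assumed details; the class names `RecurrentParameterGenerator` and `RPGConv2d`, the wraparound slicing, the per-layer offsets, and the fixed random permutations are illustrative choices, not necessarily the paper's exact generation rule.

```python
# Minimal sketch of the recurrent-parameter-generator idea: every convolution
# draws its weights from one shared parameter bank instead of owning its own.
# The wraparound slicing and per-layer permutations below are illustrative
# assumptions, not the authors' exact construction.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentParameterGenerator(nn.Module):
    """A single learnable parameter bank shared by many conv layers."""

    def __init__(self, bank_size: int):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(bank_size) * 0.01)

    def generate(self, offset: int, shape, perm):
        n = int(torch.tensor(shape).prod())
        # Take n entries starting at `offset`, wrapping around the bank (a
        # "ring"), then apply a fixed per-layer permutation so that layers
        # sharing the same bank are not forced to be identical.
        idx = (offset + torch.arange(n, device=self.bank.device)) % self.bank.numel()
        return self.bank[idx][perm].view(*shape)


class RPGConv2d(nn.Module):
    """Conv layer whose weight tensor is produced by the shared generator."""

    def __init__(self, gen, offset, in_ch, out_ch, k, stride=1, padding=0):
        super().__init__()
        self.gen, self.offset = gen, offset
        self.shape = (out_ch, in_ch, k, k)
        self.stride, self.padding = stride, padding
        # Fixed (non-learnable) permutation, distinct per layer.
        self.register_buffer("perm", torch.randperm(out_ch * in_ch * k * k))

    def forward(self, x):
        w = self.gen.generate(self.offset, self.shape, self.perm)
        return F.conv2d(x, w, stride=self.stride, padding=self.padding)


# Two conv layers, but only `bank_size` underlying parameters in total.
gen = RecurrentParameterGenerator(bank_size=4096)
conv1 = RPGConv2d(gen, offset=0, in_ch=3, out_ch=16, k=3, padding=1)
conv2 = RPGConv2d(gen, offset=1000, in_ch=16, out_ch=16, k=3, padding=1)
y = conv2(conv1(torch.randn(1, 3, 32, 32)))
print(y.shape, sum(p.numel() for p in gen.parameters()))  # shared 4096 params
```

In a full network one would still keep per-layer batch-norm and the final classifier as ordinary parameters; only the convolution weights are drawn from the shared bank, which is how the total parameter count can stay close to that of a single layer, as the abstract's ResNet18 example suggests.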
Related papers
- Recurrent Diffusion for Large-Scale Parameter Generation [52.98888368644455]
We introduce Recurrent Diffusion for Large-Scale Parameter Generation (RPG), a novel framework that generates full neural network parameters, up to hundreds of millions of them, on a single GPU.
RPG serves as a critical advance in AI generating AI, potentially enabling efficient weight generation at scales previously deemed infeasible.
arXiv Detail & Related papers (2025-01-20T16:46:26Z)
- Parameter-Efficient Masking Networks [61.43995077575439]
Advanced network designs often contain a large number of repetitive structures (e.g., Transformer).
In this study, we are the first to investigate the representative potential of fixed random weights with limited unique values by learning masks.
This leads to a new paradigm for model compression that reduces model size; a minimal sketch of this masking idea is given after the related-papers list.
arXiv Detail & Related papers (2022-10-13T03:39:03Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, showing better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks [16.518667634574026]
We search for the neuron (filter) configuration of a fixed network architecture that maximizes accuracy.
We parameterize the change of the neuron (filter) number of each layer with respect to the change in parameters, allowing us to efficiently scale an architecture across arbitrary sizes.
arXiv Detail & Related papers (2020-06-23T08:14:02Z)
- Multigrid-in-Channels Architectures for Wide Convolutional Neural Networks [6.929025509877642]
We present a multigrid approach that combats the quadratic growth of the number of parameters with respect to the number of channels in standard convolutional neural networks (CNNs).
Our examples from supervised image classification show that applying this strategy to residual networks and MobileNetV2 considerably reduces the number of parameters without negatively affecting accuracy.
arXiv Detail & Related papers (2020-06-11T20:28:36Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
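For contrast with the masking-based related work above, which keeps random weights frozen and trains only a mask over them, the following is a minimal sketch of that idea. It assumes a sign-threshold mask trained with a straight-through estimator; the class name `MaskedLinear` and the thresholding rule are illustrative assumptions, not that paper's exact method.

```python
# Minimal sketch of the "fixed random weights + learned mask" idea: the
# real-valued weights stay frozen at random initialization and only a binary
# mask per weight is trained. The straight-through estimator and sign
# threshold below are illustrative assumptions.
import torch
import torch.nn as nn


class MaskedLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Fixed random weights: registered as a buffer, never updated.
        self.register_buffer("weight", torch.randn(out_features, in_features) * 0.1)
        # Learnable real-valued scores; their sign decides the binary mask.
        self.scores = nn.Parameter(torch.randn(out_features, in_features) * 0.01)

    def forward(self, x):
        hard = (self.scores > 0).float()
        # Straight-through estimator: hard mask forward, gradient to `scores` backward.
        mask = hard + self.scores - self.scores.detach()
        return x @ (self.weight * mask).t()


layer = MaskedLinear(8, 4)
out = layer(torch.randn(2, 8))
out.sum().backward()
print(layer.scores.grad is not None, layer.weight.requires_grad)  # True False
```

Only `scores` receives gradients, so the trainable state reduces to one mask value per fixed weight, which is the sense in which such masking acts as model compression.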