Neural Parameter Allocation Search
- URL: http://arxiv.org/abs/2006.10598v4
- Date: Wed, 16 Mar 2022 03:29:34 GMT
- Title: Neural Parameter Allocation Search
- Authors: Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate
Saenko
- Abstract summary: Training neural networks requires increasing amounts of memory.
Existing methods assume networks have many identical layers and utilize hand-crafted sharing strategies that fail to generalize.
We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.
NPAS covers both low-budget regimes, which produce compact networks, as well as a novel high-budget regime, where additional capacity can be added to boost performance without increasing inference FLOPs.
- Score: 57.190693718951316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training neural networks requires increasing amounts of memory. Parameter
sharing can reduce memory and communication costs, but existing methods assume
networks have many identical layers and utilize hand-crafted sharing strategies
that fail to generalize. We introduce Neural Parameter Allocation Search
(NPAS), a novel task where the goal is to train a neural network given an
arbitrary, fixed parameter budget. NPAS covers both low-budget regimes, which
produce compact networks, as well as a novel high-budget regime, where
additional capacity can be added to boost performance without increasing
inference FLOPs. To address NPAS, we introduce Shapeshifter Networks (SSNs),
which automatically learn where and how to share parameters in a network to
support any parameter budget without requiring any changes to the architecture
or loss function. NPAS and SSNs provide a complete framework for addressing
generalized parameter sharing, and can also be combined with prior work for
additional performance gains. We demonstrate the effectiveness of our approach
using nine network architectures across four diverse tasks, including ImageNet
classification and transformers.
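As a concrete (but heavily simplified) illustration of the NPAS setting, the sketch below builds a network whose layers all draw their weights from a single fixed-size parameter bank. This is not the authors' Shapeshifter Network implementation; the `SharedBankLinear` module, the wrap-around indexing, and the 4,096-parameter budget are illustrative assumptions only.

```python
# Minimal sketch of training under a fixed parameter budget: every layer
# draws its weights from one shared parameter bank, so the total parameter
# count is set up front regardless of layer shapes. This is NOT the paper's
# SSN method; the wrap-around indexing below is a simple stand-in for the
# learned sharing the paper describes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedBankLinear(nn.Module):
    """Linear layer whose weight matrix is gathered from a shared 1-D bank."""

    def __init__(self, bank: nn.Parameter, offset: int,
                 in_features: int, out_features: int):
        super().__init__()
        self.bank = bank                                  # shared across layers
        self.offset = offset
        self.in_features, self.out_features = in_features, out_features
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        n = self.in_features * self.out_features
        # Gather this layer's weights from the bank, wrapping around if the
        # budget is smaller than the layer.
        idx = (self.offset + torch.arange(n)) % self.bank.numel()
        weight = self.bank[idx].view(self.out_features, self.in_features)
        return F.linear(x, weight, self.bias)


budget = 4096                                             # fixed parameter budget
bank = nn.Parameter(0.01 * torch.randn(budget))
net = nn.Sequential(                                      # would need 35,328 weights,
    SharedBankLinear(bank, 0, 128, 256), nn.ReLU(),       # but reuses the same
    SharedBankLinear(bank, 1024, 256, 10),                # 4,096 shared values
)
print(net(torch.randn(8, 128)).shape)                     # torch.Size([8, 10])
```

In this toy version, a bank smaller than the layers it feeds corresponds to the low-budget (compact) regime, while a larger bank adds capacity without changing the layer shapes, and hence without changing inference FLOPs, matching the high-budget regime described in the abstract.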
Related papers
- Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks [55.86300309474023]
We conduct a comprehensive stability and generalization analysis of gradient descent (GD) for multi-layer NNs.
We derive the excess risk rate of $O(1/\sqrt{n})$ for GD algorithms in both two-layer and three-layer NNs.
arXiv Detail & Related papers (2023-05-26T12:51:38Z)
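For context on the entry above: the excess risk is the gap between the population risk of the GD-trained network and the best achievable risk. In generic notation (assumed here, not taken from the paper):

```latex
\underbrace{\mathbb{E}_{S}\big[R(A(S))\big] - \inf_{w} R(w)}_{\text{excess risk of the GD output } A(S)}
\;=\; O\!\left(\frac{1}{\sqrt{n}}\right),
\qquad
R(w) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(f_w(x), y)\big],
```

where $n$ is the training-set size and $\mathcal{D}$ the data distribution.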
- Exploring the Complexity of Deep Neural Networks through Functional Equivalence [1.3597551064547502]
We present a novel bound on the covering number for deep neural networks, which reveals that the complexity of neural networks can be reduced.
We demonstrate that functional equivalence benefits optimization, as over-parameterized networks tend to be easier to train since increasing network width leads to a diminishing volume of the effective parameter space.
arXiv Detail & Related papers (2023-05-19T04:01:27Z)
- Multi-agent Reinforcement Learning with Graph Q-Networks for Antenna Tuning [60.94661435297309]
The scale of mobile networks makes it challenging to optimize antenna parameters using manual intervention or hand-engineered strategies.
We propose a new multi-agent reinforcement learning algorithm to optimize mobile network configurations globally.
We empirically demonstrate the performance of the algorithm on an antenna tilt tuning problem and a joint tilt and power control problem in a simulated environment.
arXiv Detail & Related papers (2023-01-20T17:06:34Z)
- Learning k-Level Structured Sparse Neural Networks Using Group Envelope Regularization [4.0554893636822]
We introduce a novel approach to deploy large-scale Deep Neural Networks on constrained resources.
The method speeds up inference time and aims to reduce memory demand and power consumption.
arXiv Detail & Related papers (2022-12-25T15:40:05Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
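A hedged sketch of the data-construction step that the checkpoint-generation entry above describes: flatten the parameters of many trained runs into a "checkpoint dataset". The toy regression task, the 16 runs, and the plain Gaussian fit standing in for the generative model are illustrative assumptions, not the cited paper's pipeline.

```python
# Build a small "checkpoint dataset" of flattened parameter vectors and fit a
# trivial generative model over them (an independent Gaussian), purely to
# illustrate the data flow described in the summary above.
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector


def train_small_net(seed: int) -> torch.Tensor:
    """Train a tiny regression net and return its flattened parameters."""
    torch.manual_seed(seed)
    net = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    x = torch.randn(256, 4)
    y = x.sum(dim=1, keepdim=True)
    for _ in range(200):
        opt.zero_grad()
        nn.functional.mse_loss(net(x), y).backward()
        opt.step()
    return parameters_to_vector(net.parameters()).detach()


# One flattened parameter vector per training run.
checkpoints = torch.stack([train_small_net(seed) for seed in range(16)])

# Simplest possible stand-in for a generative model over parameters:
# an independent Gaussian fitted to the checkpoints, sampled for new weights.
mean, std = checkpoints.mean(dim=0), checkpoints.std(dim=0) + 1e-6
sampled_params = mean + std * torch.randn_like(mean)
print(checkpoints.shape, sampled_params.shape)   # e.g. [16, 49] and [49]
```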
- Kernel Modulation: A Parameter-Efficient Method for Training Convolutional Neural Networks [19.56633207984127]
This work proposes a novel parameter-efficient kernel modulation (KM) method that adapts all parameters of a base network instead of a subset of layers.
KM uses lightweight task-specialized kernel modulators that require only an additional 1.4% of the base network parameters.
Our results show that KM delivers up to 9% higher accuracy than other parameter-efficient methods on the Transfer Learning benchmark.
arXiv Detail & Related papers (2022-03-29T07:28:50Z)
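A minimal sketch of the general kernel-modulation idea in the entry above: the base convolution kernels are frozen, and a small task-specific modulator rescales them. The per-channel-pair parameterization below is an illustrative choice, not necessarily the one used in the cited paper, and it does not reproduce the quoted 1.4% overhead.

```python
# Frozen base kernels rescaled by a lightweight, trainable modulator.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModulatedConv2d(nn.Module):
    def __init__(self, base_conv: nn.Conv2d):
        super().__init__()
        self.base = base_conv
        self.base.weight.requires_grad_(False)            # base kernels stay frozen
        out_c, in_c = base_conv.weight.shape[:2]
        # One scale per (out, in) channel pair: far fewer values than the
        # full k x k kernels, and only these are trained per task.
        self.modulator = nn.Parameter(torch.ones(out_c, in_c, 1, 1))

    def forward(self, x):
        weight = self.base.weight * self.modulator        # modulated kernels
        return F.conv2d(x, weight, self.base.bias,
                        stride=self.base.stride, padding=self.base.padding)


layer = ModulatedConv2d(nn.Conv2d(3, 16, kernel_size=3, padding=1))
print(layer(torch.randn(2, 3, 32, 32)).shape)             # torch.Size([2, 16, 32, 32])
```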
- Network insensitivity to parameter noise via adversarial regularization [0.0]
We present a new adversarial network optimisation algorithm that attacks network parameters during training.
We show that our approach produces models that are more robust to targeted parameter variation.
Our work provides an approach to deploy neural network architectures to inference devices that suffer from computational non-idealities.
arXiv Detail & Related papers (2021-06-09T12:11:55Z)
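A hedged sketch of training with adversarial perturbations applied to the parameters (rather than the inputs), in the spirit of the entry above. The FGSM-style sign attack and the `eps` value are generic illustrative choices, not the specific algorithm of the cited paper.

```python
# Train against worst-case parameter perturbations: attack the weights with a
# one-step gradient ascent, compute the training gradient at the perturbed
# weights, then restore the weights before the optimizer step.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
eps = 1e-3                                    # attack magnitude (assumed hyperparameter)
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))

for step in range(10):
    # 1) One-step attack on the weights: nudge each parameter toward higher loss.
    grads = torch.autograd.grad(loss_fn(model(x), y), list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(eps * g.sign())
    # 2) Backpropagate through the perturbed weights, then restore them.
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(eps * g.sign())
    opt.step()
```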
- Self-Reorganizing and Rejuvenating CNNs for Increasing Model Capacity Utilization [8.661269034961679]
We propose a biologically inspired method for improving the computational resource utilization of neural networks.
The proposed method utilizes the channel activations of a convolution layer in order to reorganize that layer's parameters.
The rejuvenated parameters learn different features to supplement those learned by the reorganized surviving parameters.
arXiv Detail & Related papers (2021-02-13T06:19:45Z)
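A hedged sketch of the reorganize-and-rejuvenate idea in the entry above: rank a convolution layer's output channels by mean activation, keep the kernels of the most-used channels, and re-initialize ("rejuvenate") the least-used ones so they can learn new features. The ranking criterion and the keep ratio are illustrative assumptions, not the paper's exact procedure.

```python
# Re-initialize the least-activated output channels of a conv layer so that
# their parameters can learn features complementary to the surviving ones.
import torch
import torch.nn as nn


@torch.no_grad()
def rejuvenate_conv(conv: nn.Conv2d, x: torch.Tensor, keep_ratio: float = 0.5):
    acts = conv(x).abs().mean(dim=(0, 2, 3))            # mean |activation| per channel
    n_weak = conv.out_channels - int(keep_ratio * conv.out_channels)
    weak = acts.argsort()[:n_weak]                       # least-used output channels
    fresh = torch.empty_like(conv.weight[weak])
    nn.init.kaiming_normal_(fresh)                       # new random kernels
    conv.weight[weak] = fresh                            # rejuvenate weak channels
    if conv.bias is not None:
        conv.bias[weak] = 0.0
    return weak


conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
print(rejuvenate_conv(conv, torch.randn(8, 3, 32, 32)))  # indices of rejuvenated channels
```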
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.