Neural Parameter Allocation Search
- URL: http://arxiv.org/abs/2006.10598v4
- Date: Wed, 16 Mar 2022 03:29:34 GMT
- Title: Neural Parameter Allocation Search
- Authors: Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate
Saenko
- Abstract summary: Training neural networks requires increasing amounts of memory.
Existing methods assume networks have many identical layers and utilize hand-crafted sharing strategies that fail to generalize.
We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.
NPAS covers both low-budget regimes, which produce compact networks, as well as a novel high-budget regime, where additional capacity can be added to boost performance without increasing inference FLOPs.
- Score: 57.190693718951316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training neural networks requires increasing amounts of memory. Parameter
sharing can reduce memory and communication costs, but existing methods assume
networks have many identical layers and utilize hand-crafted sharing strategies
that fail to generalize. We introduce Neural Parameter Allocation Search
(NPAS), a novel task where the goal is to train a neural network given an
arbitrary, fixed parameter budget. NPAS covers both low-budget regimes, which
produce compact networks, as well as a novel high-budget regime, where
additional capacity can be added to boost performance without increasing
inference FLOPs. To address NPAS, we introduce Shapeshifter Networks (SSNs),
which automatically learn where and how to share parameters in a network to
support any parameter budget without requiring any changes to the architecture
or loss function. NPAS and SSNs provide a complete framework for addressing
generalized parameter sharing, and can also be combined with prior work for
additional performance gains. We demonstrate the effectiveness of our approach
using nine network architectures across four diverse tasks, including ImageNet
classification and transformers.
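As a concrete (but heavily simplified) illustration of the NPAS setting, the sketch below builds a network whose layers all draw their weights from a single fixed-size parameter bank. This is not the authors' Shapeshifter Network implementation; the `SharedBankLinear` module, the wrap-around indexing, and the 4,096-parameter budget are illustrative assumptions only.

```python
# Minimal sketch of training under a fixed parameter budget: every layer
# draws its weights from one shared parameter bank, so the total parameter
# count is set up front regardless of layer shapes. This is NOT the paper's
# SSN method; the wrap-around indexing below is a simple stand-in for the
# learned sharing the paper describes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedBankLinear(nn.Module):
    """Linear layer whose weight matrix is gathered from a shared 1-D bank."""

    def __init__(self, bank: nn.Parameter, offset: int,
                 in_features: int, out_features: int):
        super().__init__()
        self.bank = bank                                  # shared across layers
        self.offset = offset
        self.in_features, self.out_features = in_features, out_features
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        n = self.in_features * self.out_features
        # Gather this layer's weights from the bank, wrapping around if the
        # budget is smaller than the layer.
        idx = (self.offset + torch.arange(n)) % self.bank.numel()
        weight = self.bank[idx].view(self.out_features, self.in_features)
        return F.linear(x, weight, self.bias)


budget = 4096                                             # fixed parameter budget
bank = nn.Parameter(0.01 * torch.randn(budget))
net = nn.Sequential(                                      # would need 35,328 weights,
    SharedBankLinear(bank, 0, 128, 256), nn.ReLU(),       # but reuses the same
    SharedBankLinear(bank, 1024, 256, 10),                # 4,096 shared values
)
print(net(torch.randn(8, 128)).shape)                     # torch.Size([8, 10])
```

In this toy version, a bank smaller than the layers it feeds corresponds to the low-budget (compact) regime, while a larger bank adds capacity without changing the layer shapes, and hence without changing inference FLOPs, matching the high-budget regime described in the abstract.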
Related papers
- Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks [55.86300309474023]
We conduct a comprehensive stability and generalization analysis of gradient descent (GD) for multi-layer NNs.
We derive the excess risk rate of $O(1/\sqrt{n})$ for GD algorithms in both two-layer and three-layer NNs.
arXiv Detail & Related papers (2023-05-26T12:51:38Z)
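For context on the entry above: the excess risk is the gap between the population risk of the GD-trained network and the best achievable risk. In generic notation (assumed here, not taken from the paper):

```latex
\underbrace{\mathbb{E}_{S}\big[R(A(S))\big] - \inf_{w} R(w)}_{\text{excess risk of the GD output } A(S)}
\;=\; O\!\left(\frac{1}{\sqrt{n}}\right),
\qquad
R(w) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(f_w(x), y)\big],
```

where $n$ is the training-set size and $\mathcal{D}$ the data distribution.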
- Exploring the Complexity of Deep Neural Networks through Functional Equivalence [1.3597551064547502]
We present a novel bound on the covering number for deep neural networks, which reveals that the complexity of neural networks can be reduced.
We demonstrate that functional equivalence benefits optimization, as over-parameterized networks tend to be easier to train since increasing network width leads to a diminishing volume of the effective parameter space.
arXiv Detail & Related papers (2023-05-19T04:01:27Z)
- Multi-agent Reinforcement Learning with Graph Q-Networks for Antenna Tuning [60.94661435297309]
The scale of mobile networks makes it challenging to optimize antenna parameters using manual intervention or hand-engineered strategies.
We propose a new multi-agent reinforcement learning algorithm to optimize mobile network configurations globally.
We empirically demonstrate the performance of the algorithm on an antenna tilt tuning problem and a joint tilt and power control problem in a simulated environment.
arXiv Detail & Related papers (2023-01-20T17:06:34Z)
- Learning k-Level Structured Sparse Neural Networks Using Group Envelope Regularization [4.0554893636822]
We introduce a novel approach to deploy large-scale Deep Neural Networks on constrained resources.
The method speeds up inference time and aims to reduce memory demand and power consumption.
arXiv Detail & Related papers (2022-12-25T15:40:05Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
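A hedged sketch of the data-construction step that the checkpoint-generation entry above describes: flatten the parameters of many trained runs into a "checkpoint dataset". The toy regression task, the 16 runs, and the plain Gaussian fit standing in for the generative model are illustrative assumptions, not the cited paper's pipeline.

```python
# Build a small "checkpoint dataset" of flattened parameter vectors and fit a
# trivial generative model over them (an independent Gaussian), purely to
# illustrate the data flow described in the summary above.
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector


def train_small_net(seed: int) -> torch.Tensor:
    """Train a tiny regression net and return its flattened parameters."""
    torch.manual_seed(seed)
    net = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    x = torch.randn(256, 4)
    y = x.sum(dim=1, keepdim=True)
    for _ in range(200):
        opt.zero_grad()
        nn.functional.mse_loss(net(x), y).backward()
        opt.step()
    return parameters_to_vector(net.parameters()).detach()


# One flattened parameter vector per training run.
checkpoints = torch.stack([train_small_net(seed) for seed in range(16)])

# Simplest possible stand-in for a generative model over parameters:
# an independent Gaussian fitted to the checkpoints, sampled for new weights.
mean, std = checkpoints.mean(dim=0), checkpoints.std(dim=0) + 1e-6
sampled_params = mean + std * torch.randn_like(mean)
print(checkpoints.shape, sampled_params.shape)   # e.g. [16, 49] and [49]
```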
- Kernel Modulation: A Parameter-Efficient Method for Training Convolutional Neural Networks [19.56633207984127]
This work proposes a novel parameter-efficient kernel modulation (KM) method that adapts all parameters of a base network instead of a subset of layers.
KM uses lightweight task-specialized kernel modulators that require only an additional 1.4% of the base network parameters.
Our results show that KM delivers up to 9% higher accuracy than other parameter-efficient methods on the Transfer Learning benchmark.
arXiv Detail & Related papers (2022-03-29T07:28:50Z)
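A minimal sketch of the general kernel-modulation idea in the entry above: the base convolution kernels are frozen, and a small task-specific modulator rescales them. The per-channel-pair parameterization below is an illustrative choice, not necessarily the one used in the cited paper, and it does not reproduce the quoted 1.4% overhead.

```python
# Frozen base kernels rescaled by a lightweight, trainable modulator.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModulatedConv2d(nn.Module):
    def __init__(self, base_conv: nn.Conv2d):
        super().__init__()
        self.base = base_conv
        self.base.weight.requires_grad_(False)            # base kernels stay frozen
        out_c, in_c = base_conv.weight.shape[:2]
        # One scale per (out, in) channel pair: far fewer values than the
        # full k x k kernels, and only these are trained per task.
        self.modulator = nn.Parameter(torch.ones(out_c, in_c, 1, 1))

    def forward(self, x):
        weight = self.base.weight * self.modulator        # modulated kernels
        return F.conv2d(x, weight, self.base.bias,
                        stride=self.base.stride, padding=self.base.padding)


layer = ModulatedConv2d(nn.Conv2d(3, 16, kernel_size=3, padding=1))
print(layer(torch.randn(2, 3, 32, 32)).shape)             # torch.Size([2, 16, 32, 32])
```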
- Network insensitivity to parameter noise via adversarial regularization [0.0]
We present a new adversarial network optimisation algorithm that attacks network parameters during training.
We show that our approach produces models that are more robust to targeted parameter variation.
Our work provides an approach to deploy neural network architectures to inference devices that suffer from computational non-idealities.
arXiv Detail & Related papers (2021-06-09T12:11:55Z)
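A hedged sketch of training with adversarial perturbations applied to the parameters (rather than the inputs), in the spirit of the entry above. The FGSM-style sign attack and the `eps` value are generic illustrative choices, not the specific algorithm of the cited paper.

```python
# Train against worst-case parameter perturbations: attack the weights with a
# one-step gradient ascent, compute the training gradient at the perturbed
# weights, then restore the weights before the optimizer step.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
eps = 1e-3                                    # attack magnitude (assumed hyperparameter)
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))

for step in range(10):
    # 1) One-step attack on the weights: nudge each parameter toward higher loss.
    grads = torch.autograd.grad(loss_fn(model(x), y), list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(eps * g.sign())
    # 2) Backpropagate through the perturbed weights, then restore them.
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(eps * g.sign())
    opt.step()
```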
- Self-Reorganizing and Rejuvenating CNNs for Increasing Model Capacity Utilization [8.661269034961679]
We propose a biologically inspired method for improving the computational resource utilization of neural networks.
The proposed method utilizes the channel activations of a convolution layer in order to reorganize that layer's parameters.
The rejuvenated parameters learn different features to supplement those learned by the reorganized surviving parameters.
arXiv Detail & Related papers (2021-02-13T06:19:45Z)
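A hedged sketch of the reorganize-and-rejuvenate idea in the entry above: rank a convolution layer's output channels by mean activation, keep the kernels of the most-used channels, and re-initialize ("rejuvenate") the least-used ones so they can learn new features. The ranking criterion and the keep ratio are illustrative assumptions, not the paper's exact procedure.

```python
# Re-initialize the least-activated output channels of a conv layer so that
# their parameters can learn features complementary to the surviving ones.
import torch
import torch.nn as nn


@torch.no_grad()
def rejuvenate_conv(conv: nn.Conv2d, x: torch.Tensor, keep_ratio: float = 0.5):
    acts = conv(x).abs().mean(dim=(0, 2, 3))            # mean |activation| per channel
    n_weak = conv.out_channels - int(keep_ratio * conv.out_channels)
    weak = acts.argsort()[:n_weak]                       # least-used output channels
    fresh = torch.empty_like(conv.weight[weak])
    nn.init.kaiming_normal_(fresh)                       # new random kernels
    conv.weight[weak] = fresh                            # rejuvenate weak channels
    if conv.bias is not None:
        conv.bias[weak] = 0.0
    return weak


conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
print(rejuvenate_conv(conv, torch.randn(8, 3, 32, 32)))  # indices of rejuvenated channels
```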
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.