Parameter Prediction for Unseen Deep Architectures
- URL: http://arxiv.org/abs/2110.13100v1
- Date: Mon, 25 Oct 2021 16:52:33 GMT
- Title: Parameter Prediction for Unseen Deep Architectures
- Authors: Boris Knyazev, Michal Drozdzal, Graham W. Taylor, Adriana Romero-Soriano
- Abstract summary: We study if we can use deep learning to directly predict parameters by exploiting the past knowledge of training other networks.
We propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU.
The proposed model achieves surprisingly good performance on unseen and diverse networks.
- Score: 23.79630072083828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has been successful in automating the design of features in
machine learning pipelines. However, the algorithms optimizing neural network
parameters remain largely hand-designed and computationally inefficient. We
study if we can use deep learning to directly predict these parameters by
exploiting the past knowledge of training other networks. We introduce a
large-scale dataset of diverse computational graphs of neural architectures -
DeepNets-1M - and use it to explore parameter prediction on CIFAR-10 and
ImageNet. By leveraging advances in graph neural networks, we propose a
hypernetwork that can predict performant parameters in a single forward pass
taking a fraction of a second, even on a CPU. The proposed model achieves
surprisingly good performance on unseen and diverse networks. For example, it
is able to predict all 24 million parameters of a ResNet-50 achieving a 60%
accuracy on CIFAR-10. On ImageNet, top-5 accuracy of some of our networks
approaches 50%. Our task along with the model and results can potentially lead
to a new, more computationally efficient paradigm of training networks. Our
model also learns a strong representation of neural architectures enabling
their analysis.
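The abstract describes the idea at a high level: encode an architecture as a computational graph, run a hypernetwork over it once, and copy the predicted tensors into the network's parameters. The sketch below illustrates that predict-then-load flow with a deliberately toy hypernetwork; the class and its shape-based conditioning are assumptions made so the example is self-contained, not the released GHN implementation (which conditions on the full graph with a graph neural network).

```python
# Minimal sketch (assumptions: the hypernetwork class and its shape-based
# conditioning are invented for illustration). The real GHN conditions on the
# architecture's computational graph; here a small MLP decodes each parameter
# tensor from a crude shape descriptor so the example stays runnable.
import torch
import torch.nn as nn

class ToyHyperNetwork(nn.Module):
    """Emits every parameter tensor of a target network in a single pass."""

    def __init__(self, max_numel: int = 4096, hidden: int = 64):
        super().__init__()
        self.max_numel = max_numel
        # Decode a flat parameter vector from a (ndim, fan_in, fan_out) descriptor;
        # the vector is cropped and reshaped to fit each target tensor.
        self.decoder = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, max_numel)
        )

    @torch.no_grad()
    def predict_parameters(self, target: nn.Module) -> None:
        for p in target.parameters():
            assert p.numel() <= self.max_numel, "toy decoder is capped in size"
            desc = torch.tensor([float(p.ndim), float(p.shape[-1]), float(p.shape[0])])
            flat = self.decoder(desc)[: p.numel()]
            p.copy_(0.01 * flat.reshape(p.shape))  # small scale, like an init

# Usage: predict parameters for an "unseen" target network, then use it directly.
target = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
ToyHyperNetwork().predict_parameters(target)   # no gradient steps on `target`
print(target(torch.randn(8, 32)).shape)        # torch.Size([8, 10])
```

With the graph-conditioned model from the paper, this predict-then-load step is what yields, for example, a ResNet-50 whose 24 million predicted parameters reach roughly 60% CIFAR-10 accuracy without any gradient updates on the target network.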
Related papers
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change when the networks are trained with better, architecture-aware hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
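As a concrete, if simplified, illustration of the dependence characterized in "Principled Architecture-aware Scaling of Hyperparameters": the snippet below sets each layer's initialization scale and learning rate as a function of its width. The 1/sqrt(fan_in) and 1/fan_in choices are common generic heuristics used only for illustration, not the scaling rule derived in that paper.

```python
# Illustration only: a generic width-dependent scaling heuristic, NOT the rule
# derived in the paper. It shows the kind of dependence being characterized:
# per-layer initialization scale and learning rate set from the architecture.
import math
import torch.nn as nn
import torch.optim as optim

def architecture_scaled_optimizer(model: nn.Module, base_lr: float = 0.1):
    param_groups = []
    for module in model.modules():
        if isinstance(module, nn.Linear):
            fan_in = module.in_features
            # Common heuristics: init std ~ 1/sqrt(fan_in), per-layer lr ~ 1/fan_in.
            nn.init.normal_(module.weight, std=1.0 / math.sqrt(fan_in))
            nn.init.zeros_(module.bias)
            param_groups.append({"params": module.parameters(), "lr": base_lr / fan_in})
    return optim.SGD(param_groups, lr=base_lr, momentum=0.9)

model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10))
optimizer = architecture_scaled_optimizer(model)
print([round(g["lr"], 5) for g in optimizer.param_groups])  # fan-ins 256 and 512 get different lrs
```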
- Receptive Field Refinement for Convolutional Neural Networks Reliably Improves Predictive Performance [1.52292571922932]
We present a new approach to receptive field analysis that can yield both theoretical and empirical performance gains.
Our approach is able to improve ImageNet1K performance across a wide range of well-known, state-of-the-art (SOTA) model classes.
arXiv Detail & Related papers (2022-11-26T05:27:44Z)
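The refinement method of the receptive-field paper above is not reproduced here, but the quantity it manipulates is easy to compute: the sketch below applies the standard formula for the theoretical receptive field of a stack of convolution/pooling layers given their kernel sizes and strides.

```python
# Standard receptive-field computation for a feed-forward conv stack
# (not the paper's refinement procedure, just the quantity it analyzes).
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, in forward order."""
    rf, jump = 1, 1  # receptive field size and cumulative stride ("jump")
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Example: three 3x3 convs (strides 1, 2, 1) followed by a 2x2 pool with stride 2.
print(receptive_field([(3, 1), (3, 2), (3, 1), (2, 2)]))  # -> 11
```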
- NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
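A minimal sketch of the data-collection step this describes, under assumed toy settings: many short training runs are logged as (flattened parameter vector, achieved loss) pairs, which is the kind of checkpoint dataset a generative model of parameters would then be trained on (the generative model itself is omitted).

```python
# Sketch of checkpoint collection only (assumed toy task, not the paper's pipeline):
# record (flattened parameters, achieved loss) pairs from several short runs.
import torch
import torch.nn as nn

def flatten_params(model):
    return torch.cat([p.detach().flatten() for p in model.parameters()])

checkpoints = []  # list of (parameter_vector, loss) records
x, y = torch.randn(128, 16), torch.randn(128, 1)
for run in range(4):                      # a few independent runs
    net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for step in range(50):
        loss = nn.functional.mse_loss(net(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        if step % 10 == 0:                # keep intermediate checkpoints, not just the final one
            checkpoints.append((flatten_params(net), loss.item()))

print(len(checkpoints), checkpoints[0][0].shape)  # 20 checkpoints of 577-dim parameter vectors
```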
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Pretraining a Neural Network before Knowing Its Architecture [2.170169149901781]
Instead of training a large neural network from scratch, one can train a smaller hypernetwork that predicts parameters for the large one.
A recently released Graph HyperNetwork (GHN) trained this way on one million smaller ImageNet architectures is able to predict parameters for large unseen networks such as ResNet-50.
While networks with predicted parameters lose performance on the source task, the predicted parameters have been found useful for fine-tuning on other tasks.
arXiv Detail & Related papers (2022-07-20T17:27:50Z)
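The fine-tuning use case described in the entry above can be sketched as follows; `load_predicted_parameters` is a hypothetical stand-in for copying hypernetwork-predicted weights into the model, so the example runs without the released GHN code.

```python
# Fine-tuning a network initialized with (stand-in) predicted parameters.
import torch
import torch.nn as nn

def load_predicted_parameters(model: nn.Module) -> None:
    # Placeholder for parameters predicted by a graph hypernetwork (assumption).
    for p in model.parameters():
        nn.init.normal_(p, std=0.02)

backbone = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 128), nn.ReLU())
head = nn.Linear(128, 5)                  # fresh head for the target task
load_predicted_parameters(backbone)       # "pretrained" init instead of random init

# Fine-tune: small learning rate for the predicted backbone, larger for the new head.
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-3},
    {"params": head.parameters(), "lr": 1e-2},
], lr=1e-3, momentum=0.9)

x, y = torch.randn(32, 64), torch.randint(0, 5, (32,))
for _ in range(10):
    loss = nn.functional.cross_entropy(head(backbone(x)), y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
print(round(loss.item(), 3))
```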
- DNNAbacus: Toward Accurate Computational Cost Prediction for Deep Neural Networks [0.9896984829010892]
This paper investigates the computational resource demands of 29 classical deep neural networks and builds accurate models for predicting computational costs.
We propose a lightweight prediction approach DNNAbacus with a novel network structural matrix for network representation.
Our experimental results show that the mean relative error (MRE) is 0.9% with respect to time and 2.8% with respect to memory for 29 classic models, which is much lower than that of state-of-the-art approaches.
arXiv Detail & Related papers (2022-05-24T14:21:27Z)
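The headline numbers in the DNNAbacus entry above are mean relative errors (MRE). For reference, the metric itself is just the average of |predicted - actual| / actual, as in this short sketch with made-up latency values (the predictor and its structural-matrix encoding are not reproduced).

```python
# Mean relative error (MRE), the metric quoted above, on illustrative numbers.
def mean_relative_error(predicted, actual):
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)

# Example: predicted vs. measured inference times (ms) for a handful of models.
pred_ms = [12.1, 33.8, 7.9, 54.2]
true_ms = [12.0, 34.5, 8.0, 53.0]
print(f"MRE = {100 * mean_relative_error(pred_ms, true_ms):.2f}%")  # ~1.6%
```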
- Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling and Design [68.1682448368636]
We present a supervised pretraining approach to learn circuit representations that can be adapted to new unseen topologies or unseen prediction tasks.
To cope with the variable topological structure of different circuits, we describe each circuit as a graph and use graph neural networks (GNNs) to learn node embeddings.
We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit level properties.
arXiv Detail & Related papers (2022-03-29T21:18:47Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 forms a family of state-of-the-art compact neural networks that outperform both automatically and manually designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
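A toy version of the predictor-guided evolutionary search described in the FBNetV3 entry above: the scoring function below is a made-up stand-in for a trained accuracy predictor, and each candidate is an (architecture, recipe) tuple, here (depth, width, learning rate), evaluated without any training.

```python
# Toy predictor-guided evolutionary search under a resource constraint.
import random
random.seed(0)

def predicted_score(depth, width, lr):
    # Stand-in "accuracy predictor": prefers moderate depth/width and lr near 0.1.
    return -abs(depth - 20) - abs(width - 256) / 32 - 10 * abs(lr - 0.1)

def flops_proxy(depth, width):
    return depth * width * width          # crude resource-constraint proxy

def mutate(cand):
    depth, width, lr = cand
    return (max(4, depth + random.choice([-2, 0, 2])),
            max(32, width + random.choice([-32, 0, 32])),
            max(1e-3, lr * random.choice([0.5, 1.0, 2.0])))

population = [(random.randint(8, 32), random.choice([128, 256, 512]),
               random.choice([0.05, 0.1, 0.2])) for _ in range(16)]
budget = 25 * 512 * 512                   # maximum allowed flops proxy

for _ in range(50):                       # cheap: only the predictor is evaluated
    parent = max(population, key=lambda c: predicted_score(*c))
    child = mutate(parent)
    if flops_proxy(child[0], child[1]) <= budget:
        population.append(child)
    population = sorted(population, key=lambda c: predicted_score(*c))[-16:]

print(max(population, key=lambda c: predicted_score(*c)))
```

The point of such a predictor is that the loop above never trains a candidate network; only a final short list would be trained for real.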
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neurobiologically plausible alternative to backpropagation that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
- Inferring Convolutional Neural Networks' accuracies from their architectural characterizations [0.0]
We study the relationships between a CNN's architecture and its performance.
We show that the attributes can be predictive of the networks' performance in two specific computer vision-based physics problems.
We use machine learning models to predict whether a network can perform better than a certain threshold accuracy before training.
arXiv Detail & Related papers (2020-01-07T16:41:58Z)
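The setup described in the entry above, predicting before training whether an architecture will clear an accuracy threshold, can be sketched with synthetic data and generic features (the paper's actual attributes and physics tasks are not reproduced here):

```python
# Sketch with synthetic data: featurize architectures and classify whether their
# trained accuracy would clear a threshold. Features and labels are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200
depth = rng.integers(2, 30, n)
width = rng.integers(16, 512, n)
n_params = depth * width * width
# Synthetic ground truth: deeper/wider nets clear the threshold more often.
above_threshold = (0.02 * depth + 0.002 * width + rng.normal(0, 0.5, n)) > 1.0

X = np.column_stack([depth, width, np.log(n_params)])
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:150], above_threshold[:150])
print("held-out accuracy:", clf.score(X[150:], above_threshold[150:]))
```

In the paper, the analogous classifier is trained on real architectural attributes and measured accuracies rather than synthetic labels.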
This list is automatically generated from the titles and abstracts of the papers on this site.