Principled Weight Initialization for Hypernetworks
- URL: http://arxiv.org/abs/2312.08399v1
- Date: Wed, 13 Dec 2023 04:49:18 GMT
- Title: Principled Weight Initialization for Hypernetworks
- Authors: Oscar Chang, Lampros Flokas, Hod Lipson
- Abstract summary: Hypernetworks are meta neural networks that generate weights for a main neural network in an end-to-end differentiable manner.
We show that classical weight initialization methods fail to produce weights for the mainnet in the correct scale.
We develop principled techniques for weight initialization in hypernets, and show that they lead to more stable mainnet weights, lower training loss, and faster convergence.
- Score: 15.728811174027896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hypernetworks are meta neural networks that generate weights for a main
neural network in an end-to-end differentiable manner. Despite extensive
applications ranging from multi-task learning to Bayesian deep learning, the
problem of optimizing hypernetworks has not been studied to date. We observe
that classical weight initialization methods like Glorot & Bengio (2010) and He
et al. (2015), when applied directly on a hypernet, fail to produce weights for
the mainnet in the correct scale. We develop principled techniques for weight
initialization in hypernets, and show that they lead to more stable mainnet
weights, lower training loss, and faster convergence.
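Below is a minimal, hedged sketch (in PyTorch) of the core idea: when a hypernet's output layer generates a mainnet weight matrix, its initialization should account for the mainnet layer's fan-in as well as the hypernet's own fan-in, rather than applying Glorot/He scaling to the hypernet alone. The class name, architecture, and the exact variance formula below are illustrative assumptions; the paper derives the precise hyperfan-in/hyperfan-out formulas.

```python
import math
import torch
import torch.nn as nn

class LinearHypernetHead(nn.Module):
    """Generates a (fan_out x fan_in) mainnet weight matrix from an embedding.
    Illustrative sketch, not the paper's exact recipe."""

    def __init__(self, embed_dim: int, mainnet_fan_in: int, mainnet_fan_out: int):
        super().__init__()
        self.fan_in, self.fan_out = mainnet_fan_in, mainnet_fan_out
        self.head = nn.Linear(embed_dim, mainnet_fan_in * mainnet_fan_out, bias=False)

        # Classical He/Glorot init applied here would only divide by embed_dim,
        # leaving the generated mainnet weights mis-scaled. Instead, divide by
        # the mainnet fan-in as well so the generated weights land in He scale.
        var = 2.0 / (embed_dim * mainnet_fan_in)   # assumed form; see the paper for hyperfan-in
        bound = math.sqrt(3.0 * var)               # Uniform(-b, b) has variance b^2 / 3
        nn.init.uniform_(self.head.weight, -bound, bound)

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        flat = self.head(embedding)                # (..., fan_in * fan_out)
        return flat.view(*flat.shape[:-1], self.fan_out, self.fan_in)
```

The generated matrix can then parameterize a mainnet layer, so that the mainnet's activations stay in the scale a conventionally initialized network would have.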
Related papers
- HyperLoRA for PDEs [7.898728380447954]
Physics-informed neural networks (PINNs) have been widely used to develop neural surrogates for solutions of partial differential equations (PDEs).
A drawback of PINNs is that they must be retrained for every change in the initial-boundary conditions and PDE coefficients.
The hypernetwork, a model-based meta-learning technique, takes a parameterized task embedding as input and predicts the weights of the PINN as output, as sketched after this entry.
arXiv Detail & Related papers (2023-08-18T04:29:48Z)
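A minimal sketch of the hypernetwork-for-PINNs setup summarized in the entry above, assuming a small MLP hypernet, a single (unbatched) task embedding, and an MLP PINN; the layer sizes, hidden width, and class name are illustrative, not the cited paper's architecture.

```python
import torch
import torch.nn as nn

class PINNHypernet(nn.Module):
    """Maps a parameterized task embedding (e.g. PDE coefficients and
    initial/boundary parameters) to the full set of weights of an MLP PINN.
    Sizes and names are illustrative assumptions."""

    def __init__(self, task_dim: int, pinn_layer_sizes: list[int], hidden: int = 128):
        super().__init__()
        self.shapes = [(o, i) for i, o in zip(pinn_layer_sizes[:-1], pinn_layer_sizes[1:])]
        n_params = sum(o * i + o for o, i in self.shapes)   # weights + biases
        self.net = nn.Sequential(nn.Linear(task_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_params))

    def forward(self, task_embedding: torch.Tensor):
        # Expects a single (unbatched) task embedding of shape (task_dim,).
        flat, params, idx = self.net(task_embedding), [], 0
        for o, i in self.shapes:
            w = flat[idx:idx + o * i].view(o, i); idx += o * i
            b = flat[idx:idx + o];                idx += o
            params.append((w, b))
        return params   # plug into a functional PINN forward pass over collocation points
```

Under this setup, a change in initial-boundary conditions or PDE coefficients only changes the task embedding, which is what removes the need to retrain the PINN from scratch.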
- Weight Compander: A Simple Weight Reparameterization for Regularization [5.744133015573047]
We introduce weight compander, a novel and effective method for improving the generalization of deep neural networks.
We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
arXiv Detail & Related papers (2023-06-29T14:52:04Z)
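The entry above does not specify the compander function itself, so the following is only a generic weight-reparameterization sketch: a trainable tensor is passed through a fixed elementwise map before being used as the effective weight. The tanh-based map here is a placeholder assumption, not the paper's compander.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReparamLinear(nn.Module):
    """Linear layer whose effective weight is g(v) for a trainable tensor v.
    g below (a scaled tanh) is a placeholder, NOT the paper's compander."""

    def __init__(self, in_features: int, out_features: int, scale: float = 1.0):
        super().__init__()
        self.v = nn.Parameter(0.05 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.scale * torch.tanh(self.v)   # bounded effective weights
        return F.linear(x, weight, self.bias)
```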
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but it struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
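As a rough illustration of the weight-sharing idea in the slimmable-network entry above (not the cited paper's exact architecture), each narrower sub-network simply reuses the leading slice of one shared parameter tensor:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableLinear(nn.Module):
    """One shared weight matrix; sub-networks of different widths use its
    leading rows/columns. Illustrative sketch only."""

    def __init__(self, max_in: int, max_out: int):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(max_out, max_in))
        self.bias = nn.Parameter(torch.zeros(max_out))

    def forward(self, x: torch.Tensor, width_mult: float = 1.0) -> torch.Tensor:
        out = max(1, int(self.weight.shape[0] * width_mult))
        in_ = x.shape[-1]                      # already slimmed by the previous layer
        return F.linear(x, self.weight[:out, :in_], self.bias[:out])
```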
- Improvements to Gradient Descent Methods for Quantum Tensor Network Machine Learning [0.0]
We introduce a 'copy node' method that successfully initializes arbitrary tensor networks.
We present numerical results showing that the combination of these techniques produces quantum-inspired tensor network models.
arXiv Detail & Related papers (2022-03-03T19:00:40Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
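The entry above does not give the exact diversity measure, so here is one simple proxy one might track during training: the mean pairwise cosine distance between the incoming weight vectors of the hidden units.

```python
import torch
import torch.nn.functional as F

def neuron_diversity(weight: torch.Tensor) -> torch.Tensor:
    """Mean pairwise cosine distance between hidden units' incoming weight
    vectors (rows of `weight`, shape (hidden_units, inputs)). An illustrative
    proxy, not necessarily the cited paper's measure."""
    w = F.normalize(weight, dim=1)
    cos = w @ w.t()                                   # pairwise cosine similarities
    mask = ~torch.eye(cos.shape[0], dtype=torch.bool, device=cos.device)
    return (1.0 - cos[mask]).mean()                   # higher = more diverse neurons
```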
- Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z)
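The ineffectiveness claim in the entry above has a quick numerical check: for a ReLU (positively homogeneous) network, multiplying one layer by c and dividing the next by c leaves the function unchanged but changes the weight-decay penalty, so weight decay alone cannot pin down the intrinsic norm. A small PyTorch demonstration:

```python
import torch

torch.manual_seed(0)
W1, W2 = torch.randn(16, 8), torch.randn(4, 16)
x = torch.randn(32, 8)

def f(x, W1, W2):                       # two-layer ReLU network, no biases
    return torch.relu(x @ W1.t()) @ W2.t()

c = 10.0
same_function = torch.allclose(f(x, W1, W2), f(x, c * W1, W2 / c), atol=1e-4)
decay_before = (W1.pow(2).sum() + W2.pow(2).sum()).item()
decay_after = ((c * W1).pow(2).sum() + (W2 / c).pow(2).sum()).item()

print(same_function)                    # True: the rescaling is function-preserving
print(decay_before, decay_after)        # but the weight-decay penalty changes a lot
```

A scale-shift-invariant regularizer, by contrast, assigns the same penalty to both parameterizations.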
- Neural networks with late-phase weights [66.72777753269658]
We show that the solutions found by SGD can be further improved by ensembling a subset of the weights in late stages of learning.
At the end of learning, we obtain back a single model by taking a spatial average in weight space.
arXiv Detail & Related papers (2020-07-25T13:23:37Z)
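A hedged sketch of the final averaging step described in the entry above: after training several late-phase copies of some subset of the weights (here, hypothetically, a final linear head named `fc`), a single model is recovered by averaging those copies in weight space. Training details are omitted.

```python
import copy
import torch
import torch.nn as nn

def average_late_phase_heads(model: nn.Module, heads: list[nn.Linear]) -> nn.Module:
    """Collapse late-phase copies of the head into one model by weight-space
    averaging. Assumes the model's head is an attribute named `fc` (illustrative)."""
    merged = copy.deepcopy(model)
    with torch.no_grad():
        merged.fc.weight.copy_(torch.stack([h.weight for h in heads]).mean(dim=0))
        merged.fc.bias.copy_(torch.stack([h.bias for h in heads]).mean(dim=0))
    return merged
```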
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
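One standard way to track the norm of the Hessian as a diagnostic, as described in the entry above, is power iteration with Hessian-vector products; the sketch below estimates the spectral norm and is not necessarily the cited paper's exact estimator.

```python
import torch

def hessian_spectral_norm(loss: torch.Tensor, params: list[torch.Tensor],
                          iters: int = 20) -> float:
    """Estimate the largest Hessian eigenvalue magnitude of `loss` w.r.t.
    `params` by power iteration with Hessian-vector products."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        norm = torch.sqrt(sum((x * x).sum() for x in v))
        v = [x / norm for x in v]
        # Hessian-vector product: differentiate (grads . v) w.r.t. the parameters.
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        v = [h.detach() for h in hv]
    return torch.sqrt(sum((x * x).sum() for x in v)).item()
```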
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
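A common way to enforce the kind of constraint described in the entry above is projected fine-tuning: after each optimizer step, project the parameters back onto an L2 ball of a chosen radius centred on the pre-trained weights. The sketch below uses a single global ball and an arbitrary radius; both are illustrative assumptions rather than the cited algorithm.

```python
import torch
import torch.nn as nn

def project_to_ball(model: nn.Module, pretrained: dict, radius: float) -> None:
    """Project all parameters onto the L2 ball of `radius` centred on the
    pre-trained weights (illustrative projected-fine-tuning sketch)."""
    with torch.no_grad():
        deltas = {n: p - pretrained[n] for n, p in model.named_parameters()}
        dist = torch.sqrt(sum(d.pow(2).sum() for d in deltas.values()))
        if dist > radius:
            scale = radius / dist
            for n, p in model.named_parameters():
                p.copy_(pretrained[n] + scale * deltas[n])

# Hypothetical usage: snapshot pretrained = {n: p.detach().clone() for n, p in
# model.named_parameters()} before fine-tuning, then call project_to_ball(...)
# after every optimizer.step().
```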
This list is automatically generated from the titles and abstracts of the papers in this site.