Magnitude Invariant Parametrizations Improve Hypernetwork Learning
- URL: http://arxiv.org/abs/2304.07645v2
- Date: Thu, 29 Jun 2023 16:38:42 GMT
- Title: Magnitude Invariant Parametrizations Improve Hypernetwork Learning
- Authors: Jose Javier Gonzalez Ortiz, John Guttag, Adrian Dalca
- Abstract summary: Hypernetworks are powerful neural networks that predict the parameters of another neural network.
Training typically converges far more slowly than for non-hypernetwork models.
We identify a fundamental and previously unidentified problem that contributes to the challenge of training hypernetworks.
We present a simple solution to this problem using a revised hypernetwork formulation that we call Magnitude Invariant Parametrizations (MIP).
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hypernetworks, neural networks that predict the parameters of another neural
network, are powerful models that have been successfully used in diverse
applications from image generation to multi-task learning. Unfortunately,
existing hypernetworks are often challenging to train. Training typically
converges far more slowly than for non-hypernetwork models, and the rate of
convergence can be very sensitive to hyperparameter choices. In this work, we
identify a fundamental and previously unidentified problem that contributes to
the challenge of training hypernetworks: a magnitude proportionality between
the inputs and outputs of the hypernetwork. We demonstrate both analytically
and empirically that this can lead to unstable optimization, thereby slowing
down convergence, and sometimes even preventing any learning. We present a
simple solution to this problem using a revised hypernetwork formulation that
we call Magnitude Invariant Parametrizations (MIP). We demonstrate the proposed
solution on several hypernetwork tasks, where it consistently stabilizes
training and achieves faster convergence. Furthermore, we perform a
comprehensive ablation study including choices of activation function,
normalization strategies, input dimensionality, and hypernetwork architecture;
and find that MIP improves training in all scenarios. We provide easy-to-use
code that can turn existing networks into MIP-based hypernetworks.
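The magnitude proportionality described in the abstract can be reproduced with a toy example. Below is a minimal PyTorch sketch, assuming a scalar hypernetwork input and a single linear hypernetwork layer; the `NaiveHypernet` class and the constant-norm `cos`/`sin` encoding are illustrative assumptions, not the exact MIP formulation from the paper.

```python
import math
import torch
import torch.nn as nn

class NaiveHypernet(nn.Module):
    """Toy hypernetwork: predicts the weight matrix of a target layer
    from a scalar input h (bias disabled to expose the proportionality)."""
    def __init__(self, target_in=8, target_out=8):
        super().__init__()
        self.fc = nn.Linear(1, target_in * target_out, bias=False)
        self.shape = (target_out, target_in)

    def forward(self, h):
        # Predicted weights are a linear function of h, so their norm
        # (and the gradients flowing through the target network) scales with |h|.
        return self.fc(h.view(1, 1)).view(self.shape)

def constant_norm_encode(h, lo=0.0, hi=1.0):
    """One illustrative way to decouple the encoding from the input magnitude:
    map h in [lo, hi] onto a unit-norm (cos, sin) pair. Not necessarily the
    parametrization used by MIP."""
    t = (h - lo) / (hi - lo) * (math.pi / 2)
    return torch.stack([torch.cos(t), torch.sin(t)])

hyper = NaiveHypernet()
for val in (0.01, 1.0, 100.0):
    w = hyper(torch.tensor(val))
    print(f"h={val:>7}: |W| = {w.norm().item():.3f}")   # norm grows linearly with |h|
print(constant_norm_encode(torch.tensor(0.3)).norm())    # encoded input norm is always 1.0
```

Running the loop shows the predicted weight norm tracking the input magnitude across several orders of magnitude, which is the instability the paper attributes slow hypernetwork convergence to.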
Related papers
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings on benchmarks can easily change simply by training the networks better.
arXiv Detail & Related papers (2024-02-27T11:52:49Z) - Split-Boost Neural Networks [1.1549572298362787]
We propose an innovative training strategy for feed-forward architectures - called split-boost.
Such a novel approach ultimately allows us to avoid explicitly modeling the regularization term.
The proposed strategy is tested on a real-world (anonymized) dataset within a benchmark medical insurance design problem.
arXiv Detail & Related papers (2023-09-06T17:08:57Z) - The Underlying Correlated Dynamics in Neural Training [6.385006149689549]
Training of neural networks is a computationally intensive task.
We propose a model based on the correlation of the parameters' dynamics, which dramatically reduces the dimensionality.
This representation enhances the understanding of the underlying training dynamics and can pave the way for designing better acceleration techniques.
arXiv Detail & Related papers (2022-12-18T08:34:11Z) - Learning Fast and Slow for Online Time Series Forecasting [76.50127663309604]
Fast and Slow learning Networks (FSNet) is a holistic framework for online time-series forecasting.
FSNet balances fast adaptation to recent changes with retrieval of similar old knowledge.
Our code will be made publicly available.
arXiv Detail & Related papers (2022-02-23T18:23:07Z) - Hypernetwork Dismantling via Deep Reinforcement Learning [1.4877837830677472]
We formulate the hypernetwork dismantling problem as a node sequence decision problem.
We propose a deep reinforcement learning-based hypernetwork dismantling framework.
Experimental results on five real-world hypernetworks demonstrate the effectiveness of our proposed framework.
arXiv Detail & Related papers (2021-04-29T13:35:29Z) - Revisiting the double-well problem by deep learning with a hybrid network [7.308730248177914]
We propose a novel hybrid network which integrates two different kinds of neural networks: LSTM and ResNet.
Such a hybrid network can be applied for solving cooperative dynamics in a system with fast spatial or temporal modulations.
arXiv Detail & Related papers (2021-04-25T07:51:43Z) - All at Once Network Quantization via Collaborative Knowledge Transfer [56.95849086170461]
We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network.
Specifically, we propose an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision student.
To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
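For concreteness, here is a hedged sketch of the block-swapping idea described above, assuming the student and teacher are decomposed into interchangeable modules with matching shapes; the function name, `swap_prob`, and the toy linear blocks are illustrative assumptions, not the authors' implementation.

```python
import random
import torch
import torch.nn as nn

def forward_with_block_swapping(x, student_blocks, teacher_blocks, swap_prob=0.5):
    """Forward pass through the student network, randomly substituting each
    student block with the corresponding teacher block."""
    for s_block, t_block in zip(student_blocks, teacher_blocks):
        block = t_block if random.random() < swap_prob else s_block
        x = block(x)
    return x

# Toy blocks with matching input/output shapes (illustrative only).
student = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
teacher = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
out = forward_with_block_swapping(torch.randn(2, 16), student, teacher)
```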
arXiv Detail & Related papers (2021-03-02T03:09:03Z) - Supervised training of spiking neural networks for robust deployment on mixed-signal neuromorphic processors [2.6949002029513167]
Mixed-signal analog/digital electronic circuits can emulate spiking neurons and synapses with extremely high energy efficiency.
Device mismatch manifests as differences in effective parameters between identically-configured neurons and synapses.
We present a supervised learning approach that addresses this challenge by maximizing robustness to mismatch and other common sources of noise.
arXiv Detail & Related papers (2021-02-12T09:20:49Z) - Phase Retrieval using Expectation Consistent Signal Recovery Algorithm based on Hypernetwork [73.94896986868146]
Phase retrieval is an important component in modern computational imaging systems.
Recent advances in deep learning have opened up new possibilities for robust and fast phase retrieval (PR).
We develop a novel framework for deep unfolding to overcome the existing limitations.
arXiv Detail & Related papers (2021-01-12T08:36:23Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for training deep neural networks at large scale.
In theory, our algorithm requires far fewer communication rounds than existing approaches.
Our experiments on several datasets demonstrate its effectiveness and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Molecule Property Prediction and Classification with Graph Hypernetworks [113.38181979662288]
We show that the replacement of the underlying networks with hypernetworks leads to a boost in performance.
A major difficulty in the application of hypernetworks is their lack of stability.
A recent work has tackled the training instability of hypernetworks in the context of error correcting codes.
arXiv Detail & Related papers (2020-02-01T16:44:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.