Delta-STN: Efficient Bilevel Optimization for Neural Networks using
Structured Response Jacobians
- URL: http://arxiv.org/abs/2010.13514v1
- Date: Mon, 26 Oct 2020 12:12:23 GMT
- Title: Delta-STN: Efficient Bilevel Optimization for Neural Networks using
Structured Response Jacobians
- Authors: Juhan Bae, Roger Grosse
- Abstract summary: Self-Tuning Networks (STNs) have recently gained traction due to their ability to amortize the optimization of the inner objective.
We propose the $\Delta$-STN, an improved hypernetwork architecture which stabilizes training.
- Score: 5.33024001730262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hyperparameter optimization of neural networks can be elegantly formulated as
a bilevel optimization problem. While research on bilevel optimization of
neural networks has been dominated by implicit differentiation and unrolling,
hypernetworks such as Self-Tuning Networks (STNs) have recently gained traction
due to their ability to amortize the optimization of the inner objective. In
this paper, we diagnose several subtle pathologies in the training of STNs.
Based on these observations, we propose the $\Delta$-STN, an improved
hypernetwork architecture which stabilizes training and optimizes
hyperparameters much more efficiently than STNs. The key idea is to focus on
accurately approximating the best-response Jacobian rather than the full
best-response function; we achieve this by reparameterizing the hypernetwork
and linearizing the network around the current parameters. We demonstrate
empirically that our $\Delta$-STN can tune regularization hyperparameters (e.g.
weight decay, dropout, number of cutout holes) with higher accuracy, faster
convergence, and improved stability compared to existing approaches.
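To make the key idea concrete, here is a minimal, hypothetical sketch of a linearized best-response approach on a toy bilevel problem. It is not the paper's implementation: the quadratic inner objective, the finite-difference estimate of the response Jacobian (the actual $\Delta$-STN learns this local response with a reparameterized hypernetwork), and all step sizes are assumptions chosen for illustration.

    # Toy bilevel problem: inner = ridge regression with log weight decay "lam",
    # outer = validation loss. The hyperparameter is updated through a locally
    # linear model of the best response, w_hat(lam + dl) ~ w + J * dl.
    import numpy as np

    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(50, 10)), rng.normal(size=50)   # training data
    C, d = rng.normal(size=(30, 10)), rng.normal(size=30)   # validation data

    def train_grad(w, lam):
        # Gradient of 0.5*||A w - b||^2 + 0.5*exp(lam)*||w||^2 with respect to w.
        return A.T @ (A @ w - b) + np.exp(lam) * w

    def val_grad(w):
        # Gradient of 0.5*||C w - d||^2 with respect to w.
        return C.T @ (C @ w - d)

    def inner_solve(lam, w_init, steps=300, lr=5e-3):
        # Approximate best response w*(lam) by gradient descent on the inner loss.
        w = w_init.copy()
        for _ in range(steps):
            w = w - lr * train_grad(w, lam)
        return w

    w, lam = np.zeros(10), 0.0
    for _ in range(50):
        w = inner_solve(lam, w)                      # inner (parameter) update
        # Response Jacobian J = dw*/dlam, estimated here by a finite difference;
        # the Delta-STN instead trains a hypernetwork to predict this local response.
        eps = 1e-3
        J = (inner_solve(lam + eps, w) - w) / eps
        hypergrad = val_grad(w) @ J                  # chain rule through w_hat(lam)
        lam = lam - 0.05 * hypergrad                 # outer (hyperparameter) update
    print("tuned log weight decay:", lam)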
Related papers
- Bayesian Optimization for Hyperparameters Tuning in Neural Networks [0.0]
Bayesian Optimization is a derivative-free global optimization method suitable for black-box functions with continuous inputs and limited evaluation budgets.
This study investigates the application of BO to hyperparameter tuning of neural networks, specifically targeting the enhancement of Convolutional Neural Networks (CNNs).
Experimental outcomes reveal that BO effectively balances exploration and exploitation, converging rapidly towards optimal settings for CNN architectures.
This approach underlines the potential of BO in automating neural network tuning, contributing to improved accuracy and computational efficiency in machine learning pipelines.
arXiv Detail & Related papers (2024-10-29T09:23:24Z)
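As a concrete illustration of the Bayesian optimization loop described in the entry above, here is a minimal sketch using scikit-optimize's gp_minimize. The search space, hyperparameter names, and the synthetic objective (a cheap stand-in for "train a CNN and return its validation error") are illustrative assumptions, not details from the paper.

    # Hedged sketch: Gaussian-process-based Bayesian optimization over two
    # hypothetical CNN hyperparameters. Replace the placeholder objective with
    # real training + validation to tune an actual model.
    import numpy as np
    from skopt import gp_minimize
    from skopt.space import Integer, Real

    space = [
        Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate"),
        Integer(16, 256, name="num_filters"),
    ]

    def objective(params):
        lr, num_filters = params
        # Placeholder validation error, minimized near lr = 1e-2 and 128 filters.
        return (np.log10(lr) + 2.0) ** 2 + ((num_filters - 128) / 128.0) ** 2

    # The GP surrogate trades off exploration (uncertain regions) against
    # exploitation (regions predicted to score well) when proposing each trial.
    result = gp_minimize(objective, space, n_calls=25, random_state=0)
    print("best hyperparameters:", result.x, "best value:", result.fun)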
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can be easily changed simply by training the networks with better-suited hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- Hyperparameter Optimization through Neural Network Partitioning [11.6941692990626]
We propose a simple and efficient way to optimize hyperparameters in neural networks.
Our method partitions the training data and a neural network model into $K$ data shards and parameter partitions.
We demonstrate that this objective can be applied to optimize a variety of different hyperparameters in a single training run.
arXiv Detail & Related papers (2023-04-28T11:24:41Z)
- Robust Deep Compressive Sensing with Recurrent-Residual Structural Constraints [0.0]
Existing deep compressive sensing (CS) methods either ignore adaptive online optimization or depend on costly iterative reconstruction.
This work explores a novel image CS framework with a recurrent-residual structural constraint, termed R$^2$CS-NET.
As the first deep CS framework to efficiently incorporate adaptive online optimization, the R$^2$CS-NET integrates the robustness of online optimization with the efficiency and nonlinear capacity of deep learning methods.
arXiv Detail & Related papers (2022-07-15T05:56:13Z) - Bayesian Hyperparameter Optimization for Deep Neural Network-Based
Network Intrusion Detection [2.304713283039168]
Deep neural networks (DNN) have been successfully applied for intrusion detection problems.
This paper proposes a novel Bayesian optimization-based framework for the automatic optimization of hyperparameters.
We show that the proposed framework demonstrates significantly higher intrusion detection performance than the random search optimization-based approach.
arXiv Detail & Related papers (2022-07-07T20:08:38Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
deep equilibrium model is a class of models that foregoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
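For the equilibrium-network entry above, the following is a minimal sketch of the fixed-point computation that defines a deep equilibrium model's forward pass; the weights, dimensions, and plain fixed-point iteration are illustrative assumptions (the paper's joint input optimization is not shown).

    # One nonlinear layer f(z, x); the DEQ "infinite depth" output is the
    # fixed point z* satisfying z* = f(z*, x).
    import numpy as np

    rng = np.random.default_rng(0)
    W = 0.1 * rng.normal(size=(16, 16))   # small weights so the iteration contracts
    U = rng.normal(size=(16, 8))
    x = rng.normal(size=8)                # network input

    def layer(z, x):
        return np.tanh(W @ z + U @ x)

    z = np.zeros(16)
    for _ in range(200):                  # plain fixed-point iteration
        z_new = layer(z, x)               # (practical DEQs use Broyden/Anderson solvers)
        if np.linalg.norm(z_new - z) < 1e-9:
            break
        z = z_new
    print("residual at equilibrium:", np.linalg.norm(layer(z, x) - z))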
- Non-Gradient Manifold Neural Network [79.44066256794187]
A deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z)
- LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose LocalDrop, a new approach to regularizing neural networks based on the local Rademacher complexity.
A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs) has been developed based on the proposed upper bound of the local Rademacher complexity.
arXiv Detail & Related papers (2021-03-01T03:10:11Z)
- Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study large-scale distributed stochastic AUC maximization with a deep neural network as the predictive model.
Our method requires substantially fewer communication rounds, with theoretical guarantees on its communication complexity.
Experiments on several datasets demonstrate the effectiveness of our approach and corroborate the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Hyperparameter Optimization in Binary Communication Networks for Neuromorphic Deployment [4.280642750854163]
Training neural networks for neuromorphic deployment is non-trivial.
We introduce a Bayesian approach for optimizing the hyperparameters of an algorithm for training binary communication networks that can be deployed to neuromorphic hardware.
We show that optimizing the hyperparameters of this algorithm for each dataset yields accuracy improvements over the previous state-of-the-art for this algorithm on each dataset.
arXiv Detail & Related papers (2020-04-21T01:15:45Z)