Hyperparameter Optimization through Neural Network Partitioning
- URL: http://arxiv.org/abs/2304.14766v1
- Date: Fri, 28 Apr 2023 11:24:41 GMT
- Title: Hyperparameter Optimization through Neural Network Partitioning
- Authors: Bruno Mlodozeniec, Matthias Reisser, Christos Louizos
- Abstract summary: We propose a simple and efficient way of optimizing hyperparameters in neural networks.
Our method partitions the training data and a neural network model into $K$ data shards and parameter partitions.
We demonstrate that we can apply this objective to optimize a variety of different hyperparameters in a single training run.
- Score: 11.6941692990626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Well-tuned hyperparameters are crucial for obtaining good generalization
behavior in neural networks. They can enforce appropriate inductive biases,
regularize the model and improve performance -- especially in the presence of
limited data. In this work, we propose a simple and efficient way for
optimizing hyperparameters inspired by the marginal likelihood, an optimization
objective that requires no validation data. Our method partitions the training
data and a neural network model into $K$ data shards and parameter partitions,
respectively. Each partition is associated with and optimized only on specific
data shards. Combining these partitions into subnetworks allows us to define
the "out-of-training-sample" loss of a subnetwork, i.e., the loss on data
shards unseen by the subnetwork, as the objective for hyperparameter
optimization. We demonstrate that we can apply this objective to optimize a
variety of different hyperparameters in a single training run while being
significantly computationally cheaper than alternative methods aiming to
optimize the marginal likelihood for neural networks. Lastly, we also focus on
optimizing hyperparameters in federated learning, where retraining and
cross-validation are particularly challenging.
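The objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a nested scheme in which subnetwork k uses the trained values of the first k parameter partitions and default values (for example, the initialization) for the rest, and is scored on shard k+1, which it has not seen. The abstract does not spell out how partitions are combined, so this scheme, the linear model, and all function names below are hypothetical choices made for illustration.

```python
import random


def make_shards(n_examples, k, seed=0):
    """Split example indices into k disjoint, roughly equal shards."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def make_partitions(n_params, k):
    """Assign parameter indices to k contiguous partitions."""
    bounds = [round(i * n_params / k) for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


def subnetwork(params, defaults, partitions, k_active):
    """Trained values for the first k_active partitions, defaults elsewhere.

    Assumption: this nested combination is one plausible reading of
    "combining these partitions into subnetworks".
    """
    out = list(defaults)
    for part in partitions[:k_active]:
        for j in part:
            out[j] = params[j]
    return out


def mse(weights, xs, ys, rows):
    """Mean squared error of a linear model on the given example rows."""
    total = 0.0
    for r in rows:
        pred = sum(w * x for w, x in zip(weights, xs[r]))
        total += (pred - ys[r]) ** 2
    return total / len(rows)


def out_of_training_sample_loss(params, defaults, partitions, shards, xs, ys):
    """Sum, over k, of the loss of subnetwork k on the unseen shard k+1."""
    total = 0.0
    for k in range(1, len(shards)):
        weights = subnetwork(params, defaults, partitions, k)
        total += mse(weights, xs, ys, shards[k])
    return total
```

In a full training loop, partition k would receive gradients only from its associated shards, and the hyperparameters would be updated with gradients of the value returned by out_of_training_sample_loss. The sketch covers only the bookkeeping and the objective, not the training itself.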
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions via our training procedure, including the gradient and regularizers, limiting flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
- Optimal Hyperparameter $\epsilon$ for Adaptive Stochastic Optimizers through Gradient Histograms [0.8702432681310399]
We introduce a new framework based on gradient histograms to analyze and justify attributes of adaptive optimizers.
We propose a novel gradient histogram-based algorithm that automatically estimates a reduced and accurate search space for the safeguard factor $\epsilon$.
arXiv Detail & Related papers (2023-11-20T04:34:19Z)
- AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper-parameter tuning.
We show that using gradient-based data subsets for hyper-parameter tuning achieves significantly faster turnaround times and speedups of 3$\times$-30$\times$.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
- Automatic prior selection for meta Bayesian optimization with a case study on tuning deep neural network optimizers [47.013395100497775]
We propose a principled approach to solve such expensive hyperparameter tuning problems efficiently.
Key to the performance of BO is specifying and refining a distribution over functions, which is used to reason about the optima of the underlying function being optimized.
We verify our approach in realistic model training setups by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets.
arXiv Detail & Related papers (2021-09-16T20:46:26Z)
- Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
- Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians [5.33024001730262]
Self-Tuning Networks (STNs) have recently gained traction due to their ability to amortize the optimization of the inner objective.
We propose the $\Delta$-STN, an improved hypernetwork architecture which stabilizes training.
arXiv Detail & Related papers (2020-10-26T12:12:23Z)
- How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than the random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
- Automatic Setting of DNN Hyper-Parameters by Mixing Bayesian Optimization and Tuning Rules [0.6875312133832078]
We build a new algorithm for evaluating and analyzing the results of the network on the training and validation sets.
We use a set of tuning rules to add new hyper-parameters and/or to reduce the hyper-parameter search space to select a better combination.
arXiv Detail & Related papers (2020-06-03T08:53:48Z)