Related papers: Hyperparameter Optimization through Neural Network Partitioning

Hyperparameter Optimization through Neural Network Partitioning

URL: http://arxiv.org/abs/2304.14766v1
Date: Fri, 28 Apr 2023 11:24:41 GMT
Title: Hyperparameter Optimization through Neural Network Partitioning
Authors: Bruno Mlodozeniec, Matthias Reisser, Christos Louizos
Abstract summary: We propose a simple and efficient way for optimizing hyper parameters in neural networks. Our method partitions the training data and a neural network model into $K$ data shards and parameter partitions. We demonstrate that we can apply this objective to optimize a variety of different hyper parameters in a single training run.
Score: 11.6941692990626
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Well-tuned hyperparameters are crucial for obtaining good generalization behavior in neural networks. They can enforce appropriate inductive biases, regularize the model and improve performance -- especially in the presence of limited data. In this work, we propose a simple and efficient way for optimizing hyperparameters inspired by the marginal likelihood, an optimization objective that requires no validation data. Our method partitions the training data and a neural network model into $K$ data shards and parameter partitions, respectively. Each partition is associated with and optimized only on specific data shards. Combining these partitions into subnetworks allows us to define the ``out-of-training-sample" loss of a subnetwork, i.e., the loss on data shards unseen by the subnetwork, as the objective for hyperparameter optimization. We demonstrate that we can apply this objective to optimize a variety of different hyperparameters in a single training run while being significantly computationally cheaper than alternative methods aiming to optimize the marginal likelihood for neural networks. Lastly, we also focus on optimizing hyperparameters in federated learning, where retraining and cross-validation are particularly challenging.

Related papers

Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning? [42.362388367152256]
Large language models (LLMs) are used to fine-tune a parameter-efficient version of Code Llama using LoRA. Our method achieves competitive or superior results in terms of Root Mean Square Error (RMSE) while significantly reducing computational overhead.
arXiv Detail & Related papers (2025-04-08T13:15:47Z)
Towards hyperparameter-free optimization with differential privacy [9.193537596304669]
Differential privacy (DP) is a privacy-preserving paradigm that protects the training data when training deep learning models. In this work, we adapt the automatic learning rate schedule to DP optimization for any models and achieves state-of-the-art DP performance on various language and vision tasks.
arXiv Detail & Related papers (2025-03-02T02:59:52Z)
Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters. In practice, however, we only find solutions via our training procedure, including the gradient and regularizers, limiting flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values. We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO) Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
Efficient Parametric Approximations of Neural Network Function Space Distance [6.117371161379209]
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. We consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks.
arXiv Detail & Related papers (2023-02-07T15:09:23Z)
AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper- parameter tuning. We show that using gradient-based data subsets for hyper- parameter tuning achieves significantly faster turnaround times and speedups of 3$times$-30$times$.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data [86.8949732640035]
We propose JUMBO, an MBO algorithm that sidesteps limitations by querying additional data. We show that it achieves no-regret under conditions analogous to GP-UCB. Empirically, we demonstrate significant performance improvements over existing approaches on two real-world optimization problems.
arXiv Detail & Related papers (2021-06-02T05:03:38Z)
Learning Regularization Parameters of Inverse Problems via Deep Neural Networks [0.0]
We consider a supervised learning approach, where a network is trained to approximate the mapping from observation data to regularization parameters. We show that a wide variety of regularization functionals, forward models, and noise models may be considered. The network-obtained regularization parameters can be computed more efficiently and may even lead to more accurate solutions.
arXiv Detail & Related papers (2021-04-14T02:38:38Z)
Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in neural networks (RNNs) It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously. This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians [5.33024001730262]
Self-Tuning Networks (STNs) have recently gained traction due to their ability to amortize the optimization of the inner objective. We propose the $Delta$-STN, an improved hypernetwork architecture which stabilizes training.
arXiv Detail & Related papers (2020-10-26T12:12:23Z)
How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency. A human study is conducted to show that our evaluation protocol matches human tuning behavior better than the random search. We then apply the proposed benchmarking framework to 7s and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
Automatic Setting of DNN Hyper-Parameters by Mixing Bayesian Optimization and Tuning Rules [0.6875312133832078]
We build a new algorithm for evaluating and analyzing the results of the network on the training and validation sets. We use a set of tuning rules to add new hyper-parameters and/or to reduce the hyper- parameter search space to select a better combination.
arXiv Detail & Related papers (2020-06-03T08:53:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.