Hyperparameter Transfer Learning with Adaptive Complexity
- URL: http://arxiv.org/abs/2102.12810v1
- Date: Thu, 25 Feb 2021 12:26:52 GMT
- Title: Hyperparameter Transfer Learning with Adaptive Complexity
- Authors: Samuel Horváth, Aaron Klein, Peter Richtárik, Cédric Archambeau
- Abstract summary: We propose a new multi-task BO method that learns a set of ordered, non-linear basis functions of increasing complexity via nested drop-out and automatic relevance determination.
- Score: 5.695163312473305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian optimization (BO) is a sample efficient approach to automatically
tune the hyperparameters of machine learning models. In practice, one
frequently has to solve similar hyperparameter tuning problems sequentially.
For example, one might have to tune a type of neural network learned across a
series of different classification problems. Recent work on multi-task BO
exploits knowledge gained from previous tuning tasks to speed up a new tuning
task. However, previous approaches do not account for the fact that BO is a
sequential decision making procedure. Hence, there is in general a mismatch
between the number of evaluations collected in the current tuning task compared
to the number of evaluations accumulated in all previously completed tasks. In
this work, we enable multi-task BO to compensate for this mismatch, such that
the transfer learning procedure is able to handle different data regimes in a
principled way. We propose a new multi-task BO method that learns a set of
ordered, non-linear basis functions of increasing complexity via nested
drop-out and automatic relevance determination. Experiments on a variety of
hyperparameter tuning problems show that our method improves sample efficiency.
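The combination of nested drop-out and automatic relevance determination described in the abstract can be illustrated with a minimal toy sketch in NumPy. Here the random Fourier features, the geometric truncation distribution, and the fixed ARD precisions are illustrative stand-ins for components the paper learns end to end, not the authors' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def nested_dropout_mask(num_basis, p=0.3):
    # Sample a truncation index k and keep only the first k basis
    # functions. Earlier bases are dropped less often, so they are
    # pressed to carry the coarse structure; later bases add detail.
    k = min(rng.geometric(p), num_basis)
    mask = np.zeros(num_basis)
    mask[:k] = 1.0
    return mask

# Toy non-linear basis: random Fourier features (hypothetical stand-in
# for the learned, ordered basis functions of increasing complexity).
num_basis, dim = 8, 2
W = rng.normal(size=(dim, num_basis))
b = rng.uniform(0, 2 * np.pi, size=num_basis)

def features(X):
    return np.cos(X @ W + b)

X = rng.normal(size=(16, dim))
Phi = features(X) * nested_dropout_mask(num_basis)  # ordered truncation

# ARD head: per-basis prior precisions alpha act as automatic relevance
# determination -- a large alpha_j shrinks basis j toward zero. Here the
# precisions are fixed; in practice they would be optimised.
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=16)
alpha = np.ones(num_basis)   # prior precisions (illustrative)
beta = 100.0                 # observation-noise precision
A = np.diag(alpha) + beta * Phi.T @ Phi
mean_w = beta * np.linalg.solve(A, Phi.T @ y)  # posterior mean weights
```

Because dropped (later) basis functions contribute nothing to `Phi`, their posterior weights stay at the prior, which is how the model's effective complexity adapts to how much data a task has accumulated.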
Related papers
- Scalable Fine-tuning from Multiple Data Sources: A First-Order Approximation Approach [17.79010397902909]
We study the problem of fine-tuning a language model (LM) for a target task by optimally using the information from $n$ auxiliary tasks.
This problem has broad applications in NLP, such as targeted instruction tuning and data selection in chain-of-thought fine-tuning.
We introduce a new algorithm to estimate model fine-tuning performances without repeated training.
arXiv Detail & Related papers (2024-09-28T21:26:50Z)
- Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy [74.34895342081407]
We propose an unsupervised algorithm to find good initialization for input data.
We first notice that each parameter configuration in the parameter space corresponds to one particular downstream task of d-way classification.
We then conjecture that the success of learning is directly related to how diverse downstream tasks are in the vicinity of the initial parameters.
arXiv Detail & Related papers (2023-02-08T23:23:28Z)
- Transfer Learning based Search Space Design for Hyperparameter Tuning [31.96809688536572]
We introduce an automatic method to design the BO search space with the aid of tuning history from past tasks.
This simple yet effective approach can be used to endow many existing BO methods with transfer learning capabilities.
arXiv Detail & Related papers (2022-06-06T11:48:58Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Few-Shot Bayesian Optimization with Deep Kernel Surrogates [7.208515071018781]
We formulate hyperparameter optimization as a few-shot learning problem in which a shared deep surrogate model is trained to quickly adapt to the response function of a new task.
We propose the use of a deep kernel network for a Gaussian process surrogate that is meta-learned in an end-to-end fashion.
As a result, the novel few-shot optimization of our deep kernel surrogate leads to new state-of-the-art results at HPO.
arXiv Detail & Related papers (2021-01-19T15:00:39Z)
- Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
Diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework.
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark.
arXiv Detail & Related papers (2020-12-14T12:34:01Z)
- How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
- Online Parameter-Free Learning of Multiple Low Variance Tasks [36.08679456245112]
We propose a method to learn a common bias vector for a growing sequence of low-variance tasks.
Our approach is presented in the non-statistical setting and comes in two variants.
Experiments confirm the effectiveness of our methods in practice.
arXiv Detail & Related papers (2020-07-11T09:52:53Z)
- Automatic Tuning of Stochastic Gradient Descent with Bayesian Optimisation [8.340191147575307]
We introduce an original probabilistic model for traces of optimisers, based on latent Gaussian processes and an auto-regressive formulation.
It flexibly adjusts to abrupt changes of behaviours induced by new learning rate values.
It is well suited to several problems: on-line adaptation of the learning rate for a cold-started run, tuning the schedule for a set of similar tasks, and warm-starting the schedule for a new task.
arXiv Detail & Related papers (2020-06-25T13:18:18Z)
- Continual Learning using a Bayesian Nonparametric Dictionary of Weight Factors [75.58555462743585]
Naively trained neural networks tend to experience catastrophic forgetting in sequential task settings.
We propose a principled nonparametric approach based on the Indian Buffet Process (IBP) prior, letting the data determine how much to expand the model complexity.
We demonstrate the effectiveness of our method on a number of continual learning benchmarks and analyze how weight factors are allocated and reused throughout the training.
arXiv Detail & Related papers (2020-04-21T15:20:19Z)
- Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning [100.83444258562263]
We propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection for sequence prediction.
In order to solve the resulting multiple black-box function optimization problem jointly and efficiently, we exploit potential correlations among black-box functions.
We are the first to formulate the problem of stepwise model selection (SMS) for sequence prediction, and to design and demonstrate an efficient joint-learning algorithm for this purpose.
arXiv Detail & Related papers (2020-01-12T09:42:19Z)
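The diff pruning entry above describes learning a sparse, task-specific difference vector on top of frozen pretrained weights, so each new task costs only the nonzero entries of that vector. A minimal sketch of the idea follows, using simple magnitude pruning in place of the paper's differentiable L0 relaxation; the toy parameter vector and the 5% budget are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen pretrained parameters (toy vector standing in for a full model).
theta_pretrained = rng.normal(size=100)

# Task-specific diff vector: only this is trained, then pruned to a small
# budget so per-task storage is a fraction of the full model size.
delta = 0.01 * rng.normal(size=100)

def prune_diff(delta, budget=0.05):
    # Keep only the largest-magnitude entries (magnitude pruning as a
    # stand-in for the paper's differentiable L0 relaxation).
    k = max(1, int(budget * delta.size))
    threshold = np.sort(np.abs(delta))[-k]
    return np.where(np.abs(delta) >= threshold, delta, 0.0)

delta_sparse = prune_diff(delta)
theta_task = theta_pretrained + delta_sparse  # task model = base + sparse diff
```

Since the base parameters are shared and frozen, serving many tasks only requires storing one sparse diff vector per task.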
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.