In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization
- URL: http://arxiv.org/abs/2404.16795v3
- Date: Mon, 12 Aug 2024 12:24:45 GMT
- Title: In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization
- Authors: Herilalaina Rakotoarison, Steven Adriaensen, Neeratyoy Mallik, Samir Garibov, Edward Bergman, Frank Hutter
- Abstract summary: We propose FT-PFN, a novel surrogate for Freeze-thaw style BO.
FT-PFN is a prior-data fitted network (PFN) that leverages transformers' in-context learning ability.
We show that, when combined with our novel acquisition mechanism (MFPI-random), the resulting in-context freeze-thaw BO method (ifBO) yields new state-of-the-art performance.
- Score: 35.74766507227412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing computational costs associated with deep learning, automated hyperparameter optimization methods, strongly relying on black-box Bayesian optimization (BO), face limitations. Freeze-thaw BO offers a promising grey-box alternative, strategically allocating scarce resources incrementally to different configurations. However, the frequent surrogate model updates inherent to this approach pose challenges for existing methods, requiring retraining or fine-tuning their neural network surrogates online, introducing overhead, instability, and hyper-hyperparameters. In this work, we propose FT-PFN, a novel surrogate for freeze-thaw style BO. FT-PFN is a prior-data fitted network (PFN) that leverages transformers' in-context learning ability to efficiently and reliably perform Bayesian learning curve extrapolation in a single forward pass. Our empirical analysis across three benchmark suites shows that the predictions made by FT-PFN are more accurate and 10-100 times faster than those of the deep Gaussian process and deep ensemble surrogates used in previous work. Furthermore, we show that, when combined with our novel acquisition mechanism (MFPI-random), the resulting in-context freeze-thaw BO method (ifBO) yields new state-of-the-art performance in the same three families of deep learning HPO benchmarks considered in prior work.
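For intuition only, the sketch below shows a generic freeze-thaw BO loop in the spirit the abstract describes: partial learning curves are kept for every configuration, a surrogate extrapolates each curve, and an MFPI-random-style acquisition (a randomly drawn horizon and improvement threshold) picks which configuration to thaw for one more step. The crude slope-based extrapolator and the toy objective are assumptions standing in for FT-PFN and real training; nothing here reproduces the authors' implementation.

```python
import math
import random
import numpy as np

def toy_objective(x, step):
    """Hypothetical 'training step': loss decays toward a config-dependent floor."""
    floor = (x - 0.3) ** 2                      # best configuration near x = 0.3
    return floor + 1.0 / (1.0 + step) + 0.01 * np.random.rand()

def extrapolate(curve, horizon):
    """Stand-in for FT-PFN: crude slope-based guess of the loss `horizon` steps ahead.

    FT-PFN would instead return a calibrated predictive distribution in a single
    transformer forward pass; this toy version just returns a mean and a std.
    """
    if len(curve) < 2:
        return curve[-1], 0.5                   # very uncertain with one observation
    slope = curve[-1] - curve[-2]
    mean = curve[-1] + 0.5 * slope * horizon    # assume the improvement keeps decaying
    std = 0.05 * (1 + horizon)
    return mean, std

def mfpi_random(curves, incumbent, max_horizon=10):
    """MFPI-random-style scoring (assumed form): sample a random horizon and a random
    improvement threshold, then rank configurations by their probability of beating it."""
    horizon = random.randint(1, max_horizon)
    threshold = incumbent * (1.0 - 10 ** random.uniform(-4, -1))
    scores = {}
    for cfg, curve in curves.items():
        mean, std = extrapolate(curve, horizon)
        z = (threshold - mean) / max(std, 1e-9)             # Gaussian tail probability
        scores[cfg] = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))
    return max(scores, key=scores.get)

# Eight candidate hyperparameter settings, each trained for one warm-up step.
curves = {round(float(x), 2): [] for x in np.linspace(0.0, 1.0, 8)}
for cfg in curves:
    curves[cfg].append(toy_objective(cfg, 1))

for _ in range(40):                                          # freeze-thaw loop
    incumbent = min(min(c) for c in curves.values())
    cfg = mfpi_random(curves, incumbent)                     # which config to thaw
    curves[cfg].append(toy_objective(cfg, len(curves[cfg]) + 1))

best = min(curves, key=lambda c: min(curves[c]))
print("best config:", best, "best observed loss:", round(min(curves[best]), 4))
```

The point this illustrates is that the surrogate is queried fresh at every iteration without any online retraining of its own parameters, which is the overhead that FT-PFN's single-forward-pass in-context predictions are designed to avoid.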
Related papers
- Bayesian Optimization for Hyperparameters Tuning in Neural Networks [0.0]
Bayesian Optimization is a derivative-free global optimization method suitable for black-box functions with continuous inputs and limited evaluation budgets.
This study investigates the application of BO for the hyperparameter tuning of neural networks, specifically targeting the enhancement of Convolutional Neural Networks (CNNs).
Experimental outcomes reveal that BO effectively balances exploration and exploitation, converging rapidly towards optimal settings for CNN architectures.
This approach underlines the potential of BO in automating neural network tuning, contributing to improved accuracy and computational efficiency in machine learning pipelines.
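To make that loop concrete, here is a minimal sketch using scikit-optimize's gp_minimize with an expected improvement acquisition, assuming scikit-optimize is installed; the two hyperparameters, their ranges, and the `train_and_validate_cnn` objective are illustrative placeholders, not the paper's actual setup.

```python
# Minimal BO sketch for CNN hyperparameters, assuming scikit-optimize is available.
# `train_and_validate_cnn` is a hypothetical stand-in for real CNN training.
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

def train_and_validate_cnn(params):
    """Placeholder objective: returns a fake validation error for (lr, dropout)."""
    lr, dropout = params
    # pretend the error is minimized around lr = 1e-3 and dropout = 0.3
    return (np.log10(lr) + 3) ** 2 + (dropout - 0.3) ** 2 + 0.01 * np.random.rand()

search_space = [
    Real(1e-5, 1e-1, prior="log-uniform", name="learning_rate"),
    Real(0.0, 0.7, name="dropout"),
]

# GP surrogate + expected improvement: the standard black-box BO recipe the
# summary above refers to (exploration/exploitation balanced by EI).
result = gp_minimize(
    train_and_validate_cnn,
    search_space,
    acq_func="EI",
    n_calls=30,
    n_initial_points=10,
    random_state=0,
)
print("best params:", result.x, "best validation error:", result.fun)
```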
arXiv Detail & Related papers (2024-10-29T09:23:24Z)
- Cost-Sensitive Multi-Fidelity Bayesian Optimization with Transfer of Learning Curve Extrapolation [55.75188191403343]
We introduce a utility function, predefined by each user, that describes the trade-off between the cost and performance of BO.
We validate our algorithm on various learning-curve (LC) datasets and find that it outperforms all previous multi-fidelity BO and transfer-BO baselines we consider.
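The user-defined utility can be pictured with a small, purely illustrative example (the paper's actual utility specification is not reproduced here): score each candidate by predicted performance minus a user-chosen price per unit of compute.

```python
# Illustrative sketch only: a user-defined utility trading off predicted
# performance against training cost; lambda_cost encodes the user's preference.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    predicted_accuracy: float    # e.g. from learning-curve extrapolation
    predicted_cost_hours: float  # e.g. remaining GPU-hours to reach that accuracy

def utility(c: Candidate, lambda_cost: float = 0.02) -> float:
    """Higher is better: accuracy minus a linear penalty per GPU-hour."""
    return c.predicted_accuracy - lambda_cost * c.predicted_cost_hours

candidates = [
    Candidate("cfg_lr1e-3", predicted_accuracy=0.91, predicted_cost_hours=4.0),
    Candidate("cfg_lr1e-2", predicted_accuracy=0.89, predicted_cost_hours=1.0),
]
best = max(candidates, key=utility)
print("selected:", best.name, "utility:", round(utility(best), 3))
```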
arXiv Detail & Related papers (2024-05-28T07:38:39Z)
- Poisson Process for Bayesian Optimization [126.51200593377739]
We propose a ranking-based surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO).
Compared to the classic GP-BO method, our PoPBO has lower costs and better robustness to noise, which is verified by extensive experiments.
arXiv Detail & Related papers (2024-02-05T02:54:50Z)
- Provably Efficient Bayesian Optimization with Unknown Gaussian Process Hyperparameter Estimation [44.53678257757108]
We propose a new BO method that can sub-linearly converge to the objective function's global optimum.
Our method uses a multi-armed bandit technique (EXP3) to add random data points to the BO process.
We demonstrate empirically that our method outperforms existing approaches on various synthetic and real-world problems.
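EXP3 itself is a standard adversarial-bandit algorithm; below is a generic sketch of it, not the paper's exact integration with BO, where the two arms could stand for, say, taking the BO-suggested point versus injecting a random point.

```python
# Generic EXP3 sketch (standard algorithm, not the paper's exact setup).
# Arms could represent e.g. "take the BO-suggested point" vs "add a random point".
import math
import random

def exp3(n_arms, n_rounds, reward_fn, gamma=0.1):
    weights = [1.0] * n_arms
    for _ in range(n_rounds):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm)                 # assumed to lie in [0, 1]
        # importance-weighted exponential update: only the pulled arm changes
        weights[arm] *= math.exp(gamma * reward / (probs[arm] * n_arms))
    return weights

# Toy rewards: arm 0 pays off 70% of the time, arm 1 only 40%.
final = exp3(2, 500, lambda a: 1.0 if random.random() < (0.7 if a == 0 else 0.4) else 0.0)
print("final weights (arm 0 should dominate):", final)
```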
arXiv Detail & Related papers (2023-06-12T03:35:45Z)
- Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To select an input query, we only need to train an NN and then choose the query that maximizes the trained NN's output.
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
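The "train an NN, then maximize it" recipe can be sketched generically as follows; scikit-learn's MLPRegressor, the toy objective, and the candidate grid are assumptions for illustration, and the paper's actual sample-then-optimize construction is not reproduced.

```python
# Generic sketch of "train an NN, then choose the query by maximizing it".
# scikit-learn's MLPRegressor stands in for the paper's network; each refit
# starts from a fresh random initialization, loosely playing the role of
# drawing a new posterior sample in Thompson sampling.
import numpy as np
from sklearn.neural_network import MLPRegressor

def black_box(x):
    """Toy 1-D objective to maximize (unknown to the optimizer)."""
    return float(np.sin(3 * x) + 0.5 * x + 0.05 * np.random.randn())

rng = np.random.default_rng(0)
X = [float(v) for v in rng.uniform(0, 2, size=5)]   # initial random queries
y = [black_box(x) for x in X]

for _ in range(15):
    # 1) fit a small NN on all observations so far (no matrix inversion needed)
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
    net.fit(np.array(X).reshape(-1, 1), y)
    # 2) pick the next query by maximizing the trained NN over candidate inputs
    candidates = np.linspace(0, 2, 200).reshape(-1, 1)
    next_x = float(candidates[np.argmax(net.predict(candidates)), 0])
    X.append(next_x)
    y.append(black_box(next_x))

print("best observed x:", round(X[int(np.argmax(y))], 3), "value:", round(max(y), 3))
```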
arXiv Detail & Related papers (2022-10-13T09:01:58Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, and that its behavior can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- Deep-Ensemble-Based Uncertainty Quantification in Spatiotemporal Graph Neural Networks for Traffic Forecasting [2.088376060651494]
We focus on a diffusion convolutional recurrent neural network (DCRNN), a state-of-the-art method for short-term traffic forecasting.
We develop a scalable deep ensemble approach to quantify uncertainties for DCRNN.
We show that our generic and scalable approach outperforms the current state-of-the-art Bayesian method and a number of other commonly used frequentist techniques.
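Deep ensembles in general estimate uncertainty from the spread of several independently initialized models; the sketch below shows that generic recipe on a toy regression task, not DCRNN or the traffic-forecasting pipeline.

```python
# Generic deep-ensemble sketch (not DCRNN or the paper's traffic pipeline):
# train several independently initialized networks and use the spread of
# their predictions as an uncertainty estimate.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

ensemble = []
for seed in range(5):                          # 5 members with different inits
    member = MLPRegressor(hidden_layer_sizes=(64,), max_iter=3000, random_state=seed)
    member.fit(X, y)
    ensemble.append(member)

X_test = np.linspace(-4, 4, 9).reshape(-1, 1)  # includes out-of-range points
preds = np.stack([m.predict(X_test) for m in ensemble])
mean, std = preds.mean(axis=0), preds.std(axis=0)

for x, m, s in zip(X_test[:, 0], mean, std):
    print(f"x={x:+.1f}  prediction={m:+.2f}  uncertainty(std)={s:.2f}")
```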
arXiv Detail & Related papers (2022-04-04T16:10:55Z)
- Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of the wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
- Cost-Efficient Online Hyperparameter Optimization [94.60924644778558]
We propose an online HPO algorithm that reaches human expert-level performance within a single run of the experiment, while incurring only modest computational overhead compared to regular training.
arXiv Detail & Related papers (2021-01-17T04:55:30Z)
- Fusion-Catalyzed Pruning for Optimizing Deep Learning on Intelligent Edge Devices [9.313154178072049]
We present a novel fusion-catalyzed pruning approach, called FuPruner, for accelerating neural networks.
We introduce an aggressive fusion method to equivalently transform a model, which extends the optimization space of pruning.
FuPruner provides optimization options for controlling fusion and pruning, allowing much more flexible performance-accuracy trade-offs to be made.
arXiv Detail & Related papers (2020-10-30T10:10:08Z)
- Federated Bayesian Optimization via Thompson Sampling [33.087439644066876]
This paper presents federated Thompson sampling (FTS), which overcomes a number of key challenges of federated Bayesian optimization (FBO) and federated learning (FL) in a principled way.
We empirically demonstrate the effectiveness of FTS in terms of communication efficiency, computational efficiency, and practical performance.
arXiv Detail & Related papers (2020-10-20T09:42:17Z)