Bayesian Optimization for Enhanced Language Models: Optimizing Acquisition Functions
- URL: http://arxiv.org/abs/2505.17151v1
- Date: Thu, 22 May 2025 10:16:56 GMT
- Title: Bayesian Optimization for Enhanced Language Models: Optimizing Acquisition Functions
- Authors: Zishuo Bao, Yibo Liu, Changyutao Qiu
- Abstract summary: This work introduces Bilevel-BO-SWA, a model fusion approach coupled with a bilevel BO strategy to improve the fine-tuning of large language models. It mixes acquisition functions such as EI and UCB in nested optimization loops, where the inner loop minimizes the training loss and the outer loop optimizes the validation metric. Experiments on GLUE tasks using RoBERTa-base show that mixing EI and UCB improves generalization, and fine-tuning performance improves by up to 2.7%.
- Score: 0.6554326244334868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rise of diverse language model architectures, fine-tuning is becoming even more important for downstream tasks, and finding proper hyperparameters for fine-tuning is difficult. Although BO has been tried for hyperparameter tuning, most existing methods are oblivious to the fact that BO relies on careful choices of acquisition functions, which are essential components of BO that guide how much to explore versus exploit during the optimization process. Different acquisition functions have different levels of sensitivity to training loss and validation performance, yet existing methods often apply a single acquisition function regardless of whether the training and validation performance are sensitive to it. This work introduces Bilevel-BO-SWA, a model fusion approach coupled with a bilevel BO strategy to improve the fine-tuning of large language models. We mix acquisition functions such as EI and UCB in nested optimization loops, where the inner loop minimizes the training loss and the outer loop optimizes the validation metric. Experiments on GLUE tasks using RoBERTa-base show that mixing EI and UCB improves generalization, and fine-tuning performance improves by up to 2.7%.
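The nested-loop design described in the abstract (EI driving an inner search over training loss, UCB guiding the outer search over the validation metric) can be sketched as follows. This is a minimal, hypothetical illustration using off-the-shelf scikit-learn GP surrogates; the helper names (`propose`, `sample_configs`) and the way the two loops hand configurations to each other are assumptions made for illustration, not the paper's actual Bilevel-BO-SWA algorithm.

```python
# Illustrative sketch only: nested BO with EI (inner, training loss)
# and UCB (outer, validation metric), using GP surrogates.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor


def expected_improvement(mu, sigma, best):
    """EI for maximisation: expected gain over the current best value."""
    imp = mu - best
    z = imp / np.maximum(sigma, 1e-9)
    return imp * norm.cdf(z) + sigma * norm.pdf(z)


def upper_confidence_bound(mu, sigma, beta=2.0):
    """UCB: optimistic score rewarding both mean value and uncertainty."""
    return mu + beta * sigma


def propose(X, y, candidates, acq, **kw):
    """Fit a GP surrogate to (X, y) and pick the candidate maximising acq."""
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(acq(mu, sigma, **kw))]


def bilevel_bo(train_loss, val_metric, sample_configs, n_outer=8, n_inner=4):
    """train_loss / val_metric map a hyperparameter vector to a scalar;
    sample_configs(n) returns an (n, d) array of random configurations."""
    outer_X = list(sample_configs(2))
    outer_y = [val_metric(x) for x in outer_X]
    for _ in range(n_outer):
        # Outer step: UCB on the validation surrogate picks where to start.
        start = propose(outer_X, outer_y, sample_configs(128),
                        upper_confidence_bound)
        # Inner loop: refine that choice by minimising training loss
        # (maximising its negation) with EI.
        inner_X = [start] + list(sample_configs(1))
        inner_y = [-train_loss(x) for x in inner_X]
        for _ in range(n_inner):
            x = propose(inner_X, inner_y, sample_configs(128),
                        expected_improvement, best=max(inner_y))
            inner_X.append(x)
            inner_y.append(-train_loss(x))
        # The inner incumbent is then scored on the outer objective.
        incumbent = inner_X[int(np.argmax(inner_y))]
        outer_X.append(incumbent)
        outer_y.append(val_metric(incumbent))
    return outer_X[int(np.argmax(outer_y))]
```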
Related papers
- FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch [21.41322548859776]
We show how FunBO can be used to learn new acquisition functions written in computer code.
We show how FunBO identifies AFs that generalize well in and out of the training distribution of functions.
arXiv Detail & Related papers (2024-06-07T10:49:59Z)
- Reinforced In-Context Black-Box Optimization [64.25546325063272]
RIBBO is a method to reinforce-learn a BBO algorithm from offline data in an end-to-end fashion.
RIBBO employs expressive sequence models to learn the optimization histories produced by multiple behavior algorithms and tasks.
Central to our method is to augment the optimization histories with regret-to-go tokens, which are designed to represent the performance of an algorithm based on cumulative regret over the future part of the histories.
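As a rough, hypothetical illustration of the regret-to-go idea (the exact regret definition and tokenization used by RIBBO may differ), such tokens could be computed as suffix sums of per-step regret over an optimization history:

```python
import numpy as np

def regret_to_go(history_values, f_best):
    """history_values: objective values observed at each step (minimisation);
    f_best: the known or estimated optimal value. Returns, for each step,
    the regret accumulated over the *future* part of the history."""
    regrets = np.asarray(history_values, dtype=float) - f_best  # per-step regret
    return np.cumsum(regrets[::-1])[::-1]                       # suffix sums

# Example: values 3, 2, 2, 1 with optimum 1 give regret-to-go 4, 2, 1, 0.
print(regret_to_go([3, 2, 2, 1], f_best=1))
```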
arXiv Detail & Related papers (2024-02-27T11:32:14Z)
- Large Language Models to Enhance Bayesian Optimization [57.474613739645605]
We present LLAMBO, a novel approach that integrates the capabilities of Large Language Models (LLM) within Bayesian optimization.
At a high level, we frame the BO problem in natural language, enabling LLMs to iteratively propose and evaluate promising solutions conditioned on historical evaluations.
Our findings illustrate that LLAMBO is effective at zero-shot warmstarting, and enhances surrogate modeling and candidate sampling, especially in the early stages of search when observations are sparse.
arXiv Detail & Related papers (2024-02-06T11:44:06Z)
- Non-Convex Bilevel Optimization with Time-Varying Objective Functions [57.299128109226025]
We propose an online bilevel optimization where the functions can be time-varying and the agent continuously updates the decisions with online data.
Compared to existing algorithms, SOBOW is computationally efficient and does not need to know previous functions.
We show that SOBOW can achieve a sublinear bilevel local regret under mild conditions.
arXiv Detail & Related papers (2023-08-07T06:27:57Z)
- Predictive Modeling through Hyper-Bayesian Optimization [60.586813904500595]
We propose a novel way of integrating model selection and BO for the single goal of reaching the function optima faster.
The algorithm moves back and forth between BO in the model space and BO in the function space, where the goodness of the recommended model is captured.
In addition to improved sample efficiency, the framework outputs information about the black-box function.
arXiv Detail & Related papers (2023-08-01T04:46:58Z)
- HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes [7.963551878308098]
HyperBO+ is a pre-training approach for hierarchical Gaussian processes.
We show that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.
arXiv Detail & Related papers (2022-12-20T18:47:10Z)
- Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
arXiv Detail & Related papers (2022-07-07T04:42:54Z)
- A General Recipe for Likelihood-free Bayesian Optimization [115.82591413062546]
We propose likelihood-free BO (LFBO) to extend BO to a broader class of models and utilities.
LFBO directly models the acquisition function without having to separately perform inference with a probabilistic surrogate model.
We show that computing the acquisition function in LFBO can be reduced to optimizing a weighted classification problem.
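As a hypothetical sketch of that reduction (the helper name and the exact weighting scheme are illustrative assumptions, not LFBO's reference implementation), an EI-style acquisition can be approximated by a classifier trained with improvement-proportional sample weights:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_classifier_acquisition(X, y, tau):
    """Fit a weighted classifier whose predicted probability serves as an
    acquisition score (to be maximised). Points with y > tau are positives,
    weighted by their improvement y - tau; negatives keep unit weight."""
    labels = (y > tau).astype(int)
    weights = np.where(labels == 1, y - tau, 1.0)
    clf = LogisticRegression().fit(X, labels, sample_weight=weights)
    return lambda X_new: clf.predict_proba(X_new)[:, 1]

# Usage sketch: score candidate points with the returned function and query
# the argmax, without ever fitting a probabilistic surrogate model.
```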
arXiv Detail & Related papers (2022-06-27T03:55:27Z)
- Bayesian Optimization over Permutation Spaces [30.650753803587794]
We propose and evaluate two algorithms for BO over Permutation Spaces (BOPS).
We theoretically analyze the performance of BOPS-T to show that its regret grows sub-linearly.
Our experiments on multiple synthetic and real-world benchmarks show that both BOPS-T and BOPS-H perform better than state-of-the-art BO algorithms for such spaces.
arXiv Detail & Related papers (2021-12-02T08:20:50Z)
- Efficient Exploration in Binary and Preferential Bayesian Optimization [0.5076419064097732]
We show that it is important for BO algorithms to distinguish between different types of uncertainty.
We propose several new acquisition functions that outperform state-of-the-art BO functions.
arXiv Detail & Related papers (2021-10-18T14:44:34Z)
- Pre-trained Gaussian Processes for Bayesian Optimization [24.730678780782647]
We propose a new pre-training based BO framework named HyperBO.
We show bounded posterior predictions and near-zero regrets for HyperBO without assuming the "ground truth" GP prior is known.
arXiv Detail & Related papers (2021-09-16T20:46:26Z)