A Bayesian approach for prompt optimization in pre-trained language models
- URL: http://arxiv.org/abs/2312.00471v1
- Date: Fri, 1 Dec 2023 10:10:18 GMT
- Title: A Bayesian approach for prompt optimization in pre-trained language models
- Authors: Antonio Sabbatella, Andrea Ponti, Antonio Candelieri, Ilaria Giordani, Francesco Archetti
- Abstract summary: In this paper we focus on hard prompt tuning (HPT), which directly searches for discrete tokens to be added to the text input without access to the large language model (LLM).
We use BoTorch, a library for Bayesian optimization research built on top of PyTorch.
- Score: 1.980639720136382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A prompt is a sequence of symbols or tokens, selected from a vocabulary
according to some rule, which is prepended or concatenated to a textual query. A
key problem is how to select the sequence of tokens: in this paper we formulate
it as a combinatorial optimization problem. The high dimensionality of the
token space, compounded by the length of the prompt sequence, requires a very
efficient solution; we propose a Bayesian optimization method executed in a
continuous embedding of the combinatorial space. We focus on hard prompt
tuning (HPT), which directly searches for discrete tokens to be added to the
text input without requiring access to the large language model (LLM) and can
therefore be used even when the LLM is available only as a black box. This is
critically important when LLMs are offered in the Model as a Service (MaaS)
manner, as with GPT-4. The current manuscript focuses on the optimization of
discrete prompts for classification tasks. Discrete prompts give rise to a
difficult combinatorial optimization problem which easily becomes intractable
given the dimension of the token space in realistic applications. The
optimization method considered here is Bayesian optimization (BO), which has
become the dominant approach in black-box optimization thanks to its sample
efficiency, modular structure, and versatility. We use BoTorch, a library for
Bayesian optimization research built on top of PyTorch. Albeit preliminary and
obtained with a 'vanilla' version of BO, the experiments with RoBERTa on six
benchmarks show good performance across a variety of tasks and enable an
analysis of the trade-off between the size of the search space, accuracy, and
wall-clock time.
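As a concrete illustration, here is a minimal, hypothetical sketch of this loop with BoTorch; the token-embedding table, the nearest-token projection, and the `score` objective are illustrative stand-ins, since the paper's real objective is the black-box LLM's validation performance on a downstream classification task.

```python
# Minimal sketch (not the authors' released code): vanilla Bayesian
# optimization with BoTorch over a continuous embedding of a discrete
# prompt space. Embedding table, projection rule, and `score` are toy
# stand-ins for the real LLM validation objective.
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

torch.set_default_dtype(torch.double)
VOCAB, EMB_DIM, PROMPT_LEN = 50, 4, 3            # toy sizes
table = torch.randn(VOCAB, EMB_DIM)              # stand-in token embeddings
D = EMB_DIM * PROMPT_LEN                         # continuous search dimension
bounds = torch.stack([table.min() * torch.ones(D), table.max() * torch.ones(D)])

def nearest_tokens(z):
    """Project a continuous point back to the nearest discrete tokens."""
    return torch.cdist(z.view(PROMPT_LEN, EMB_DIM), table).argmin(dim=-1)

def score(tokens):
    """Placeholder black-box objective (e.g. LLM validation accuracy)."""
    return float(-((table[tokens].sum() - 1.0) ** 2))

train_x = bounds[0] + (bounds[1] - bounds[0]) * torch.rand(8, D)  # initial design
train_y = torch.tensor([[score(nearest_tokens(x))] for x in train_x])

for _ in range(10):                              # vanilla BO loop
    gp = SingleTaskGP(train_x, train_y)
    fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))
    acq = ExpectedImprovement(gp, best_f=train_y.max())
    cand, _ = optimize_acqf(acq, bounds=bounds, q=1,
                            num_restarts=4, raw_samples=64)
    y = torch.tensor([[score(nearest_tokens(cand.squeeze(0)))]])
    train_x, train_y = torch.cat([train_x, cand]), torch.cat([train_y, y])

print("best prompt tokens:", nearest_tokens(train_x[train_y.argmax()]))
```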
Related papers
- Hyperband-based Bayesian Optimization for Black-box Prompt Selection [15.756224286651237]
Black-box prompt selection is challenging due to potentially large search spaces, the absence of gradient information, and the high cost of evaluating prompts on a validation set.
We propose HbBoPs, a novel method that combines a structural-aware deep kernel Gaussian Process with Hyperband as a multi-fidelity scheduler.
HbBoPs outperforms state-of-the-art methods in both performance and efficiency.
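A minimal sketch of the multi-fidelity scheduling idea (successive halving, the core of Hyperband) applied to prompt selection; the `evaluate(prompt, budget)` function, which scores a prompt on `budget` validation examples, is an assumption, and the Gaussian-Process modeling of HbBoPs is omitted.

```python
# Minimal sketch, assuming `evaluate(prompt, budget)` returns a validation
# score computed on `budget` examples. Bad prompts are dropped early so
# little evaluation budget is wasted on them.
def successive_halving(prompts, evaluate, min_budget=8, eta=2):
    budget = min_budget
    while len(prompts) > 1:
        scores = {p: evaluate(p, budget) for p in prompts}
        prompts = sorted(prompts, key=scores.get,
                         reverse=True)[:max(1, len(prompts) // eta)]
        budget *= eta                      # survivors get more validation data
    return prompts[0]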
arXiv Detail & Related papers (2024-12-10T14:42:51Z)
- A Simple and Efficient Approach to Batch Bayesian Optimization [0.0]
We propose a simple and efficient approach to extend Bayesian optimization to large-scale batch evaluation.
By optimizing several expected subspace improvement functions simultaneously, we obtain a batch of query points for parallel evaluation.
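A hedged sketch of the batching idea: restrict the acquisition function to several random one-dimensional subspaces through the incumbent and keep the best point in each. The `ei` callable and the random-line construction are simplifying assumptions, not the paper's exact expected subspace improvement.

```python
# Minimal sketch, assuming `ei` is any acquisition function over R^D (e.g.
# expected improvement under a fitted GP) and `x_best` is the incumbent.
import numpy as np

def batch_by_subspaces(ei, x_best, bounds, q=4, grid=64, seed=None):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    batch = []
    for _ in range(q):
        d = rng.normal(size=x_best.shape)
        d /= np.linalg.norm(d)                       # random unit direction
        ts = np.linspace(-1.0, 1.0, grid)
        cands = np.clip(x_best + ts[:, None] * d * (hi - lo), lo, hi)
        batch.append(cands[np.argmax([ei(x) for x in cands])])
    return np.stack(batch)                           # q points to evaluate in parallel
```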
arXiv Detail & Related papers (2024-11-25T09:14:09Z)
- Large Language Models Prompting With Episodic Memory [53.8690170372303]
We propose PrOmpting with Episodic Memory (POEM), a novel prompt optimization technique that is simple, efficient, and demonstrates strong generalization capabilities.
In the testing phase, we optimize the sequence of examples for each test query by selecting the sequence that yields the highest total rewards from the top-k most similar training examples in the episodic memory.
Our results show that POEM outperforms recent techniques like TEMPERA and RLPrompt by over 5.3% in various text classification tasks.
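A minimal sketch of the test-time selection step under stated assumptions: precomputed query and memory embeddings and a stored reward per training episode stand in for POEM's learned episodic memory.

```python
# Minimal sketch (not the authors' code): retrieve the top-k most similar
# training episodes and order them by their stored rewards. `memory_vecs`
# (m, d) and `memory_rewards` (m,) are assumed numpy arrays.
import numpy as np

def select_examples(query_vec, memory_vecs, memory_rewards, k=4):
    sims = memory_vecs @ query_vec / (
        np.linalg.norm(memory_vecs, axis=1) * np.linalg.norm(query_vec))
    top = np.argsort(-sims)[:k]                  # k most similar episodes
    return top[np.argsort(memory_rewards[top])]  # reorder by total reward
```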
arXiv Detail & Related papers (2024-08-14T11:19:28Z)
- Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance with 1.69x fewer queries.
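A minimal sketch of the plain greedy selection that such a trained policy imitates; `acq`, a set-valued batch acquisition score, is an assumption.

```python
# Minimal sketch: grow the batch one point at a time, always adding the
# candidate with the best marginal gain under a set-valued score `acq`.
def greedy_batch(candidates, acq, batch_size):
    chosen_idx = []
    for _ in range(batch_size):
        pool = [i for i in range(len(candidates)) if i not in chosen_idx]
        best = max(pool, key=lambda i: acq([candidates[j]
                                            for j in chosen_idx + [i]]))
        chosen_idx.append(best)
    return [candidates[i] for i in chosen_idx]
```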
arXiv Detail & Related papers (2024-06-21T05:57:08Z)
- A survey and benchmark of high-dimensional Bayesian optimization of discrete sequences [12.248793682283964]
Optimizing discrete black-box functions is key in several domains, e.g. protein engineering and drug design.
We develop a unified framework to test a vast array of high-dimensional Bayesian optimization methods and a collection of standardized black-box functions.
These two components of the benchmark are each supported by flexible, scalable, and easily extendable software libraries.
arXiv Detail & Related papers (2024-06-07T08:39:40Z)
- Bayesian Optimization over High-Dimensional Combinatorial Spaces via Dictionary-based Embeddings [36.60636056219264]
We consider the problem of optimizing black-box functions over high-dimensional spaces in science, engineering, and ML applications.
The key idea is to select a number of discrete structures from the input space and use them to define an ordinal embedding for high-dimensional structures.
We develop a principled approach based on binary wavelets to construct dictionaries for binary spaces, and propose a randomized construction method that generalizes to categorical spaces.
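An illustrative sketch of the dictionary idea for binary spaces: embed a structure as its vector of Hamming distances to a handful of dictionary atoms, giving a low-dimensional representation a GP surrogate can model. Random atoms stand in for the paper's wavelet-based construction.

```python
# Minimal sketch with random dictionary atoms (the paper uses a principled
# binary-wavelet construction instead).
import numpy as np

def dictionary_embedding(x, dictionary):
    """x: (d,) binary vector; dictionary: (m, d) binary matrix."""
    return np.sum(x[None, :] != dictionary, axis=1)  # m Hamming distances

rng = np.random.default_rng(0)
dictionary = rng.integers(0, 2, size=(8, 50))        # m=8 random atoms
x = rng.integers(0, 2, size=50)
print(dictionary_embedding(x, dictionary))           # 8-dim embedding of x
```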
arXiv Detail & Related papers (2023-03-03T08:31:42Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning [83.26610968655815]
Black-Box Tuning is a derivative-free approach to optimize continuous prompt tokens prepended to the input of language models.
We present BBTv2, a pure black-box optimization approach that can drive language models to achieve comparable results to gradient-based optimization.
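A hedged sketch of the principle: search a random low-dimensional subspace of the continuous prompt with a derivative-free rule. A plain (1+1)-style evolution strategy stands in for the CMA-ES and divide-and-conquer machinery of BBTv2, and `loss` (the black-box LM's few-shot loss for a given continuous prompt) is an assumption.

```python
# Minimal sketch, assuming `loss` queries the black-box LM with the
# continuous prompt prepended and returns a scalar to minimize.
import numpy as np

def black_box_tune(loss, prompt_dim, low_dim=32, steps=200, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(prompt_dim, low_dim)) / np.sqrt(low_dim)
    z = np.zeros(low_dim)                  # search in the small subspace
    best = loss(A @ z)
    for _ in range(steps):
        z_try = z + sigma * rng.normal(size=low_dim)
        val = loss(A @ z_try)
        if val < best:                     # keep the mutation if it helps
            z, best = z_try, val
    return A @ z, best                     # continuous prompt and its loss
```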
arXiv Detail & Related papers (2022-05-23T11:10:19Z)
- Bayesian Optimization over Permutation Spaces [30.650753803587794]
We propose and evaluate two algorithms for BO over Permutation Spaces (BOPS).
We theoretically analyze the performance of BOPS-T to show that its regret grows sub-linearly.
Our experiments on multiple synthetic and real-world benchmarks show that both BOPS-T and BOPS-H perform better than the state-of-the-art BO algorithm for combinatorial spaces.
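A minimal sketch of a Kendall-tau-style kernel over permutations, the kind of similarity such GP surrogates rely on; the normalization and lengthscale here are illustrative choices, not the paper's exact kernels.

```python
# Minimal sketch: similarity of two permutations decays with the number of
# item pairs they order differently (the Kendall-tau distance).
import numpy as np
from itertools import combinations

def kendall_kernel(p, q, lengthscale=1.0):
    n = len(p)
    disc = sum((p[i] < p[j]) != (q[i] < q[j])
               for i, j in combinations(range(n), 2))
    return np.exp(-disc / (lengthscale * n * (n - 1) / 2))
```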
arXiv Detail & Related papers (2021-12-02T08:20:50Z)
- Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information [78.78486761923855]
In many real-world problems, we want to infer some property of an expensive black-box function f, given a budget of T function evaluations.
We present a procedure, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output.
On these problems, InfoBAX uses up to 500 times fewer queries to f than required by the original algorithm.
arXiv Detail & Related papers (2021-04-19T17:22:11Z)
- Mercer Features for Efficient Combinatorial Bayesian Optimization [32.856318660282255]
Bayesian optimization (BO) is an efficient framework for solving black-box optimization problems with expensive function evaluations.
This paper addresses the BO problem setting for combinatorial spaces (e.g., sequences and graphs) that occur naturally in science and engineering applications.
The key challenge is to balance the complexity of statistical models and tractability of search to select structures for evaluation.
arXiv Detail & Related papers (2020-12-14T17:58:39Z)
- BOSS: Bayesian Optimization over String Spaces [15.630421177117634]
This article develops a Bayesian optimization (BO) method which acts directly over raw strings.
It proposes the first uses of string kernels and genetic algorithms within BO loops.
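An illustrative sketch of the genetic-algorithm step inside such a BO loop: crossover and mutation propose string candidates, and an acquisition function (a string-kernel GP acquisition in BOSS, stubbed as `acq` here) picks the next query. Strings are assumed to have length at least two.

```python
# Minimal sketch of GA-based candidate proposal; `acq` scores a string under
# the surrogate and is an assumption here.
import random

def ga_propose(population, acq, alphabet, n_offspring=20, p_mut=0.1):
    offspring = []
    for _ in range(n_offspring):
        a, b = random.sample(population, 2)
        cut = random.randrange(1, min(len(a), len(b)))    # one-point crossover
        child = "".join(c if random.random() > p_mut else random.choice(alphabet)
                        for c in a[:cut] + b[cut:])       # point mutations
        offspring.append(child)
    return max(offspring, key=acq)                        # best under acquisition
```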
arXiv Detail & Related papers (2020-10-02T13:18:27Z)
- Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning [100.83444258562263]
We propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection in this setting.
In order to solve the resulting multiple black-box function optimization problem jointly and efficiently, we exploit potential correlations among black-box functions.
We are the first to formulate the problem of stepwise model selection (SMS) for sequence prediction, and to design and demonstrate an efficient joint-learning algorithm for this purpose.
arXiv Detail & Related papers (2020-01-12T09:42:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.