Hyperband-based Bayesian Optimization for Black-box Prompt Selection
- URL: http://arxiv.org/abs/2412.07820v2
- Date: Tue, 03 Jun 2025 09:59:18 GMT
- Title: Hyperband-based Bayesian Optimization for Black-box Prompt Selection
- Authors: Lennart Schneider, Martin Wistuba, Aaron Klein, Jacek Golebiowski, Giovanni Zappella, Felice Antonio Merra
- Abstract summary: Black-box prompt selection is challenging due to potentially large, combinatorial search spaces, absence of gradient information, and high evaluation cost of prompts on a validation set. We propose HbBoPs, a novel method that combines a structural-aware deep kernel Gaussian Process with Hyperband as a multi-fidelity scheduler. HbBoPs outperforms state-of-the-art methods in both performance and efficiency.
- Score: 15.756224286651237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimal prompt selection is crucial for maximizing large language model (LLM) performance on downstream tasks, especially in black-box settings where models are only accessible via APIs. Black-box prompt selection is challenging due to potentially large, combinatorial search spaces, absence of gradient information, and high evaluation cost of prompts on a validation set. We propose HbBoPs, a novel method that combines a structural-aware deep kernel Gaussian Process with Hyperband as a multi-fidelity scheduler to efficiently select prompts. HbBoPs uses embeddings of instructions and few-shot exemplars, treating them as modular components within prompts. This enhances the surrogate model's ability to predict which prompt to evaluate next in a sample-efficient manner. Hyperband improves query-efficiency by adaptively allocating resources across different fidelity levels, reducing the number of validation instances required for evaluating prompts. Extensive experiments across ten diverse benchmarks and three LLMs demonstrate that HbBoPs outperforms state-of-the-art methods in both performance and efficiency.
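As a rough illustration of the loop the abstract describes, here is a minimal sketch combining a GP surrogate over prompt embeddings with Hyperband-style successive halving over validation budgets. A plain scikit-learn GP stands in for the paper's structural-aware deep kernel, and `embed_prompt` / `score_on` are assumed black-box callables, not the authors' API.

```python
# Minimal sketch, not the authors' implementation. Assumes:
#   embed_prompt(p)            -> feature vector for a prompt
#   score_on(p, n_instances=b) -> validation score of p on b instances
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def hb_bo_prompt_selection(prompts, embed_prompt, score_on,
                           n_val, eta=3, min_budget=10):
    X = np.array([embed_prompt(p) for p in prompts])
    seen_x, seen_y = [], []
    # Noise term tolerates the same prompt observed at several fidelities.
    surrogate = GaussianProcessRegressor(alpha=1e-3)
    survivors = list(range(len(prompts)))
    budget = min_budget
    while len(survivors) > 1 and budget <= n_val:
        if seen_x:
            # BO step: rank survivors optimistically (mean + std).
            surrogate.fit(np.array(seen_x), np.array(seen_y))
            mu, sd = surrogate.predict(X[survivors], return_std=True)
            survivors = [survivors[i] for i in np.argsort(-(mu + sd))]
        # Hyperband-style rung: evaluate at the current fidelity.
        scores = []
        for i in survivors:
            y = score_on(prompts[i], n_instances=budget)
            seen_x.append(X[i]); seen_y.append(y)
            scores.append(y)
        # Keep the top 1/eta prompts and raise the fidelity.
        keep = max(1, len(survivors) // eta)
        top = np.argsort(-np.array(scores))[:keep]
        survivors = [survivors[i] for i in top]
        budget *= eta
    return prompts[survivors[0]]
```

The real method runs full Hyperband brackets with an acquisition over a deep-kernel GP; this collapses it to one successive-halving pass to keep the mechanics visible.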
Related papers
- Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models? [62.579951798437115]
This work investigates iterative approximate evaluation for arbitrary prompts. It introduces Model Predictive Prompt Selection (MoPPS), a Bayesian risk-predictive framework. MoPPS reliably predicts prompt difficulty and accelerates training with significantly reduced rollouts.
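The summary does not pin down the risk model; one plausible minimal instantiation is a Beta-Bernoulli posterior per prompt over rollout success rate, with Thompson sampling to pick prompts whose sampled success probability is most informative. All names below are hypothetical.

```python
# Hypothetical sketch of a Bayesian difficulty predictor; the paper's
# exact model may differ.
import random

class PromptPosterior:
    def __init__(self):
        self.alpha, self.beta = 1.0, 1.0      # uniform prior on success rate

    def update(self, successes, failures):    # cheap, rollout-sparing update
        self.alpha += successes
        self.beta += failures

    def sample(self):
        return random.betavariate(self.alpha, self.beta)

def pick_prompts(posteriors, k):
    # Thompson sampling; prompts with success rate near 0.5 are assumed
    # most useful for RL finetuning (an assumption, not the paper's claim).
    draws = sorted(range(len(posteriors)),
                   key=lambda i: abs(posteriors[i].sample() - 0.5))
    return draws[:k]
```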
arXiv Detail & Related papers (2025-07-07T03:20:52Z) - ORPP: Self-Optimizing Role-playing Prompts to Enhance Language Model Capabilities [64.24517317344959]
High-quality prompts are crucial for eliciting outstanding performance from large language models on complex tasks. We propose ORPP, a framework that enhances model performance by optimizing and generating role-playing prompts. We show that ORPP not only matches but in most cases surpasses existing mainstream prompt optimization methods in terms of performance.
arXiv Detail & Related papers (2025-06-03T05:51:35Z) - ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models [14.137615267026755]
We propose Zeroth-order Intrinsic-dimensional Prompt-tuning (ZIP), which enables efficient and robust prompt optimization in a purely black-box setting.
We evaluate ZIP on 13+ vision-language tasks in standard benchmarks and show that it achieves an average improvement of approximately 6% in few-shot accuracy and 48% in query efficiency.
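In the spirit of the summary, a zeroth-order optimizer in a low "intrinsic" dimension can be sketched as a fixed random projection plus a two-point gradient estimate; the estimator, projection, and names below are assumptions, not ZIP's exact design.

```python
# Sketch of zeroth-order optimization in a low intrinsic dimension;
# loss_fn is the black-box objective over full prompt parameters.
import numpy as np

def zo_intrinsic_tune(loss_fn, full_dim, intrinsic_dim=32,
                      steps=200, mu=1e-2, lr=1e-1, seed=0):
    rng = np.random.default_rng(seed)
    # Fixed random projection from intrinsic space to prompt space.
    P = rng.standard_normal((full_dim, intrinsic_dim)) / np.sqrt(intrinsic_dim)
    z = np.zeros(intrinsic_dim)
    for _ in range(steps):
        u = rng.standard_normal(intrinsic_dim)
        # Two-point gradient estimate from black-box loss queries only.
        g = (loss_fn(P @ (z + mu * u)) - loss_fn(P @ (z - mu * u))) / (2 * mu) * u
        z -= lr * g
    return P @ z   # the tuned prompt parameters
```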
arXiv Detail & Related papers (2025-04-09T12:56:22Z) - MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization [30.748085697067154]
We propose a Multi-Agent framework incorporating Socratic guidance (MARS).
MARS comprises seven agents, each with distinct functionalities, which autonomously use the Planner to devise an optimization path.
We conduct extensive experiments on various datasets to validate the effectiveness of our method.
arXiv Detail & Related papers (2025-03-21T06:19:55Z) - Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting [0.8238423959893132]
"Auto-Demo Prompting" is a novel approach that leverages the question-output pairs from earlier questions within a batch as demonstrations for subsequent answer inference.
Our method effectively bridges the gap between batch prompting and few-shot prompting, enhancing performance with only a slight compromise in token usage.
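A sequential approximation of the idea: answers produced for earlier questions are appended as demonstrations for later ones. The paper does this inside a single batched call, where autoregressive decoding yields the same effect; `ask_llm` is a hypothetical black-box interface.

```python
# Sketch of Auto-Demo-style inference (sequential approximation of the
# paper's single batched call).
def auto_demo_batch(ask_llm, questions):
    demos, answers = [], []
    for q in questions:
        prompt = "".join(f"Q: {d_q}\nA: {d_a}\n\n" for d_q, d_a in demos)
        prompt += f"Q: {q}\nA:"
        a = ask_llm(prompt)
        answers.append(a)
        demos.append((q, a))   # reuse this pair as a demo downstream
    return answers
```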
arXiv Detail & Related papers (2024-10-02T16:34:40Z) - QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z) - Large Language Models Prompting With Episodic Memory [53.8690170372303]
We propose PrOmpting with Episodic Memory (POEM), a novel prompt optimization technique that is simple, efficient, and demonstrates strong generalization capabilities.
In the testing phase, we optimize the sequence of examples for each test query by selecting the sequence that yields the highest total rewards from the top-k most similar training examples in the episodic memory.
Our results show that POEM outperforms recent techniques like TEMPERA and RLPrompt by over 5.3% in various text classification tasks.
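The test-time step quoted above admits a compact sketch: retrieve the top-k most similar memory records and reuse the stored example ordering with the highest total reward. The memory layout and the dot-product similarity below are assumptions.

```python
# Sketch of POEM's test-time selection as summarized above.
import numpy as np

def select_sequence(memory, query_vec, k=16):
    # memory: list of records (query_vec, example_sequence, total_reward)
    sims = np.array([float(np.dot(query_vec, m[0])) for m in memory])
    top_k = np.argsort(-sims)[:k]
    best = max(top_k, key=lambda i: memory[i][2])   # highest total reward
    return memory[best][1]                          # its example ordering
```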
arXiv Detail & Related papers (2024-08-14T11:19:28Z) - Prompt Tuning as User Inherent Profile Inference Machine [53.78398656789463]
We propose UserIP-Tuning, which uses prompt-tuning to infer user profiles.
A profile quantization codebook bridges the modality gap by quantizing profile embeddings into collaborative IDs.
Experiments on four public datasets show that UserIP-Tuning outperforms state-of-the-art recommendation algorithms.
arXiv Detail & Related papers (2024-08-13T02:25:46Z) - Improving Minimum Bayes Risk Decoding with Multi-Prompt [10.401677244785166]
We propose multi-prompt decoding, where many candidate generations are decoded from a prompt bank at inference time.
To ensemble candidates, we use Minimum Bayes Risk (MBR) decoding, which selects a final output using a trained value metric.
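Minimum Bayes Risk selection itself is standard: pick the candidate with the highest average utility against the other candidates. The sketch below assumes a `generate(prompt, n)` sampling interface and any pairwise `utility`, such as the trained value metric the summary mentions.

```python
# Sketch of MBR selection over candidates decoded from many prompts.
def mbr_select(candidates, utility):
    def expected_utility(y):
        others = [c for c in candidates if c is not y]
        return sum(utility(y, c) for c in others) / max(1, len(others))
    return max(candidates, key=expected_utility)

def multi_prompt_decode(generate, prompt_bank, utility, n_per_prompt=4):
    # generate(prompt, n) -> list of n sampled outputs (assumed interface)
    candidates = [y for p in prompt_bank for y in generate(p, n_per_prompt)]
    return mbr_select(candidates, utility)
```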
arXiv Detail & Related papers (2024-07-22T02:57:10Z) - On Speeding Up Language Model Evaluation [48.51924035873411]
Development of prompt-based methods with Large Language Models (LLMs) requires making numerous decisions.
We propose a novel method to address this challenge.
We show that it can identify the top-performing method using only 5-15% of the typically needed resources.
arXiv Detail & Related papers (2024-07-08T17:48:42Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of the few-shot exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - Efficient Prompt Optimization Through the Lens of Best Arm Identification [50.56113809171805]
This work provides a principled framework, TRIPLE, to efficiently perform prompt selection under an explicit budget constraint.
It is built on a novel connection established between prompt optimization and fixed-budget best arm identification (BAI-FB) in multi-armed bandits (MAB).
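The fixed-budget BAI primitive TRIPLE builds on can be illustrated with generic sequential halving, where arms are prompts and a pull is one validation-instance evaluation; this is the textbook algorithm, not the paper's exact variant.

```python
# Sequential halving for fixed-budget best arm identification.
import math
import numpy as np

def sequential_halving(pull, n_arms, budget):
    arms = list(range(n_arms))
    rounds = max(1, math.ceil(math.log2(n_arms)))
    means = np.zeros(n_arms); counts = np.zeros(n_arms)
    for _ in range(rounds):
        per_arm = max(1, budget // (len(arms) * rounds))
        for a in arms:
            for _ in range(per_arm):
                r = pull(a)   # e.g. accuracy of prompt `a` on one instance
                counts[a] += 1
                means[a] += (r - means[a]) / counts[a]
        # Halve: keep the better-performing half of the arms.
        arms = sorted(arms, key=lambda a: -means[a])[: max(1, len(arms) // 2)]
    return arms[0]
```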
arXiv Detail & Related papers (2024-02-15T05:31:13Z) - A Bayesian approach for prompt optimization in pre-trained language
models [1.980639720136382]
In this paper we focus on hard prompt tuning (HPT), which directly searches for discrete tokens to be added to the text input without access to the large language model (LLM).
In this paper we use BoTorch, a library for Bayesian optimization research built on top of PyTorch.
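A minimal BoTorch step of the kind the summary points to might look as follows: fit a GP to scored prompts and rank a discrete candidate set by Expected Improvement. This is a generic continuous-embedding sketch, not the paper's HPT search; `candidate_X` and the surrounding evaluation loop are assumptions.

```python
# Generic BoTorch BO step over embedded prompt candidates (sketch).
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.acquisition import ExpectedImprovement

def bo_step(train_X, train_Y, candidate_X):
    # train_X: (n, d) scored prompt embeddings, train_Y: (n, 1) scores.
    model = SingleTaskGP(train_X, train_Y)
    fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
    ei = ExpectedImprovement(model, best_f=train_Y.max())
    # Evaluate EI over a discrete candidate set of prompt embeddings.
    values = ei(candidate_X.unsqueeze(1))      # shape: (num_candidates,)
    return candidate_X[values.argmax()]
```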
arXiv Detail & Related papers (2023-12-01T10:10:18Z) - PromptBoosting: Black-Box Text Classification with Ten Forward Passes [61.38341243907045]
We describe PromptBoosting, a query-efficient procedure for building a text classifier from a neural language model (LM) without access to the LM's parameters, gradients, or hidden representations.
Experiments show that PromptBoosting achieves state-of-the-art performance in multiple black-box few-shot classification tasks, and matches or outperforms full fine-tuning in both few-shot and standard learning paradigms, while training 10x faster than existing black-box methods.
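The boosting step can be sketched with plain AdaBoost over cached per-prompt predictions, treating each prompt as a weak classifier; the paper builds weak learners from the LM's output distributions, so this is a simplification with illustrative names.

```python
# AdaBoost over prompts-as-weak-classifiers (binary labels in {-1, +1}).
import numpy as np

def boost_prompts(prompt_preds, labels, n_rounds=10):
    # prompt_preds: (n_prompts, n_examples) matrix of +/-1 predictions,
    # obtainable from a small number of forward passes per prompt.
    n = labels.shape[0]
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        errs = np.array([(w * (preds != labels)).sum() for preds in prompt_preds])
        t = int(errs.argmin())                       # best weak prompt
        err = min(max(errs[t], 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * labels * prompt_preds[t])
        w /= w.sum()
        ensemble.append((t, alpha))
    return ensemble   # final classifier: sign(sum of alpha * preds[t])
```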
arXiv Detail & Related papers (2022-12-19T06:04:54Z) - RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL).
RLPrompt is flexibly applicable to different types of LMs, such as masked LMs (e.g., BERT) and left-to-right models (e.g., GPTs).
Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
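A toy REINFORCE loop over discrete prompt tokens conveys the mechanics; RLPrompt itself uses a tuned policy network and reward stabilization, so the tabular policy and `reward_fn` below are simplifications.

```python
# Toy REINFORCE over discrete prompt tokens (illustrative only).
import numpy as np

def reinforce_prompt(reward_fn, vocab_size, prompt_len,
                     steps=500, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    logits = np.zeros((prompt_len, vocab_size))   # tabular "policy"
    baseline = 0.0
    for _ in range(steps):
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        tokens = [rng.choice(vocab_size, p=probs[i]) for i in range(prompt_len)]
        r = reward_fn(tokens)                     # e.g. dev-set accuracy
        baseline += 0.05 * (r - baseline)         # moving-average baseline
        for i, t in enumerate(tokens):            # policy-gradient update
            grad = -probs[i]
            grad[t] += 1.0
            logits[i] += lr * (r - baseline) * grad
    return logits.argmax(axis=1)                  # greedy final prompt
```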
arXiv Detail & Related papers (2022-05-25T07:50:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.