Improving Probability-based Prompt Selection Through Unified Evaluation
and Analysis
- URL: http://arxiv.org/abs/2305.14877v2
- Date: Fri, 8 Mar 2024 18:51:03 GMT
- Title: Improving Probability-based Prompt Selection Through Unified Evaluation
and Analysis
- Authors: Sohee Yang, Jonghyeon Kim, Joel Jang, Seonghyeon Ye, Hyunji Lee,
Minjoon Seo
- Abstract summary: We propose a unified framework to interpret and evaluate the existing probability-based prompt selection methods.
We find that each of the existing methods can be interpreted as some variant of the method that maximizes mutual information between the input and the predicted output (MI).
We propose a novel calibration method called Calibration by Marginalization (CBM) that is orthogonal to the existing methods and helps increase the prompt selection effectiveness of the best method to 96.85%, achieving 99.44% of the oracle prompt F1 without calibration.
- Score: 52.04932081106623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous works in prompt engineering for large language models have
introduced different gradient-free probability-based prompt selection methods
that aim to choose the optimal prompt among the candidates for a given task but
have failed to provide a comprehensive and fair comparison between each other.
In this paper, we propose a unified framework to interpret and evaluate the
existing probability-based prompt selection methods by performing extensive
experiments on 13 common and diverse NLP tasks. We find that each of the
existing methods can be interpreted as some variant of the method that
maximizes mutual information between the input and the predicted output (MI).
Utilizing this finding, we develop several other combinatorial variants of MI
and increase the effectiveness of the oracle prompt selection method from
87.79% to 94.98%, measured as the ratio of the performance of the selected
prompt to that of the optimal oracle prompt. Furthermore, considering that all
the methods rely on the output probability distribution of the model that might
be biased, we propose a novel calibration method called Calibration by
Marginalization (CBM) that is orthogonal to the existing methods and helps
increase the prompt selection effectiveness of the best method to 96.85%,
achieving 99.44% of the oracle prompt F1 without calibration.
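
As a concrete illustration, below is a minimal sketch of MI-based prompt selection combined with a marginalization-style calibration. It assumes the MI objective from prior probability-based selection work, I(X; Y) ≈ H(E_x[P(y|x)]) − E_x[H(P(y|x))], and reads CBM, following the abstract, as dividing out the label prior obtained by marginalizing over the inputs; the paper's exact formulation may differ. `prob_table` is a hypothetical precomputed array of the model's label probabilities, not an artifact of the paper.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy along `axis`."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def mi_score(probs):
    """probs: (n_inputs, n_labels) array of P(y|x) under one prompt.
    MI(X; Y) ~= H(E_x[P(y|x)]) - E_x[H(P(y|x))]."""
    marginal = probs.mean(axis=0)  # P(y), marginalized over inputs
    return entropy(marginal) - entropy(probs).mean()

def calibrate_by_marginalization(probs):
    """One plausible reading of CBM: divide out the label prior estimated
    by marginalizing over the inputs, then renormalize."""
    prior = probs.mean(axis=0, keepdims=True)
    calibrated = probs / (prior + 1e-12)
    return calibrated / calibrated.sum(axis=-1, keepdims=True)

def select_prompt(prob_table):
    """prob_table: (n_prompts, n_inputs, n_labels). Return the index of the
    prompt whose calibrated output distribution has the highest MI."""
    scores = [mi_score(calibrate_by_marginalization(p)) for p in prob_table]
    return int(np.argmax(scores))
```

Note that the MI score uses no labels, which is what makes such methods gradient-free and applicable before any evaluation data is available.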
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step.
Our work provides a comprehensive comparison of existing truncation sampling methods, along with recommended parameters as a guideline for users (a minimal top-p truncation step is sketched after this list).
arXiv Detail & Related papers (2024-08-24T14:14:32Z)
- On Speeding Up Language Model Evaluation [48.51924035873411]
Development of prompt-based methods with Large Language Models (LLMs) requires making numerous decisions.
We propose a novel method to address this challenge.
We show that it can identify the top-performing method using only 5-15% of the typically needed resources.
arXiv Detail & Related papers (2024-07-08T17:48:42Z)
- Be Aware of the Neighborhood Effect: Modeling Selection Bias under Interference [50.95521705711802]
Previous studies have focused on addressing selection bias to achieve unbiased learning of the prediction model.
This paper formally formulates the neighborhood effect as an interference problem from the perspective of causal inference.
We propose a novel ideal loss that can be used to deal with selection bias in the presence of neighborhood effect.
arXiv Detail & Related papers (2024-04-30T15:20:41Z)
- A multi-criteria approach for selecting an explanation from the set of counterfactuals produced by an ensemble of explainers [4.239829789304117]
We propose a multi-stage ensemble approach that selects a single counterfactual based on multiple-criteria analysis.
The proposed approach generates fully actionable counterfactuals with attractive compromise values of the considered quality measures.
arXiv Detail & Related papers (2024-03-20T19:25:11Z)
- Exploring Lottery Prompts for Pre-trained Language Models [46.66885465183664]
We explore instance-level prompts and their generalizability.
We find that for every instance, there is almost always a lottery prompt that induces the correct prediction from the PLM.
Some strong lottery prompts have high performance over the whole training set.
arXiv Detail & Related papers (2023-05-31T02:17:04Z)
- Bi-objective Ranking and Selection Using Stochastic Kriging [0.0]
We consider bi-objective ranking and selection problems in which the two objective outcomes have been observed with uncertainty.
We propose a novel Bayesian bi-objective ranking and selection method that sequentially allocates extra samples to competitive solutions.
Experimental results show that the proposed method outperforms the standard allocation method, as well as a well-known state-of-the-art algorithm.
arXiv Detail & Related papers (2022-09-05T23:51:07Z)
- Lookahead and Hybrid Sample Allocation Procedures for Multiple Attribute Selection Decisions [0.9137554315375922]
This paper considers settings in which each measurement yields one sample of one attribute for one alternative.
When given a fixed number of samples to collect, the decision-maker must determine which samples to obtain, make the measurements, update prior beliefs about the attribute magnitudes, and then select an alternative.
arXiv Detail & Related papers (2020-07-31T15:04:49Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
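
For the truncation-sampling comparison referenced above, here is a minimal sketch of a single top-p (nucleus) truncation step, one of the standard methods such a comparison covers; the function name and parameter are illustrative, not taken from that paper.

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """One top-p (nucleus) truncation step: keep the smallest set of tokens
    whose cumulative probability reaches p, zero out the rest, and
    renormalize. Smaller p reduces risk (fewer unlikely tokens) at the
    cost of diversity."""
    order = np.argsort(probs)[::-1]              # token ids, most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # size of the smallest nucleus covering p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

# A peaked distribution keeps few tokens; a flatter one keeps more.
probs = np.array([0.55, 0.25, 0.10, 0.05, 0.03, 0.02])
print(np.count_nonzero(top_p_filter(probs, p=0.9)))  # 3 tokens survive
```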