Improving Probability-based Prompt Selection Through Unified Evaluation
and Analysis
- URL: http://arxiv.org/abs/2305.14877v2
- Date: Fri, 8 Mar 2024 18:51:03 GMT
- Title: Improving Probability-based Prompt Selection Through Unified Evaluation
and Analysis
- Authors: Sohee Yang, Jonghyeon Kim, Joel Jang, Seonghyeon Ye, Hyunji Lee,
Minjoon Seo
- Abstract summary: We propose a unified framework to interpret and evaluate the existing probability-based prompt selection methods.
We find that each of the existing methods can be interpreted as some variant of the method that maximizes mutual information between the input and the predicted output (MI).
We propose a novel calibration method called Calibration by Marginalization (CBM) that is orthogonal to the existing methods and helps increase the prompt selection effectiveness of the best method to 96.85%, achieving 99.44% of the oracle prompt F1 without calibration.
- Score: 52.04932081106623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous works in prompt engineering for large language models have
introduced different gradient-free probability-based prompt selection methods
that aim to choose the optimal prompt among the candidates for a given task but
have failed to provide a comprehensive and fair comparison between each other.
In this paper, we propose a unified framework to interpret and evaluate the
existing probability-based prompt selection methods by performing extensive
experiments on 13 common and diverse NLP tasks. We find that each of the
existing methods can be interpreted as some variant of the method that
maximizes mutual information between the input and the predicted output (MI).
Utilizing this finding, we develop several other combinatorial variants of MI
and increase the effectiveness of the oracle prompt selection method from
87.79% to 94.98%, measured as the ratio of the performance of the selected
prompt to that of the optimal oracle prompt. Furthermore, considering that all
the methods rely on the output probability distribution of the model that might
be biased, we propose a novel calibration method called Calibration by
Marginalization (CBM) that is orthogonal to the existing methods and helps
increase the prompt selection effectiveness of the best method to 96.85%,
achieving 99.44% of the oracle prompt F1 without calibration.
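
As a concrete illustration, below is a minimal sketch of MI-based prompt selection combined with a marginalization-style calibration. It assumes the MI objective from prior probability-based selection work, I(X; Y) ≈ H(E_x[P(y|x)]) − E_x[H(P(y|x))], and reads CBM, following the abstract, as dividing out the label prior obtained by marginalizing over the inputs; the paper's exact formulation may differ. `prob_table` is a hypothetical precomputed array of the model's label probabilities, not an artifact of the paper.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy along `axis`."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def mi_score(probs):
    """probs: (n_inputs, n_labels) array of P(y|x) under one prompt.
    MI(X; Y) ~= H(E_x[P(y|x)]) - E_x[H(P(y|x))]."""
    marginal = probs.mean(axis=0)  # P(y), marginalized over inputs
    return entropy(marginal) - entropy(probs).mean()

def calibrate_by_marginalization(probs):
    """One plausible reading of CBM: divide out the label prior estimated
    by marginalizing over the inputs, then renormalize."""
    prior = probs.mean(axis=0, keepdims=True)
    calibrated = probs / (prior + 1e-12)
    return calibrated / calibrated.sum(axis=-1, keepdims=True)

def select_prompt(prob_table):
    """prob_table: (n_prompts, n_inputs, n_labels). Return the index of the
    prompt whose calibrated output distribution has the highest MI."""
    scores = [mi_score(calibrate_by_marginalization(p)) for p in prob_table]
    return int(np.argmax(scores))
```

Note that the MI score uses no labels, which is what makes such methods gradient-free and applicable before any evaluation data is available.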
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step.
Our work provides a comprehensive comparison of existing truncation sampling methods, along with recommended parameters as a guideline for users (a minimal top-p truncation step is sketched after this list).
arXiv Detail & Related papers (2024-08-24T14:14:32Z)
- On Speeding Up Language Model Evaluation [48.51924035873411]
Development of prompt-based methods with Large Language Models (LLMs) requires making numerous decisions.
We propose a novel method to address this challenge.
We show that it can identify the top-performing method using only 5-15% of the typically needed resources.
arXiv Detail & Related papers (2024-07-08T17:48:42Z)
- Be Aware of the Neighborhood Effect: Modeling Selection Bias under Interference [50.95521705711802]
Previous studies have focused on addressing selection bias to achieve unbiased learning of the prediction model.
This paper formally formulates the neighborhood effect as an interference problem from the perspective of causal inference.
We propose a novel ideal loss that can be used to deal with selection bias in the presence of neighborhood effect.
arXiv Detail & Related papers (2024-04-30T15:20:41Z)
- A multi-criteria approach for selecting an explanation from the set of counterfactuals produced by an ensemble of explainers [4.239829789304117]
We propose a multi-stage ensemble approach that selects a single counterfactual based on multiple-criteria analysis.
The proposed approach generates fully actionable counterfactuals with attractive compromise values of the considered quality measures.
arXiv Detail & Related papers (2024-03-20T19:25:11Z)
- Exploring Lottery Prompts for Pre-trained Language Models [46.66885465183664]
We explore instance-level prompts and their generalizability.
We find that for every instance, there is almost always a lottery prompt that induces the correct prediction from the PLM.
Some strong lottery prompts have high performance over the whole training set.
arXiv Detail & Related papers (2023-05-31T02:17:04Z)
- Bi-objective Ranking and Selection Using Stochastic Kriging [0.0]
We consider bi-objective ranking and selection problems in which the two objective outcomes have been observed with uncertainty.
We propose a novel Bayesian bi-objective ranking and selection method that sequentially allocates extra samples to competitive solutions.
Experimental results show that the proposed method outperforms the standard allocation method, as well as a well-known state-of-the-art algorithm.
arXiv Detail & Related papers (2022-09-05T23:51:07Z)
- Lookahead and Hybrid Sample Allocation Procedures for Multiple Attribute Selection Decisions [0.9137554315375922]
This paper considers settings in which each measurement yields one sample of one attribute for one alternative.
When given a fixed number of samples to collect, the decision-maker must determine which samples to obtain, make the measurements, update prior beliefs about the attribute magnitudes, and then select an alternative.
arXiv Detail & Related papers (2020-07-31T15:04:49Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
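
For the truncation-sampling comparison referenced above, here is a minimal sketch of a single top-p (nucleus) truncation step, one of the standard methods such a comparison covers; the function name and parameter are illustrative, not taken from that paper.

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """One top-p (nucleus) truncation step: keep the smallest set of tokens
    whose cumulative probability reaches p, zero out the rest, and
    renormalize. Smaller p reduces risk (fewer unlikely tokens) at the
    cost of diversity."""
    order = np.argsort(probs)[::-1]              # token ids, most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # size of the smallest nucleus covering p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

# A peaked distribution keeps few tokens; a flatter one keeps more.
probs = np.array([0.55, 0.25, 0.10, 0.05, 0.03, 0.02])
print(np.count_nonzero(top_p_filter(probs, p=0.9)))  # 3 tokens survive
```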