Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency
- URL: http://arxiv.org/abs/2305.10713v2
- Date: Mon, 23 Oct 2023 01:22:44 GMT
- Title: Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency
- Authors: Lingfeng Shen, Weiting Tan, Boyuan Zheng, Daniel Khashabi
- Abstract summary: We introduce prompt flatness, a new metric to quantify the expected utility of a language prompt.
We show that combining prompt flatness with existing metrics improves both performance and sample efficiency.
- Score: 26.829610705207955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With growing capabilities of large language models, prompting them has become
the dominant way to access them. This has motivated the development of
strategies for automatically selecting effective language prompts. In this
paper, we introduce prompt flatness, a new metric to quantify the expected
utility of a language prompt. This metric is inspired by flatness
regularization in statistical learning that quantifies the robustness of the
model towards its parameter perturbations. We provide theoretical foundations
for this metric and its relationship with other prompt selection metrics,
providing a comprehensive understanding of existing methods. Empirically, we
show that combining prompt flatness with existing metrics improves both
performance and sample efficiency. Our metric outperforms the previous prompt
selection metrics with an average increase of 5% in accuracy and 10% in Pearson
correlation across 6 classification benchmarks.
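The paper's estimator is not reproduced here, but flatness in the parameter-perturbation sense is commonly approximated by Monte Carlo: sample Gaussian noise around the weights and average the resulting loss increase. Below is a minimal sketch of that approximation, with a toy quadratic loss standing in for the model's loss on prompt-formatted data (`sigma`, the sample count, and the loss functions are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def estimate_flatness(loss_fn, theta, sigma=0.01, n_samples=20, seed=0):
    """Monte Carlo estimate of E[L(theta + eps) - L(theta)] with
    eps ~ N(0, sigma^2 I); smaller values mean a flatter loss surface,
    i.e. more robustness to parameter perturbations."""
    rng = np.random.default_rng(seed)
    base = loss_fn(theta)
    increases = [
        loss_fn(theta + rng.normal(0.0, sigma, size=theta.shape)) - base
        for _ in range(n_samples)
    ]
    return float(np.mean(increases))

# Toy usage: rank two candidate "prompts" by flatness. In the paper's
# setting loss_fn would be the model's loss on data formatted with a
# candidate prompt; here two quadratics stand in for sharp vs. flat.
theta = np.zeros(5)
loss_sharp = lambda t: 10.0 * np.sum(t ** 2)
loss_flat = lambda t: 0.1 * np.sum(t ** 2)
print(estimate_flatness(loss_sharp, theta))  # larger: sharper region
print(estimate_flatness(loss_flat, theta))   # smaller: flatter region
```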
Related papers
- FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation [73.454943870226]
Language models have shown impressive in-context-learning capabilities.
We propose FamiCom, a measure that provides more comprehensive task-agnostic performance estimation.
arXiv Detail & Related papers (2024-06-17T06:14:55Z)
- Evaluation of Faithfulness Using the Longest Supported Subsequence [52.27522262537075]
We introduce a novel approach to evaluating the faithfulness of machine-generated text by computing the longest noncontiguous subsequence of the claim that is supported by the context.
Using a new human-annotated dataset, we finetune a model to generate the Longest Supported Subsequence (LSS).
Our proposed metric demonstrates an 18% improvement over the prevailing state-of-the-art faithfulness metric on our dataset.
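Read literally, the quantity above is a longest-common-subsequence computation: the longest in-order, possibly noncontiguous run of claim tokens that the context supports, normalized by claim length. A minimal sketch under that reading (token-level exact matching is an assumption; the paper finetunes a model to produce the LSS rather than computing it with a dynamic program):

```python
def lss_score(claim_tokens, context_tokens):
    """Fraction of the claim covered by its longest subsequence that
    appears, in order, in the context (standard LCS dynamic program)."""
    m, n = len(claim_tokens), len(context_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if claim_tokens[i - 1] == context_tokens[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n] / m if m else 0.0

claim = "the cat sat on the red mat".split()
context = "yesterday the cat quietly sat on the mat".split()
print(lss_score(claim, context))  # 6/7: only 'red' is unsupported
```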
arXiv Detail & Related papers (2023-08-23T14:18:44Z)
- ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning [63.77667876176978]
Large language models show improved downstream task performance when prompted to generate step-by-step reasoning to justify their final answers.
These reasoning steps greatly improve model interpretability and verification, but objectively studying their correctness is difficult.
We present ROSCOE, a suite of interpretable, unsupervised automatic scores that improve and extend previous text generation evaluation metrics.
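The individual ROSCOE scores are not described in this summary; one representative score of this unsupervised kind is reasoning-to-source alignment: embed each reasoning step, embed the source sentences, and average each step's best cosine similarity against the source. A sketch with random stand-in embeddings (the real suite uses trained sentence encoders, and this is only one possible score):

```python
import numpy as np

def alignment_score(step_embs, source_embs):
    """Mean over reasoning steps of the best cosine similarity to any
    source sentence; steps scoring low are unsupported by the source."""
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = unit(step_embs) @ unit(source_embs).T   # (steps, sources)
    return float(sims.max(axis=1).mean())

# Random stand-in embeddings: 3 reasoning steps vs. 4 source sentences.
rng = np.random.default_rng(0)
print(alignment_score(rng.normal(size=(3, 64)), rng.normal(size=(4, 64))))
```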
arXiv Detail & Related papers (2022-12-15T15:52:39Z)
- Classification Performance Metric Elicitation and its Applications [5.5637552942511155]
Despite the practical importance of metric selection, there is limited formal guidance on how to select metrics for machine learning applications.
This thesis outlines metric elicitation as a principled framework for selecting the performance metric that best reflects implicit user preferences.
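A toy rendering of the elicitation mechanism, assuming the implicit metric is a linear trade-off m(TPR, TNR) = cos(a*)·TPR + sin(a*)·TNR and that the user answers pairwise preference queries between operating points; a search over the angle then recovers a*. The parameterization and oracle below are illustrative assumptions, not the thesis's algorithm:

```python
import numpy as np

def elicit_tradeoff(oracle_prefers, iters=30):
    """Recover the angle a* of an implicit linear metric
    m(tpr, tnr) = cos(a*) * tpr + sin(a*) * tnr from pairwise
    preferences, via ternary search over candidate operating points
    (cos a, sin a) on the unit quarter circle (utility is unimodal in a)."""
    lo, hi = 0.0, np.pi / 2
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if oracle_prefers((np.cos(a), np.sin(a)), (np.cos(b), np.sin(b))):
            hi = b    # preferring the lower point rules out the top third
        else:
            lo = a    # ... otherwise rule out the bottom third
    return (lo + hi) / 2

# Simulated user whose hidden metric weights TPR twice as much as TNR.
a_true = np.arctan2(1.0, 2.0)
w = np.array([np.cos(a_true), np.sin(a_true)])
oracle = lambda p, q: float(np.dot(w, p)) > float(np.dot(w, q))
print(elicit_tradeoff(oracle), a_true)  # nearly equal after ~30 queries
```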
arXiv Detail & Related papers (2022-08-19T03:57:17Z)
- Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data [66.11139091362078]
We provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics.
Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks.
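Heavy-tail metrics of this kind are computed from the eigenvalue spectrum of each layer's weight matrix alone, which is why no training or test data is needed: fit a power-law exponent alpha to the tail of the spectral density and aggregate across layers. A sketch using a Hill-style estimator on the top eigenvalues (the estimator variant and tail fraction are illustrative assumptions):

```python
import numpy as np

def hill_alpha(W, tail_frac=0.1):
    """Power-law exponent of the eigenvalue tail of W^T W, via a
    Hill-style estimator on the top-k eigenvalues:
    alpha = 1 + k / sum_i log(lam_i / min(tail))."""
    lam = np.linalg.eigvalsh(W.T @ W)   # eigenvalues, ascending order
    lam = lam[lam > 1e-12]
    k = max(2, int(tail_frac * len(lam)))
    tail = lam[-k:]                     # largest k eigenvalues
    return 1.0 + k / np.sum(np.log(tail / tail.min()))

# A training-free, data-free score: average alpha across layers.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(256, 256)) for _ in range(3)]
print(np.mean([hill_alpha(W) for W in layers]))
```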
arXiv Detail & Related papers (2022-02-06T20:07:35Z)
- Query-augmented Active Metric Learning [3.871148938060281]
We propose an active metric learning method for clustering with pairwise constraints.
We augment the queried constraints by generating more pairwise labels to provide additional information in learning a metric.
We increase the robustness of metric learning by updating the learned metric sequentially and penalizing the irrelevant features adaptively.
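A sketch of the underlying machinery, assuming a diagonal Mahalanobis metric fit to must-link/cannot-link pairs by projected gradient descent with an L1 penalty that drives irrelevant feature weights to zero; the query-augmentation and sequential-update schedule described above are not reproduced:

```python
import numpy as np

def learn_diag_metric(X, must, cannot, lam=0.1, lr=0.02, margin=1.0, epochs=200):
    """Learn w >= 0 so that d(x, y) = sum_k w_k (x_k - y_k)^2 is small
    for must-link pairs and exceeds `margin` for cannot-link pairs;
    the L1 penalty lam * sum(w) pushes irrelevant feature weights to 0."""
    w = np.ones(X.shape[1])
    for _ in range(epochs):
        grad = lam * np.ones_like(w)        # subgradient of L1 (w >= 0)
        for i, j in must:
            grad += (X[i] - X[j]) ** 2      # pull must-link pairs together
        for i, j in cannot:
            sq = (X[i] - X[j]) ** 2
            if sq @ w < margin:             # hinge: only violated pairs push
                grad -= sq
        w = np.maximum(0.0, w - lr * grad)  # projected gradient step
    return w

# Feature 0 separates the constraint groups; feature 1 is noise.
X = np.array([[0.0, 3.0], [0.1, -2.0], [5.0, 0.5], [5.1, 1.0]])
print(learn_diag_metric(X, must=[(0, 1), (2, 3)], cannot=[(0, 2), (1, 3)]))
# -> weight on feature 0 stays positive, feature 1 is driven to zero
```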
arXiv Detail & Related papers (2021-11-08T23:32:13Z)
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference while retaining comparable performance.
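The kNN-LM construction behind this line of work interpolates the base LM's next-token distribution with one induced by nearest neighbors in a datastore of (context vector, next token) pairs; the reported speed-ups come from shrinking and adaptively skipping that retrieval. A brute-force sketch of the interpolation itself (datastore contents, k, and the mixing weight are toy assumptions):

```python
import numpy as np

def knn_lm_probs(query, keys, values, p_lm, k=4, lam=0.25, temp=1.0):
    """p = lam * p_kNN + (1 - lam) * p_LM, where p_kNN softmax-weights
    the k nearest datastore keys and pools the weights per stored
    next-token (brute-force search; real systems use FAISS-style ANN)."""
    d = np.sum((keys - query) ** 2, axis=1)   # squared L2 to all keys
    nn = np.argsort(d)[:k]                    # indices of k nearest
    w = np.exp(-d[nn] / temp)
    w /= w.sum()
    p_knn = np.zeros_like(p_lm)
    for weight, token in zip(w, values[nn]):  # pool mass per token id
        p_knn[token] += weight
    return lam * p_knn + (1 - lam) * p_lm

rng = np.random.default_rng(0)
keys = rng.normal(size=(100, 8))              # stored context vectors
values = rng.integers(0, 5, size=100)         # stored next-token ids
p_lm = np.full(5, 0.2)                        # uniform base LM, 5 tokens
print(knn_lm_probs(rng.normal(size=8), keys, values, p_lm))
```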
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
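The update described above is easy to state concretely: start from support-set class means, score each query's per-class confidence, and recompute prototypes as confidence-weighted means over support and queries. The paper meta-learns the confidence function; the sketch below substitutes a fixed distance-softmax confidence as a toy stand-in:

```python
import numpy as np

def refine_prototypes(support, s_labels, queries, n_classes, steps=5, tau=1.0):
    """Transductive prototype refinement: fold queries into each class
    mean, weighted by per-class confidence. The paper meta-learns the
    confidence; here it is a fixed softmax over negative sq. distances."""
    protos = np.stack([support[s_labels == c].mean(0) for c in range(n_classes)])
    for _ in range(steps):
        d = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
        conf = np.exp(-d / tau)
        conf /= conf.sum(axis=1, keepdims=True)   # (n_queries, n_classes)
        for c in range(n_classes):
            mask = (s_labels == c).astype(float)
            num = (support * mask[:, None]).sum(0) + (queries * conf[:, c:c + 1]).sum(0)
            protos[c] = num / (mask.sum() + conf[:, c].sum())
    return protos

rng = np.random.default_rng(0)
support = np.concatenate([rng.normal(0, 1, (5, 2)), rng.normal(4, 1, (5, 2))])
queries = np.concatenate([rng.normal(0, 1, (10, 2)), rng.normal(4, 1, (10, 2))])
print(refine_prototypes(support, np.array([0] * 5 + [1] * 5), queries, n_classes=2))
```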
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.