Understanding prompt engineering may not require rethinking
generalization
- URL: http://arxiv.org/abs/2310.03957v1
- Date: Fri, 6 Oct 2023 00:52:48 GMT
- Title: Understanding prompt engineering may not require rethinking
generalization
- Authors: Victor Akinwande, Yiding Jiang, Dylan Sam, J. Zico Kolter
- Abstract summary: We show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature.
This work provides a possible justification for the widespread practice of prompt engineering.
- Score: 56.38207873589642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-shot learning in prompted vision-language models, the practice of
crafting prompts to build classifiers without an explicit training process, has
achieved impressive performance in many settings. This success presents a
seemingly surprising observation: these methods suffer relatively little from
overfitting, i.e., when a prompt is manually engineered to achieve low error on
a given training set (thus rendering the method no longer actually zero-shot),
the approach still performs well on held-out test data. In this paper, we show
that we can explain such performance well via recourse to classical PAC-Bayes
bounds. Specifically, we show that the discrete nature of prompts, combined
with a PAC-Bayes prior given by a language model, results in generalization
bounds that are remarkably tight by the standards of the literature: for
instance, the generalization bound of an ImageNet classifier is often within a
few percentage points of the true test error. We demonstrate empirically that
this holds for existing handcrafted prompts and prompts generated through
simple greedy search. Furthermore, the resulting bound is well-suited for model
selection: the models with the best bound typically also have the best test
performance. This work thus provides a possible justification for the
widespread practice of prompt engineering, even if it seems that such methods
could potentially overfit the training data.
Related papers
- Few-shot Prompting for Pairwise Ranking: An Effective Non-Parametric Retrieval Model [18.111868378615206]
We propose a pairwise few-shot ranker that achieves a close performance to that of a supervised model without requiring any complex training pipeline.
Our method also achieves a close performance to that of a supervised model without requiring any complex training pipeline.
arXiv Detail & Related papers (2024-09-26T11:19:09Z) - Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning [50.26965628047682]
Adapting pre-trained models to open classes is a challenging problem in machine learning.
In this paper, we consider combining the advantages of both and come up with a test-time prompt tuning approach.
Our proposed method outperforms all comparison methods on average considering both base and new classes.
arXiv Detail & Related papers (2024-08-29T12:34:01Z) - Self-regulating Prompts: Foundational Model Adaptation without
Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z) - Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or a given attributes.
We propose a novel search strategy based on the greedy search to identify the near-optimal prompt for improving the performance of in-context learning.
arXiv Detail & Related papers (2023-03-23T12:28:25Z) - Impossibility Theorems for Feature Attribution [21.88229793890961]
We show that for moderately rich model classes, any feature attribution method can provably fail to improve on random guessing for inferring model behaviour.
Our results apply to common end-tasks such as characterizing local model behaviour, identifying spurious features, and algorithmic recourse.
arXiv Detail & Related papers (2022-12-22T17:03:57Z) - Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - Label-Descriptive Patterns and their Application to Characterizing
Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms gives insight into whether a model is prone to making systematic errors, but also gives a way to act and improve the model.
In this paper we propose a method that allows us to do so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data that is partitioned according to correctness of prediction.
arXiv Detail & Related papers (2021-10-18T19:42:21Z) - Reordering Examples Helps during Priming-based Few-Shot Learning [6.579039107070663]
We show that PERO can learn to generalize efficiently using as few as 10 examples.
We demonstrate the effectiveness of the proposed method on the tasks of sentiment classification, natural language inference and fact retrieval.
arXiv Detail & Related papers (2021-06-03T11:02:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.