A Simple Zero-shot Prompt Weighting Technique to Improve Prompt
Ensembling in Text-Image Models
- URL: http://arxiv.org/abs/2302.06235v2
- Date: Sat, 15 Jul 2023 11:12:59 GMT
- Title: A Simple Zero-shot Prompt Weighting Technique to Improve Prompt
Ensembling in Text-Image Models
- Authors: James Urquhart Allingham, Jie Ren, Michael W Dusenberry, Xiuye Gu, Yin
Cui, Dustin Tran, Jeremiah Zhe Liu, Balaji Lakshminarayanan
- Abstract summary: We aim to automate prompt engineering and improve zero-shot accuracy through prompt ensembling.
We identify several pathologies in a naive prompt scoring method where the score can be easily overconfident due to biases in pre-training and test data.
Using our proposed scoring method to create a weighted-average prompt ensemble, our method outperforms an equal-average ensemble as well as hand-crafted prompts.
- Score: 30.128204719490856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastively trained text-image models have the remarkable ability to
perform zero-shot classification, that is, classifying previously unseen images
into categories that the model has never been explicitly trained to identify.
However, these zero-shot classifiers need prompt engineering to achieve high
accuracy. Prompt engineering typically requires hand-crafting a set of prompts
for individual downstream tasks. In this work, we aim to automate this prompt
engineering and improve zero-shot accuracy through prompt ensembling. In
particular, we ask "Given a large pool of prompts, can we automatically score
the prompts and ensemble those that are most suitable for a particular
downstream dataset, without needing access to labeled validation data?". We
demonstrate that this is possible. In doing so, we identify several pathologies
in a naive prompt scoring method where the score can be easily overconfident
due to biases in pre-training and test data, and we propose a novel prompt
scoring method that corrects for the biases. Using our proposed scoring method
to create a weighted-average prompt ensemble, our method outperforms an
equal-average ensemble, as well as hand-crafted prompts, on ImageNet, 4 of its
variants, and 11 fine-grained classification benchmarks, all while being fully
automatic, optimization-free, and not requiring access to labeled validation
data.
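To make the setup concrete, here is a minimal sketch (not the paper's exact scoring rule): each class name is filled into a pool of prompt templates, the per-prompt text embeddings are combined with non-negative weights, and images are classified by similarity to the ensembled class embeddings. `embed_text`, `embed_image`, and the weights are placeholders standing in for a contrastively trained text-image model such as CLIP and for the proposed prompt scores.
```python
# Minimal sketch of zero-shot classification with a weighted prompt ensemble.
# `embed_text` and `embed_image` are hypothetical stand-ins for the text/image
# encoders of a contrastively trained model (e.g. CLIP); `prompt_weights` would
# come from a prompt-scoring method such as the one proposed in the paper.
import numpy as np

def weighted_prompt_ensemble(class_names, prompt_templates, prompt_weights, embed_text):
    """Return one L2-normalized text embedding per class, averaging the
    per-prompt embeddings with the given non-negative, sum-to-1 weights."""
    class_embeddings = []
    for name in class_names:
        per_prompt = np.stack([embed_text(t.format(name)) for t in prompt_templates])
        per_prompt /= np.linalg.norm(per_prompt, axis=-1, keepdims=True)
        avg = (prompt_weights[:, None] * per_prompt).sum(axis=0)
        class_embeddings.append(avg / np.linalg.norm(avg))
    return np.stack(class_embeddings)

def classify(image, class_embeddings, embed_image):
    """Predict the class whose ensembled text embedding is most similar to the image."""
    z = embed_image(image)
    z /= np.linalg.norm(z)
    return int(np.argmax(class_embeddings @ z))
```
With uniform weights this reduces to the standard equal-average prompt ensemble; the paper's contribution is a bias-corrected, label-free way of choosing the weights.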
Related papers
- A Fixed-Point Approach to Unified Prompt-Based Counting [51.20608895374113]
This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for objects indicated by various prompt types, such as box, point, and text.
Our model excels in prominent class-agnostic datasets and exhibits superior performance in cross-dataset adaptation tasks.
arXiv Detail & Related papers (2024-03-15T12:05:44Z)
- Understanding prompt engineering may not require rethinking generalization [56.38207873589642]
We show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature.
This work provides a possible justification for the widespread practice of prompt engineering.
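For intuition, a textbook bound for a discrete hypothesis set (a simplification, not the paper's exact statement) shows how a language-model prior enters: if each prompt $q$ in a countable pool receives prior mass $P(q)$ and $\hat{R}(q)$ is its empirical error on $m$ labelled examples, then with probability at least $1-\delta$, simultaneously for every $q$,
$$ R(q) \le \hat{R}(q) + \sqrt{\frac{\ln(1/P(q)) + \ln(1/\delta)}{2m}}, $$
so prompts that the language model deems more likely come with tighter guarantees.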
arXiv Detail & Related papers (2023-10-06T00:52:48Z)
- Zero-shot Approach to Overcome Perturbation Sensitivity of Prompts [7.208567411886273]
Recent studies have demonstrated that natural-language prompts can help to leverage the knowledge learned by pre-trained language models for the binary sentence-level sentiment classification task.
This study aims to find high-quality prompts for the given task in a zero-shot setting.
We empirically demonstrate that the top-ranked prompts are high-quality and significantly outperform the base prompt and the prompts generated using few-shot learning for the binary sentence-level sentiment classification task.
arXiv Detail & Related papers (2023-05-25T03:36:43Z)
- Pre-trained Language Models Can be Fully Zero-Shot Learners [26.60008734311909]
We propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding.
NPPrompt uses only pre-trained language models and does not require any labeled data or additional raw corpus for further fine-tuning.
We evaluate NPPrompt against previous major few-shot and zero-shot learning methods on diverse NLP tasks.
arXiv Detail & Related papers (2022-12-14T00:03:52Z)
- What Makes Pre-trained Language Models Better Zero-shot Learners? [12.164678440185007]
Current methods for prompt learning in zero-shot scenarios rely on a development set with sufficient human-annotated data.
We propose a simple yet effective method for screening reasonable prompt templates in zero-shot text classification: Perplexity Selection (Perplection).
Experiments show that our method leads to improved prediction performance in a realistic zero-shot setting, eliminating the need for any labelled examples.
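A minimal sketch of this kind of perplexity-based template screening is shown below; the paper's exact scoring may differ, the model, templates, and placeholder label word are illustrative, and the Hugging Face `transformers` package is assumed.
```python
# Minimal sketch of perplexity-based prompt-template screening: rank candidate
# templates by the fluency of the filled prompt under a causal LM and keep the
# most fluent one. The exact scoring used in the paper may differ.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the causal LM (lower = more fluent)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return math.exp(loss.item())

templates = [
    "The sentiment of this review is {}.",
    "Overall it felt {}.",
    "This text expresses a {} opinion.",
]
example = "The plot was thin but the acting kept me hooked."
# Fill each template with an illustrative placeholder label word and screen by perplexity.
scores = {t: perplexity(example + " " + t.format("great")) for t in templates}
best_template = min(scores, key=scores.get)
```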
arXiv Detail & Related papers (2022-09-30T03:28:19Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
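A minimal sketch of test-time prompt adaptation on a single sample, assuming an entropy-minimization objective over augmented views (one common instantiation; the `model_logits` and `augment` hooks are placeholders for a CLIP-like model and an augmentation pipeline, and the actual TPT recipe, e.g. confidence-based view selection, is not reproduced here):
```python
# Minimal sketch of test-time prompt tuning on a single test image: adapt a small
# set of prompt parameters by minimizing the entropy of the prediction averaged
# over augmented views of that image.
import torch

def test_time_prompt_tune(prompt_params, image, model_logits, augment,
                          n_views=16, steps=1, lr=5e-3):
    prompt_params = prompt_params.clone().requires_grad_(True)
    opt = torch.optim.AdamW([prompt_params], lr=lr)
    for _ in range(steps):
        views = [augment(image) for _ in range(n_views)]
        probs = torch.stack([model_logits(prompt_params, v).softmax(-1) for v in views])
        avg = probs.mean(0)  # marginal prediction over the augmented views
        entropy = -(avg * avg.clamp_min(1e-12).log()).sum()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    return prompt_params.detach()
```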
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- Language Models in the Loop: Incorporating Prompting into Weak Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited.
Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
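A minimal sketch of prompts acting as labeling functions, aggregated here by simple majority vote (real weak-supervision systems typically fit a learned label model instead; `prompt_label` is a hypothetical hook that queries the prompted language model and may abstain):
```python
# Minimal sketch of prompted labeling functions for weak supervision: each prompt
# votes for a class index or abstains (None), and votes are aggregated per example.
from collections import Counter

def weak_labels(examples, prompts, prompt_label):
    labels = []
    for x in examples:
        votes = [v for p in prompts if (v := prompt_label(p, x)) is not None]
        labels.append(Counter(votes).most_common(1)[0][0] if votes else None)
    return labels
```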
arXiv Detail & Related papers (2022-05-04T20:42:40Z)
- Prompt Consistency for Zero-Shot Task Generalization [118.81196556175797]
In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance.
Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency.
Our approach outperforms the state-of-the-art zero-shot learner, T0, on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy.
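A minimal sketch of a prompt-consistency regularizer on unlabeled inputs, using a pairwise symmetric KL term between the predictions obtained under different prompts (a simplified stand-in, not the paper's exact objective):
```python
# Minimal sketch of a prompt-consistency loss: for one unlabeled input, push the
# predictive distributions produced under different prompts toward agreement.
import torch
import torch.nn.functional as F

def prompt_consistency_loss(logits_per_prompt):
    """logits_per_prompt: tensor of shape (num_prompts, num_classes) for one input."""
    log_probs = F.log_softmax(logits_per_prompt, dim=-1)
    probs = log_probs.exp()
    n = logits_per_prompt.shape[0]
    loss, pairs = logits_per_prompt.new_zeros(()), 0
    for i in range(n):
        for j in range(n):
            if i != j:
                # KL(p_i || p_j); summing over ordered pairs symmetrizes the term.
                loss = loss + F.kl_div(log_probs[j], probs[i], reduction="sum")
                pairs += 1
    return loss / max(pairs, 1)
```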
arXiv Detail & Related papers (2022-04-29T19:18:37Z)
- Noisy Channel Language Model Prompting for Few-Shot Text Classification [87.23056864536613]
We introduce a noisy channel approach for language model prompting in few-shot text classification.
Instead of computing the likelihood of the label given the input, channel models compute the conditional probability of the input given the label.
We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters.
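A minimal sketch of channel scoring with a causal language model (the verbalizers and model are illustrative; the Hugging Face `transformers` package is assumed):
```python
# Minimal sketch of noisy-channel scoring: instead of P(label | input), score each
# label by the LM probability of the *input* given a verbalized label, and predict
# the highest-scoring label.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def channel_score(label_text: str, input_text: str) -> float:
    """Sum of log P(input tokens | verbalized label) under the causal LM."""
    prefix = tokenizer(label_text, return_tensors="pt").input_ids
    cont = tokenizer(" " + input_text, return_tensors="pt").input_ids
    ids = torch.cat([prefix, cont], dim=1)
    labels = ids.clone()
    labels[:, : prefix.shape[1]] = -100  # score only the input continuation
    with torch.no_grad():
        loss = model(ids, labels=labels).loss  # mean NLL over scored tokens
    return -loss.item() * cont.shape[1]

verbalizers = {0: "This review is negative.", 1: "This review is positive."}
text = "A clumsy script saved by two wonderful performances."
pred = max(verbalizers, key=lambda y: channel_score(verbalizers[y], text))
```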
arXiv Detail & Related papers (2021-08-09T15:06:26Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
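A minimal sketch of full-text plausibility ranking with a causal language model, a simplified variant rather than necessarily the paper's exact procedure (model and example texts are illustrative; the Hugging Face `transformers` package is assumed):
```python
# Minimal sketch of plausibility ranking in a full-text format: each candidate is
# inserted into the full text and candidates are ranked by mean token log-likelihood.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def mean_log_likelihood(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token negative log-likelihood
    return -loss.item()

premise = "The man broke his toe because"
candidates = ["he dropped a hammer on his foot.", "he got a hole in his sock."]
best = max(candidates, key=lambda c: mean_log_likelihood(f"{premise} {c}"))
```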
arXiv Detail & Related papers (2020-04-29T10:54:40Z)