A Simple Zero-shot Prompt Weighting Technique to Improve Prompt
Ensembling in Text-Image Models
- URL: http://arxiv.org/abs/2302.06235v2
- Date: Sat, 15 Jul 2023 11:12:59 GMT
- Title: A Simple Zero-shot Prompt Weighting Technique to Improve Prompt
Ensembling in Text-Image Models
- Authors: James Urquhart Allingham, Jie Ren, Michael W Dusenberry, Xiuye Gu, Yin
Cui, Dustin Tran, Jeremiah Zhe Liu, Balaji Lakshminarayanan
- Abstract summary: We aim to automate prompt engineering and improve zero-shot accuracy through prompt ensembling.
We identify several pathologies in a naive prompt scoring method where the score can be easily overconfident due to biases in pre-training and test data.
Using our proposed scoring method to create a weighted-average prompt ensemble, our method outperforms an equal-average ensemble as well as hand-crafted prompts.
- Score: 30.128204719490856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastively trained text-image models have the remarkable ability to
perform zero-shot classification, that is, classifying previously unseen images
into categories that the model has never been explicitly trained to identify.
However, these zero-shot classifiers need prompt engineering to achieve high
accuracy. Prompt engineering typically requires hand-crafting a set of prompts
for individual downstream tasks. In this work, we aim to automate this prompt
engineering and improve zero-shot accuracy through prompt ensembling. In
particular, we ask "Given a large pool of prompts, can we automatically score
the prompts and ensemble those that are most suitable for a particular
downstream dataset, without needing access to labeled validation data?". We
demonstrate that this is possible. In doing so, we identify several pathologies
in a naive prompt scoring method where the score can be easily overconfident
due to biases in pre-training and test data, and we propose a novel prompt
scoring method that corrects for the biases. Using our proposed scoring method
to create a weighted-average prompt ensemble, our method outperforms an
equal-average ensemble, as well as hand-crafted prompts, on ImageNet, 4 of its
variants, and 11 fine-grained classification benchmarks, all while being fully
automatic, optimization-free, and not requiring access to labeled validation
data.
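To make the setup concrete, here is a minimal sketch (not the paper's exact scoring rule): each class name is filled into a pool of prompt templates, the per-prompt text embeddings are combined with non-negative weights, and images are classified by similarity to the ensembled class embeddings. `embed_text`, `embed_image`, and the weights are placeholders standing in for a contrastively trained text-image model such as CLIP and for the proposed prompt scores.
```python
# Minimal sketch of zero-shot classification with a weighted prompt ensemble.
# `embed_text` and `embed_image` are hypothetical stand-ins for the text/image
# encoders of a contrastively trained model (e.g. CLIP); `prompt_weights` would
# come from a prompt-scoring method such as the one proposed in the paper.
import numpy as np

def weighted_prompt_ensemble(class_names, prompt_templates, prompt_weights, embed_text):
    """Return one L2-normalized text embedding per class, averaging the
    per-prompt embeddings with the given non-negative, sum-to-1 weights."""
    class_embeddings = []
    for name in class_names:
        per_prompt = np.stack([embed_text(t.format(name)) for t in prompt_templates])
        per_prompt /= np.linalg.norm(per_prompt, axis=-1, keepdims=True)
        avg = (prompt_weights[:, None] * per_prompt).sum(axis=0)
        class_embeddings.append(avg / np.linalg.norm(avg))
    return np.stack(class_embeddings)

def classify(image, class_embeddings, embed_image):
    """Predict the class whose ensembled text embedding is most similar to the image."""
    z = embed_image(image)
    z /= np.linalg.norm(z)
    return int(np.argmax(class_embeddings @ z))
```
With uniform weights this reduces to the standard equal-average prompt ensemble; the paper's contribution is a bias-corrected, label-free way of choosing the weights.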
Related papers
- A Fixed-Point Approach to Unified Prompt-Based Counting [51.20608895374113]
This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for objects indicated by various prompt types, such as box, point, and text.
Our model excels in prominent class-agnostic datasets and exhibits superior performance in cross-dataset adaptation tasks.
arXiv Detail & Related papers (2024-03-15T12:05:44Z)
- Understanding prompt engineering may not require rethinking generalization [56.38207873589642]
We show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature.
This work provides a possible justification for the widespread practice of prompt engineering.
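For intuition, a textbook bound for a discrete hypothesis set (a simplification, not the paper's exact statement) shows how a language-model prior enters: if each prompt $q$ in a countable pool receives prior mass $P(q)$ and $\hat{R}(q)$ is its empirical error on $m$ labelled examples, then with probability at least $1-\delta$, simultaneously for every $q$,
$$ R(q) \le \hat{R}(q) + \sqrt{\frac{\ln(1/P(q)) + \ln(1/\delta)}{2m}}, $$
so prompts that the language model deems more likely come with tighter guarantees.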
arXiv Detail & Related papers (2023-10-06T00:52:48Z)
- Zero-shot Approach to Overcome Perturbation Sensitivity of Prompts [7.208567411886273]
Recent studies have demonstrated that natural-language prompts can help to leverage the knowledge learned by pre-trained language models for the binary sentence-level sentiment classification task.
This study aims to find high-quality prompts for the given task in a zero-shot setting.
We empirically demonstrate that the top-ranked prompts are high-quality and significantly outperform the base prompt and the prompts generated using few-shot learning for the binary sentence-level sentiment classification task.
arXiv Detail & Related papers (2023-05-25T03:36:43Z)
- Pre-trained Language Models Can be Fully Zero-Shot Learners [26.60008734311909]
We propose nonparametric prompting PLM (NPPrompt) for fully zero-shot language understanding.
NPPrompt uses only pre-trained language models and does not require any labeled data or additional raw corpus for further fine-tuning.
We evaluate NPPrompt against previous major few-shot and zero-shot learning methods on diverse NLP tasks.
arXiv Detail & Related papers (2022-12-14T00:03:52Z)
- What Makes Pre-trained Language Models Better Zero-shot Learners? [12.164678440185007]
Current methods for prompt learning in zero-shot scenarios rely on a development set with sufficient human-annotated data.
We propose a simple yet effective method for screening reasonable prompt templates in zero-shot text classification: Perplexity Selection (Perplection).
Experiments show that our method leads to improved prediction performance in a realistic zero-shot setting, eliminating the need for any labelled examples.
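A minimal sketch of this kind of perplexity-based template screening is shown below; the paper's exact scoring may differ, the model, templates, and placeholder label word are illustrative, and the Hugging Face `transformers` package is assumed.
```python
# Minimal sketch of perplexity-based prompt-template screening: rank candidate
# templates by the fluency of the filled prompt under a causal LM and keep the
# most fluent one. The exact scoring used in the paper may differ.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the causal LM (lower = more fluent)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return math.exp(loss.item())

templates = [
    "The sentiment of this review is {}.",
    "Overall it felt {}.",
    "This text expresses a {} opinion.",
]
example = "The plot was thin but the acting kept me hooked."
# Fill each template with an illustrative placeholder label word and screen by perplexity.
scores = {t: perplexity(example + " " + t.format("great")) for t in templates}
best_template = min(scores, key=scores.get)
```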
arXiv Detail & Related papers (2022-09-30T03:28:19Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
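A minimal sketch of test-time prompt adaptation on a single sample, assuming an entropy-minimization objective over augmented views (one common instantiation; the `model_logits` and `augment` hooks are placeholders for a CLIP-like model and an augmentation pipeline, and the actual TPT recipe, e.g. confidence-based view selection, is not reproduced here):
```python
# Minimal sketch of test-time prompt tuning on a single test image: adapt a small
# set of prompt parameters by minimizing the entropy of the prediction averaged
# over augmented views of that image.
import torch

def test_time_prompt_tune(prompt_params, image, model_logits, augment,
                          n_views=16, steps=1, lr=5e-3):
    prompt_params = prompt_params.clone().requires_grad_(True)
    opt = torch.optim.AdamW([prompt_params], lr=lr)
    for _ in range(steps):
        views = [augment(image) for _ in range(n_views)]
        probs = torch.stack([model_logits(prompt_params, v).softmax(-1) for v in views])
        avg = probs.mean(0)  # marginal prediction over the augmented views
        entropy = -(avg * avg.clamp_min(1e-12).log()).sum()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    return prompt_params.detach()
```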
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- Language Models in the Loop: Incorporating Prompting into Weak Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited.
Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
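A minimal sketch of prompts acting as labeling functions, aggregated here by simple majority vote (real weak-supervision systems typically fit a learned label model instead; `prompt_label` is a hypothetical hook that queries the prompted language model and may abstain):
```python
# Minimal sketch of prompted labeling functions for weak supervision: each prompt
# votes for a class index or abstains (None), and votes are aggregated per example.
from collections import Counter

def weak_labels(examples, prompts, prompt_label):
    labels = []
    for x in examples:
        votes = [v for p in prompts if (v := prompt_label(p, x)) is not None]
        labels.append(Counter(votes).most_common(1)[0][0] if votes else None)
    return labels
```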
arXiv Detail & Related papers (2022-05-04T20:42:40Z)
- Prompt Consistency for Zero-Shot Task Generalization [118.81196556175797]
In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance.
Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency.
Our approach outperforms the state-of-the-art zero-shot learner, T0, on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy.
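A minimal sketch of a prompt-consistency regularizer on unlabeled inputs, using a pairwise symmetric KL term between the predictions obtained under different prompts (a simplified stand-in, not the paper's exact objective):
```python
# Minimal sketch of a prompt-consistency loss: for one unlabeled input, push the
# predictive distributions produced under different prompts toward agreement.
import torch
import torch.nn.functional as F

def prompt_consistency_loss(logits_per_prompt):
    """logits_per_prompt: tensor of shape (num_prompts, num_classes) for one input."""
    log_probs = F.log_softmax(logits_per_prompt, dim=-1)
    probs = log_probs.exp()
    n = logits_per_prompt.shape[0]
    loss, pairs = logits_per_prompt.new_zeros(()), 0
    for i in range(n):
        for j in range(n):
            if i != j:
                # KL(p_i || p_j); summing over ordered pairs symmetrizes the term.
                loss = loss + F.kl_div(log_probs[j], probs[i], reduction="sum")
                pairs += 1
    return loss / max(pairs, 1)
```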
arXiv Detail & Related papers (2022-04-29T19:18:37Z)
- Noisy Channel Language Model Prompting for Few-Shot Text Classification [87.23056864536613]
We introduce a noisy channel approach for language model prompting in few-shot text classification.
Instead of computing the likelihood of the label given the input, channel models compute the conditional probability of the input given the label.
We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters.
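A minimal sketch of channel scoring with a causal language model (the verbalizers and model are illustrative; the Hugging Face `transformers` package is assumed):
```python
# Minimal sketch of noisy-channel scoring: instead of P(label | input), score each
# label by the LM probability of the *input* given a verbalized label, and predict
# the highest-scoring label.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def channel_score(label_text: str, input_text: str) -> float:
    """Sum of log P(input tokens | verbalized label) under the causal LM."""
    prefix = tokenizer(label_text, return_tensors="pt").input_ids
    cont = tokenizer(" " + input_text, return_tensors="pt").input_ids
    ids = torch.cat([prefix, cont], dim=1)
    labels = ids.clone()
    labels[:, : prefix.shape[1]] = -100  # score only the input continuation
    with torch.no_grad():
        loss = model(ids, labels=labels).loss  # mean NLL over scored tokens
    return -loss.item() * cont.shape[1]

verbalizers = {0: "This review is negative.", 1: "This review is positive."}
text = "A clumsy script saved by two wonderful performances."
pred = max(verbalizers, key=lambda y: channel_score(verbalizers[y], text))
```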
arXiv Detail & Related papers (2021-08-09T15:06:26Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
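A minimal sketch of full-text plausibility ranking with a causal language model, a simplified variant rather than necessarily the paper's exact procedure (model and example texts are illustrative; the Hugging Face `transformers` package is assumed):
```python
# Minimal sketch of plausibility ranking in a full-text format: each candidate is
# inserted into the full text and candidates are ranked by mean token log-likelihood.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def mean_log_likelihood(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token negative log-likelihood
    return -loss.item()

premise = "The man broke his toe because"
candidates = ["he dropped a hammer on his foot.", "he got a hole in his sock."]
best = max(candidates, key=lambda c: mean_log_likelihood(f"{premise} {c}"))
```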
arXiv Detail & Related papers (2020-04-29T10:54:40Z)