Noisy Channel Language Model Prompting for Few-Shot Text Classification
- URL: http://arxiv.org/abs/2108.04106v1
- Date: Mon, 9 Aug 2021 15:06:26 GMT
- Title: Noisy Channel Language Model Prompting for Few-Shot Text Classification
- Authors: Sewon Min, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
- Abstract summary: We introduce a noisy channel approach for language model prompting in few-shot text classification.
Instead of computing the likelihood of the label given the input, channel models compute the conditional probability of the input given the label.
We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters.
- Score: 87.23056864536613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a noisy channel approach for language model prompting in
few-shot text classification. Instead of computing the likelihood of the label
given the input (referred as direct models), channel models compute the
conditional probability of the input given the label, and are thereby required
to explain every word in the input. We use channel models for recently proposed
few-shot learning methods with no or very limited updates to the language model
parameters, via either in-context demonstration or prompt tuning. Our
experiments show that, for both methods, channel models significantly
outperform their direct counterparts, which we attribute to their stability,
i.e., lower variance and higher worst-case accuracy. We also present extensive
ablations that provide recommendations for when to use channel prompt tuning
instead of other competitive models (e.g., direct head tuning): channel prompt
tuning is preferred when the number of training examples is small, labels in
the training data are imbalanced, or generalization to unseen labels is
required.
Related papers
- Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model [0.0]
OpenAI's Whisper Automated Speech Recognition model excels in generalizing across diverse datasets and domains.
We propose a method to enhance transcription accuracy without explicit fine-tuning or altering model parameters.
arXiv Detail & Related papers (2024-10-24T01:58:11Z) - Clarify: Improving Model Robustness With Natural Language Corrections [59.041682704894555]
The standard way to teach models is by feeding them lots of data.
This approach often teaches models incorrect ideas because they pick up on misleading signals in the data.
We propose Clarify, a novel interface and method for interactively correcting model misconceptions.
arXiv Detail & Related papers (2024-02-06T05:11:38Z) - Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models [48.77653835765705]
We introduce a probabilistic resolution to prompt tuning, where the label-specific prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model.
We evaluate the effectiveness of our approach on four tasks: few-shot image recognition, base-to-new generalization, dataset transfer learning, and domain shifts.
arXiv Detail & Related papers (2023-03-16T06:09:15Z) - Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z) - Language Models in the Loop: Incorporating Prompting into Weak
Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited.
Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
arXiv Detail & Related papers (2022-05-04T20:42:40Z) - Eliciting Knowledge from Pretrained Language Models for Prototypical
Prompt Verbalizer [12.596033546002321]
In this paper, we focus on eliciting knowledge from pretrained language models and propose a prototypical prompt verbalizer for prompt-tuning.
For zero-shot settings, knowledge is elicited from pretrained language models by a manually designed template to form initial prototypical embeddings.
For few-shot settings, models are tuned to learn meaningful and interpretable prototypical embeddings.
arXiv Detail & Related papers (2022-01-14T12:04:37Z) - Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal.
In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective.
Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements of state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z) - Language Models not just for Pre-training: Fast Online Neural Noisy
Channel Modeling [35.43382144290393]
We introduce efficient approximations to make inference with the noisy channel approach as fast as strong ensembles.
We also show that the noisy channel approach can outperform strong pre-training results by achieving a new state of the art on WMT Romanian-English translation.
arXiv Detail & Related papers (2020-11-13T23:22:28Z) - Deep k-NN for Noisy Labels [55.97221021252733]
We show that a simple $k$-nearest neighbor-based filtering approach on the logit layer of a preliminary model can remove mislabeled data and produce more accurate models than many recently proposed methods.
arXiv Detail & Related papers (2020-04-26T05:15:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.