Related papers: Noisy Channel Language Model Prompting for Few-Shot Text Classification

URL: http://arxiv.org/abs/2108.04106v1
Date: Mon, 9 Aug 2021 15:06:26 GMT
Title: Noisy Channel Language Model Prompting for Few-Shot Text Classification
Authors: Sewon Min, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
Abstract summary: We introduce a noisy channel approach for language model prompting in few-shot text classification. Instead of computing the likelihood of the label given the input, channel models compute the conditional probability of the input given the label. We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters.
Score: 87.23056864536613
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce a noisy channel approach for language model prompting in few-shot text classification. Instead of computing the likelihood of the label given the input (referred to as direct models), channel models compute the conditional probability of the input given the label, and are thereby required to explain every word in the input. We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters, via either in-context demonstration or prompt tuning. Our experiments show that, for both methods, channel models significantly outperform their direct counterparts, which we attribute to their stability, i.e., lower variance and higher worst-case accuracy. We also present extensive ablations that provide recommendations for when to use channel prompt tuning instead of other competitive models (e.g., direct head tuning): channel prompt tuning is preferred when the number of training examples is small, labels in the training data are imbalanced, or generalization to unseen labels is required.
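
Illustration (not from the paper): the channel scoring described in the abstract can be sketched with a toy model. A direct model would score P(label | input); the sketch below instead scores P(input | label) and picks the label that best explains every input word. The training sentences, the add-one-smoothed unigram model standing in for a pretrained language model, the uniform label prior, and the names TRAIN, fit_unigram, channel_log_prob, and classify_channel are all assumptions made for this example, not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy data: a tiny per-label corpus standing in for a pretrained LM.
TRAIN = {
    "positive": ["a great and moving film", "great acting and a great script"],
    "negative": ["a dull and boring film", "boring script and terrible acting"],
}


def fit_unigram(texts):
    """Count word frequencies over the texts of one label."""
    counts = Counter(word for text in texts for word in text.split())
    return counts, sum(counts.values())


MODELS = {label: fit_unigram(texts) for label, texts in TRAIN.items()}
VOCAB = {word for counts, _ in MODELS.values() for word in counts}


def channel_log_prob(text, label):
    """Return log P(input | label) under an add-one-smoothed unigram model.

    Every input word contributes one term to the sum, so the label has to
    "explain" each word of the input.
    """
    counts, total = MODELS[label]
    # The extra +1 in the denominator reserves mass for unseen words.
    denom = total + len(VOCAB) + 1
    return sum(math.log((counts[word] + 1) / denom) for word in text.split())


def classify_channel(text):
    """Pick argmax over labels of P(input | label), assuming a uniform prior."""
    return max(MODELS, key=lambda label: channel_log_prob(text, label))


if __name__ == "__main__":
    for sentence in ["a boring film", "great acting"]:
        print(sentence, "->", classify_channel(sentence))
```

In the setting the abstract describes, the per-label unigram model would be replaced by a large language model conditioned on a label prompt; the toy version only shows the direction of the conditional probability.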

Related papers

Early Stopping Against Label Noise Without Validation Data [54.27621957395026]
We propose a novel early stopping method called Label Wave, which does not require validation data for selecting the desired model. We show both the effectiveness of the Label Wave method across various settings and its capability to enhance the performance of existing methods for learning with noisy labels.
arXiv Detail & Related papers (2025-02-11T13:40:15Z)
Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model [0.0]
OpenAI's Whisper Automated Speech Recognition model excels in generalizing across diverse datasets and domains. We propose a method to enhance transcription accuracy without explicit fine-tuning or altering model parameters.
arXiv Detail & Related papers (2024-10-24T01:58:11Z)
Clarify: Improving Model Robustness With Natural Language Corrections [59.041682704894555]
The standard way to teach models is by feeding them lots of data. This approach often teaches models incorrect ideas because they pick up on misleading signals in the data. We propose Clarify, a novel interface and method for interactively correcting model misconceptions.
arXiv Detail & Related papers (2024-02-06T05:11:38Z)
Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models [48.77653835765705]
We introduce a probabilistic resolution to prompt tuning, where the label-specific prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model. We evaluate the effectiveness of our approach on four tasks: few-shot image recognition, base-to-new generalization, dataset transfer learning, and domain shifts.
arXiv Detail & Related papers (2023-03-16T06:09:15Z)
Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem. Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts. We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z)
Language Models in the Loop: Incorporating Prompting into Weak Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited. Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
arXiv Detail & Related papers (2022-05-04T20:42:40Z)
Eliciting Knowledge from Pretrained Language Models for Prototypical Prompt Verbalizer [12.596033546002321]
In this paper, we focus on eliciting knowledge from pretrained language models and propose a prototypical prompt verbalizer for prompt-tuning. For zero-shot settings, knowledge is elicited from pretrained language models by a manually designed template to form initial prototypical embeddings. For few-shot settings, models are tuned to learn meaningful and interpretable prototypical embeddings.
arXiv Detail & Related papers (2022-01-14T12:04:37Z)
Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal. The introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective. Using the same approach, we propose to learn an infinite-dimensional latent code that improves on state-of-the-art models for semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z)
Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling [35.43382144290393]
We introduce efficient approximations to make inference with the noisy channel approach as fast as strong ensembles. We also show that the noisy channel approach can outperform strong pre-training results by achieving a new state of the art on WMT Romanian-English translation.
arXiv Detail & Related papers (2020-11-13T23:22:28Z)
Deep k-NN for Noisy Labels [55.97221021252733]
We show that a simple $k$-nearest neighbor-based filtering approach on the logit layer of a preliminary model can remove mislabeled data and produce more accurate models than many recently proposed methods.
arXiv Detail & Related papers (2020-04-26T05:15:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.