Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot
Classification
- URL: http://arxiv.org/abs/2307.11031v1
- Date: Thu, 20 Jul 2023 17:07:28 GMT
- Title: Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot
Classification
- Authors: Neel Guha, Mayee F. Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré
- Abstract summary: We show that it is possible to improve prompt-based learning without additional labeled data.
We propose Embroid, a method which computes multiple representations of a dataset under different embedding functions.
We find that Embroid substantially improves performance over original prompts.
- Score: 20.85088711770188
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has shown that language models' (LMs) prompt-based learning
capabilities make them well suited for automating data labeling in domains
where manual annotation is expensive. The challenge is that while writing an
initial prompt is cheap, improving a prompt is costly -- practitioners often
require significant labeled data in order to evaluate the impact of prompt
modifications. Our work asks whether it is possible to improve prompt-based
learning without additional labeled data. We approach this problem by
attempting to modify the predictions of a prompt, rather than the prompt
itself. Our intuition is that accurate predictions should also be consistent:
samples which are similar under some feature representation should receive the
same prompt prediction. We propose Embroid, a method which computes multiple
representations of a dataset under different embedding functions, and uses the
consistency between the LM predictions for neighboring samples to identify
mispredictions. Embroid then uses these neighborhoods to create additional
predictions for each sample, and combines these predictions with a simple
latent variable graphical model in order to generate a final corrected
prediction. In addition to providing a theoretical analysis of Embroid, we
conduct a rigorous empirical evaluation across six different LMs and up to 95
different tasks. We find that (1) Embroid substantially improves performance
over original prompts (e.g., by an average of 7.3 points on GPT-JT), (2) also
realizes improvements for more sophisticated prompting strategies (e.g.,
chain-of-thought), and (3) can be specialized to domains like law through the
embedding functions.
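As a rough sketch of the smoothing step described above, the Python snippet below is illustrative rather than the authors' implementation: it assumes binary labels, builds the per-embedding neighborhoods with scikit-learn, and stands in a plain majority vote for the paper's latent variable graphical model (which learns per-source accuracies without labels). The function name embroid_smooth and the neighbor-vote threshold tau are invented for the example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def embroid_smooth(embeddings_list, lm_preds, k=5, tau=0.5):
    """Smooth binary LM predictions via neighbor consistency (sketch).

    embeddings_list: list of (n, d_i) arrays, one per embedding function.
    lm_preds: (n,) array of binary LM predictions in {0, 1}.
    """
    votes = [lm_preds]  # the raw prompt predictions are one voting source
    for X in embeddings_list:
        # k nearest neighbors of every sample in this embedding space
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nn.kneighbors(X)
        neighbors = idx[:, 1:]  # drop each sample itself
        # auxiliary vote: the majority LM prediction among the neighbors
        votes.append((lm_preds[neighbors].mean(axis=1) > tau).astype(int))
    # Embroid combines these sources with a latent variable graphical model
    # that weights them by estimated accuracy; an unweighted majority vote
    # stands in here for simplicity.
    return (np.stack(votes).mean(axis=0) > 0.5).astype(int)
```

Disagreement between a sample's own prediction and its neighbors' majority across several embedding spaces is exactly the signal the method uses to flag and correct likely mispredictions.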
Related papers
- Correct after Answer: Enhancing Multi-Span Question Answering with Post-Processing Method [11.794628063040108]
Multi-Span Question Answering (MSQA) requires models to extract one or multiple answer spans from a given context to answer a question.
We propose the Answering-Classifying-Correcting (ACC) framework, which employs a post-processing strategy to handle incorrect predictions.
arXiv Detail & Related papers (2024-10-22T08:04:32Z)
- Ensemble Predicate Decoding for Unbiased Scene Graph Generation [40.01591739856469]
Scene Graph Generation (SGG) aims to generate a comprehensive graphical representation that captures semantic information of a given scenario.
The model's performance in predicting more fine-grained predicates is hindered by a significant predicate bias.
This paper proposes Ensemble Predicate Decoding (EPD), which employs multiple decoders to attain unbiased scene graph generation.
arXiv Detail & Related papers (2024-08-26T11:24:13Z)
- Stochastic Online Conformal Prediction with Semi-Bandit Feedback [29.334511328067777]
We consider the online learning setting, where examples arrive over time, and the goal is to construct prediction sets dynamically.
We propose a novel conformal prediction algorithm targeted at this setting, and prove that it obtains sublinear regret compared to the optimal conformal predictor.
arXiv Detail & Related papers (2024-05-22T00:42:49Z)
- Less is More: One-shot Subgraph Reasoning on Large-scale Knowledge Graphs [49.547988001231424]
We propose the one-shot-subgraph link prediction to achieve efficient and adaptive prediction.
The design principle is that, instead of directly acting on the whole KG, the prediction procedure is decoupled into two steps.
We achieve improved efficiency and leading performance on five large-scale benchmarks.
arXiv Detail & Related papers (2024-03-15T12:00:12Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- The Unreliability of Explanations in Few-Shot In-Context Learning [50.77996380021221]
We focus on two NLP tasks that involve reasoning over text, namely question answering and natural language inference.
We show that explanations judged as good by humans--those that are logically consistent with the input--usually indicate more accurate predictions.
We present a framework for calibrating model predictions based on the reliability of the explanations.
arXiv Detail & Related papers (2022-05-06T17:57:58Z)
- Backward-Compatible Prediction Updates: A Probabilistic Approach [12.049279991559091]
We formalize the Prediction Update Problem and present an efficient probabilistic approach to solving it.
In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies for backward-compatible prediction updates.
arXiv Detail & Related papers (2021-07-02T13:05:31Z)
- Calibrate Before Use: Improving Few-Shot Performance of Language Models [68.17016463756474]
GPT-3 can perform numerous tasks when provided a natural language prompt that contains a few training examples.
We show that this type of few-shot learning can be unstable.
The choice of prompt format, training examples, and even the order of the training examples can cause accuracy to vary from near chance to near state-of-the-art.
arXiv Detail & Related papers (2021-02-19T00:23:59Z)
- Explain and Predict, and then Predict Again [6.865156063241553]
We propose ExPred, which uses multi-task learning in the explanation generation phase, effectively trading off explanation and prediction losses.
We conduct an extensive evaluation of our approach on three diverse language datasets.
arXiv Detail & Related papers (2021-01-11T19:36:52Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)