Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning
- URL: http://arxiv.org/abs/2109.04144v1
- Date: Thu, 9 Sep 2021 10:10:29 GMT
- Title: Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning
- Authors: Prasetya Ajie Utama, Nafise Sadat Moosavi, Victor Sanh, Iryna Gurevych
- Abstract summary: We show that prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inferences based on lexical overlap.
We then show that adding a regularization that preserves pretraining weights is effective in mitigating this destructive tendency of few-shot finetuning.
- Score: 57.4036085386653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent prompt-based approaches allow pretrained language models to achieve
strong performances on few-shot finetuning by reformulating downstream tasks as
a language modeling problem. In this work, we demonstrate that, despite its
advantages on low data regimes, finetuned prompt-based models for sentence pair
classification tasks still suffer from a common pitfall of adopting inference
heuristics based on lexical overlap, e.g., models incorrectly assuming a
sentence pair is of the same meaning because they consist of the same set of
words. Interestingly, we find that this particular inference heuristic is
significantly less present in the zero-shot evaluation of the prompt-based
model, indicating how finetuning can be destructive to useful knowledge learned
during the pretraining. We then show that adding a regularization that
preserves pretraining weights is effective in mitigating this destructive
tendency of few-shot finetuning. Our evaluation on three datasets demonstrates
promising improvements on the three corresponding challenge datasets used to
diagnose the inference heuristics.
Related papers
- Semi-supervised Learning For Robust Speech Evaluation [30.593420641501968]
Speech evaluation measures a learners oral proficiency using automatic models.
This paper proposes to address such challenges by exploiting semi-supervised pre-training and objective regularization.
An anchor model is trained using pseudo labels to predict the correctness of pronunciation.
arXiv Detail & Related papers (2024-09-23T02:11:24Z) - Forget-me-not! Contrastive Critics for Mitigating Posterior Collapse [20.258298183228824]
We introduce inference critics that detect and incentivize against posterior collapse by requiring correspondence between latent variables and the observations.
This approach is straightforward to implement and requires significantly less training time than prior methods.
arXiv Detail & Related papers (2022-07-19T20:07:17Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models with few examples show strong prediction bias across labels.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows models gain performance improvement by capturing non-task-related features.
These observations alert that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z) - A Mutually Reinforced Framework for Pretrained Sentence Embeddings [49.297766436632685]
InfoCSE is a novel framework for learning high-quality sentence embeddings.
It exploits the sentence representation model itself and realizes the following iterative self-supervision process.
In other words, the representation learning and data annotation become mutually reinforced, where a strong self-supervision effect can be derived.
arXiv Detail & Related papers (2022-02-28T14:00:16Z) - Towards Robust and Adaptive Motion Forecasting: A Causal Representation
Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z) - NoiER: An Approach for Training more Reliable Fine-TunedDownstream Task
Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z) - Layer-wise Analysis of a Self-supervised Speech Representation Model [26.727775920272205]
Self-supervised learning approaches have been successful for pre-training speech representation models.
Not much has been studied about the type or extent of information encoded in the pre-trained representations themselves.
arXiv Detail & Related papers (2021-07-10T02:13:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.