Pre-training Is (Almost) All You Need: An Application to Commonsense
Reasoning
- URL: http://arxiv.org/abs/2004.14074v1
- Date: Wed, 29 Apr 2020 10:54:40 GMT
- Title: Pre-training Is (Almost) All You Need: An Application to Commonsense
Reasoning
- Authors: Alexandre Tamborrino, Nicola Pellicano, Baptiste Pannier, Pascal
Voitot and Louise Naudin
- Abstract summary: Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
- Score: 61.32992639292889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning of pre-trained transformer models has become the standard
approach for solving common NLP tasks. Most of the existing approaches rely on
a randomly initialized classifier on top of such networks. We argue that this
fine-tuning procedure is sub-optimal as the pre-trained model has no prior on
the specific classifier labels, while it might have already learned an
intrinsic textual representation of the task. In this paper, we introduce a new
scoring method that casts a plausibility ranking task in a full-text format and
leverages the masked language modeling head tuned during the pre-training
phase. We study commonsense reasoning tasks where the model must rank a set of
hypotheses given a premise, focusing on the COPA, Swag, HellaSwag and
CommonsenseQA datasets. By exploiting our scoring method without fine-tuning,
we are able to produce strong baselines (e.g. 80% test accuracy on COPA) that
are comparable to supervised approaches. Moreover, when fine-tuning directly on
the proposed scoring function, we show that our method provides a much more
stable training phase across random restarts (e.g. a $\times 10$ standard
deviation reduction on COPA test accuracy) and requires less annotated data
than the standard classifier approach to reach equivalent performance.
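The scoring idea described in the abstract (rank candidates by how plausible the masked language modeling head finds the full text, with no task-specific classifier) can be sketched as follows. This is a minimal illustration only: the model name ("roberta-large"), the COPA-style example, and the choice to mask and average over candidate tokens are assumptions of the sketch, not the paper's exact scoring formulation.

```python
# Minimal sketch (not the paper's exact scoring function): rank candidate
# hypotheses by the average masked-LM log-probability of their tokens when
# concatenated with the premise in a full-text form. Model choice and the
# COPA-style example below are illustrative assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large").eval()

def mlm_score(premise: str, candidate: str) -> float:
    """Average log P(token | rest) over the candidate's tokens, masked one at a time."""
    enc = tokenizer(premise, candidate, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    # Token positions that belong to the second segment (the candidate).
    cand_positions = [i for i, s in enumerate(enc.sequence_ids(0)) if s == 1]
    log_probs = []
    with torch.no_grad():
        for pos in cand_positions:
            masked = input_ids.clone()
            true_id = masked[pos].item()
            masked[pos] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, pos]
            log_probs.append(torch.log_softmax(logits, dim=-1)[true_id].item())
    return sum(log_probs) / len(log_probs)

# Full-text recasting of a COPA-style "cause" question (hypothetical example).
premise = "The man broke his toe because"
candidates = ["he got a hole in his sock.", "he dropped a hammer on his foot."]
scores = [mlm_score(premise, c) for c in candidates]
print(candidates[max(range(len(candidates)), key=scores.__getitem__)])
```

Because only token probabilities from the pre-trained MLM head are used, this kind of scoring needs no fine-tuning, which is what allows the zero-shot baselines reported above; the same score can also serve as the training objective for the more stable fine-tuning the abstract describes.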
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Debiased Fine-Tuning for Vision-language Models by Prompt Regularization [50.41984119504716]
We present a new paradigm for fine-tuning large-scale vision pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg).
ProReg uses the predictions obtained by prompting the pretrained model to regularize fine-tuning.
We show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompt, prompt tuning, and other state-of-the-art methods.
arXiv Detail & Related papers (2023-01-29T11:53:55Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- Language Models in the Loop: Incorporating Prompting into Weak Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited.
Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
arXiv Detail & Related papers (2022-05-04T20:42:40Z)
- Bayesian Few-Shot Classification with One-vs-Each Pólya-Gamma Augmented Gaussian Processes [7.6146285961466]
Few-shot classification (FSC) is an important step on the path toward human-like machine learning.
We propose a novel combination of Pólya-Gamma augmentation and the one-vs-each softmax approximation that allows us to efficiently marginalize over functions rather than model parameters.
We demonstrate improved accuracy and uncertainty quantification on both standard few-shot classification benchmarks and few-shot domain transfer tasks.
arXiv Detail & Related papers (2020-07-20T19:10:41Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)