Embarrassingly Simple Performance Prediction for Abductive Natural
Language Inference
- URL: http://arxiv.org/abs/2202.10408v1
- Date: Mon, 21 Feb 2022 18:10:24 GMT
- Title: Embarrassingly Simple Performance Prediction for Abductive Natural
Language Inference
- Authors: Emīls Kadiķis and Vaibhav Srivastav and Roman Klinger
- Abstract summary: We propose a method for predicting the performance of NLI models without fine-tuning them.
We show that the accuracy of the cosine similarity approach correlates strongly with the accuracy of the classification approach, with a Pearson correlation coefficient of 0.65.
Our method can lead to significant time savings in the process of model selection.
- Score: 10.536415845097661
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of abductive natural language inference (αNLI), deciding which hypothesis is the more likely explanation for a set of observations, is a particularly difficult type of NLI. Rather than determining only a causal relationship, it requires common sense to also evaluate how reasonable an explanation is. All recent competitive systems build on top of contextualized
representations and make use of transformer architectures for learning an NLI
model. When somebody is faced with a particular NLI task, they need to select
the best model that is available. This is a time-consuming and resource-intense
endeavour. To solve this practical problem, we propose a simple method for
predicting the performance without actually fine-tuning the model. We do this by comparing how well pre-trained models perform on the αNLI task when sentence embeddings are simply scored with cosine similarity against the performance achieved when a classifier is trained on top of these embeddings. We show that the accuracy of the cosine similarity approach
correlates strongly with the accuracy of the classification approach with a
Pearson correlation coefficient of 0.65. Since the similarity computation is
orders of magnitude faster to compute on a given dataset (less than a minute
vs. hours), our method can lead to significant time savings in the process of
model selection.
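The zero-shot scoring step described in the abstract can be sketched as follows. This is a minimal illustration with toy vectors standing in for real sentence embeddings; the pre-trained sentence encoder itself is assumed and not shown, and the function names are hypothetical:

```python
# Sketch of the cosine-similarity approach to alpha-NLI: given embeddings of
# the observations and of two candidate hypotheses, pick the hypothesis
# whose embedding is closer to the observations. Toy vectors stand in for
# real sentence embeddings from a pre-trained encoder.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_hypothesis(obs_emb: np.ndarray,
                    h1_emb: np.ndarray,
                    h2_emb: np.ndarray) -> int:
    """Return 0 or 1 for whichever hypothesis embedding is more similar
    to the embedding of the observations."""
    s1 = cosine_similarity(obs_emb, h1_emb)
    s2 = cosine_similarity(obs_emb, h2_emb)
    return 0 if s1 >= s2 else 1

# Toy example: h1 points in nearly the same direction as the observations.
obs = np.array([1.0, 0.2, 0.0])
h1 = np.array([0.9, 0.1, 0.1])
h2 = np.array([-1.0, 0.5, 0.3])
print(pick_hypothesis(obs, h1, h2))  # -> 0
```

Because this requires only a forward pass per sentence and a dot product per pair, it runs in well under a minute on a dataset where fine-tuning a classifier would take hours, which is what enables the fast model-selection proxy.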
Related papers
- A Lightweight Measure of Classification Difficulty from Application Dataset Characteristics [4.220363193932374]
We propose an efficient cosine similarity-based classification difficulty measure S.
It is calculated from the number of classes and intra- and inter-class similarity metrics of the dataset.
We show how a practitioner can use this measure to help select an efficient model 6 to 29x faster than through repeated training and testing.
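The summary above mentions intra- and inter-class similarity metrics. Purely as an illustration (the exact definition of the measure S is in the cited paper and is not reproduced here), mean intra-class and inter-class cosine similarities over a toy labeled embedding set could be computed like this:

```python
# Illustrative only: mean intra-class vs inter-class cosine similarity for
# a toy labeled embedding dataset. The cited paper's difficulty measure S
# combines such metrics with the number of classes; its exact formula is
# not shown here.
from itertools import combinations
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def class_similarities(embeddings: np.ndarray, labels: list):
    """Return (mean intra-class, mean inter-class) cosine similarity."""
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        s = cosine(embeddings[i], embeddings[j])
        (intra if labels[i] == labels[j] else inter).append(s)
    return float(np.mean(intra)), float(np.mean(inter))

# Toy data: two tight clusters -> high intra-, low inter-class similarity,
# i.e. an "easy" dataset under this kind of measure.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = [0, 0, 1, 1]
intra, inter = class_similarities(emb, labels)
print(intra > inter)  # -> True
```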
arXiv Detail & Related papers (2024-04-09T03:27:09Z)
- Efficient Prompt Caching via Embedding Similarity [26.456212783693545]
We focus on the prediction accuracy of prompt caching for single-round question-answering tasks via embedding similarity.
We propose a distillation-based method to fine-tune the existing embeddings for better prediction.
We also conduct simulations demonstrating that our trained models achieve better caching efficiency than the previous embedding model.
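The caching idea summarized above, serving a stored answer when a new prompt's embedding is close enough to a cached one, can be sketched as below. The threshold value and the `PromptCache` interface are illustrative assumptions, not the cited paper's exact setup:

```python
# Hedged sketch of embedding-similarity prompt caching: a new prompt hits
# the cache if its embedding is within a cosine-similarity threshold of a
# stored prompt's embedding. The embedding model is assumed and not shown.
import numpy as np

class PromptCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def store(self, emb, answer: str) -> None:
        self.entries.append((np.asarray(emb, dtype=float), answer))

    def lookup(self, emb):
        """Return a cached answer if a stored prompt is similar enough."""
        emb = np.asarray(emb, dtype=float)
        for stored, answer in self.entries:
            sim = np.dot(stored, emb) / (
                np.linalg.norm(stored) * np.linalg.norm(emb))
            if sim >= self.threshold:
                return answer
        return None  # cache miss: the full model must be queried

cache = PromptCache(threshold=0.9)
cache.store([1.0, 0.0, 0.0], "cached answer")
print(cache.lookup([0.99, 0.05, 0.0]))  # hit -> cached answer
print(cache.lookup([0.0, 1.0, 0.0]))    # miss -> None
```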
arXiv Detail & Related papers (2024-02-02T06:34:11Z)
- Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference [72.61732440246954]
Large pre-trained language models often lack logical consistency across test inputs.
We propose a framework, ConCoRD, for boosting the consistency and accuracy of pre-trained NLP models.
We show that ConCoRD consistently boosts accuracy and consistency of off-the-shelf closed-book QA and VQA models.
arXiv Detail & Related papers (2022-11-21T21:58:30Z)
- Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
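The probe described in this summary, feeding models examples with randomly reordered words, can be illustrated with a small word-shuffling helper (the helper name is hypothetical; model evaluation itself is not shown):

```python
# Illustration of the word-order probe: randomly reorder the words of an
# input sentence, as used to test whether NLI models are sensitive to
# syntax. A seeded RNG makes the shuffle reproducible.
import random

def shuffle_words(sentence: str, seed: int = 0) -> str:
    """Return the sentence with its words in a random order."""
    words = sentence.split()
    rng = random.Random(seed)
    rng.shuffle(words)
    return " ".join(words)

original = "the cat sat on the mat"
shuffled = shuffle_words(original)
print(shuffled)  # same words, possibly different order
```

A word-order-sensitive model should behave differently on `shuffled` than on `original`; the cited finding is that state-of-the-art NLI models often do not.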
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Monotonicity in practice of adaptive testing [0.0]
This article evaluates Bayesian network models used for computerized adaptive testing and learned with a recently proposed monotonicity gradient algorithm.
The quality of methods is empirically evaluated on a large data set of the Czech National Mathematics exam.
arXiv Detail & Related papers (2020-09-15T10:55:41Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit".
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
- On the Discrepancy between Density Estimation and Sequence Generation [92.70116082182076]
Log-likelihood is highly correlated with BLEU when we consider models within the same family.
We observe no correlation between rankings of models across different families.
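The within-family correlation claimed above is the same kind of linear correlation measure used in the main paper (Pearson's r). A minimal check of such a correlation, with made-up numbers purely for illustration, looks like this:

```python
# Pearson correlation between two score lists via numpy.corrcoef.
# The log-likelihood and BLEU values below are invented for illustration,
# not taken from the cited paper.
import numpy as np

log_likelihoods = [-2.1, -1.8, -1.5, -1.2]
bleu_scores = [18.0, 21.5, 24.0, 27.2]
r = float(np.corrcoef(log_likelihoods, bleu_scores)[0, 1])
print(round(r, 3))  # close to 1.0: the two rankings agree
```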
arXiv Detail & Related papers (2020-02-17T20:13:35Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.