Predicting Fine-Tuning Performance with Probing
- URL: http://arxiv.org/abs/2210.07352v1
- Date: Thu, 13 Oct 2022 20:58:14 GMT
- Title: Predicting Fine-Tuning Performance with Probing
- Authors: Zining Zhu, Soroosh Shahtalebi, Frank Rudzicz
- Abstract summary: This paper explores the utility of probing deep NLP models to extract a proxy signal widely used in model development.
We find that it is possible to use the accuracies of only three probing tests to predict the fine-tuning performance with errors $40\%$ - $80\%$ smaller than baselines.
- Score: 18.129450295108423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large NLP models have recently shown impressive performance in language
understanding tasks, typically evaluated by their fine-tuned performance.
Alternatively, probing has received increasing attention as a lightweight method
for interpreting the intrinsic mechanisms of large NLP models. In probing,
post-hoc classifiers are trained on "out-of-domain" datasets that diagnose
specific abilities. While probing language models has led to insightful
findings, those findings appear disjointed from the development of the models.
This paper explores the utility of probing deep NLP models to extract a proxy
signal widely used in model development -- the fine-tuning performance. We find
that it is possible to use the accuracies of only three probing tests to
predict the fine-tuning performance with errors $40\%$ - $80\%$ smaller than
baselines. We further discuss possible avenues where probing can empower the
development of deep NLP models.
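A minimal sketch of the premise, under stated assumptions: the paper only says that the accuracies of three probing tests suffice to predict fine-tuning performance, so the linear predictor, the probes, and all numbers below are illustrative placeholders rather than the authors' setup.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: each row is a candidate model, each column the accuracy of
# one of three probing tests (placeholders, not the probes used in the paper).
probe_acc = np.array([
    [0.72, 0.61, 0.55],
    [0.80, 0.66, 0.59],
    [0.85, 0.70, 0.64],
    [0.78, 0.68, 0.58],
])
# Measured fine-tuned performance of the same models on a downstream task.
finetune_perf = np.array([0.81, 0.86, 0.90, 0.84])

predictor = LinearRegression().fit(probe_acc, finetune_perf)

# Predict the fine-tuning performance of a new model from its probing
# accuracies alone, before running the much more expensive fine-tuning itself.
new_model = np.array([[0.83, 0.69, 0.62]])
print(predictor.predict(new_model))
```

The appeal is cost: the probes are lightweight post-hoc classifiers, whereas the quantity being predicted normally requires a full fine-tuning run.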
Related papers
- Large Language Models as Annotators: Enhancing Generalization of NLP
Models at Minimal Cost [6.662800021628275]
We study the use of large language models (LLMs) for annotating inputs and improving the generalization of NLP models.
We propose a sampling strategy based on the difference in prediction scores between the base model and the fine-tuned NLP model (a rough sketch of this idea follows below).
arXiv Detail & Related papers (2023-06-27T19:29:55Z)
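A hedged sketch of such a disagreement-based selection rule; the function name, the score format, and the top-k selection are assumptions for illustration, not details from the paper.

```python
import numpy as np

def select_for_llm_annotation(base_scores, finetuned_scores, budget):
    # Rank unlabeled inputs by how much the base model and the fine-tuned model
    # disagree, and send the most contested inputs to the LLM annotator.
    disagreement = np.abs(np.asarray(base_scores) - np.asarray(finetuned_scores))
    return np.argsort(-disagreement)[:budget]

# Example: prediction scores for five unlabeled inputs, annotation budget of two.
print(select_for_llm_annotation([0.9, 0.4, 0.7, 0.2, 0.6],
                                [0.8, 0.9, 0.6, 0.3, 0.1], budget=2))
```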
- SPARSEFIT: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations [17.972111965568384]
Fine-tuning pre-trained language models (PLMs) in conjunction with prompt-based learning has recently shown promising results.
We propose SparseFit, a few-shot fine-tuning strategy that leverages discrete prompts to jointly generate predictions and NLEs.
We find that fine-tuning only 6.8% of the model parameters leads to competitive results for both the task performance and the quality of the generated NLEs.
arXiv Detail & Related papers (2023-05-22T17:06:41Z)
- Prompt-Augmented Linear Probing: Scaling beyond the Limit of Few-shot In-Context Learners [25.262774179224945]
This paper proposes prompt-augmented linear probing (PALP), a hybrid of linear probing and in-context learning (ICL).
PALP significantly enhances the input representations, closing the gap between ICL in the data-hungry scenario and fine-tuning in the data-abundant scenario with little training overhead.
arXiv Detail & Related papers (2022-12-21T09:37:05Z)
- A Kernel-Based View of Language Model Fine-Tuning [94.75146965041131]
We investigate whether the Neural Tangent Kernel (NTK) describes fine-tuning of pre-trained LMs.
We show that formulating the downstream task as a masked word prediction problem through prompting often induces kernel-based dynamics during fine-tuning (the standard NTK definition is recalled below).
arXiv Detail & Related papers (2022-10-11T17:34:32Z)
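For reference, the empirical neural tangent kernel of a network f(x; θ) evaluated at the pre-trained parameters θ₀; this is the textbook definition, with notation not taken from the paper.

```latex
% Empirical NTK around the pre-trained parameters \theta_0:
% an inner product of parameter gradients at two inputs.
K(x, x') = \nabla_\theta f(x; \theta_0)^\top \, \nabla_\theta f(x'; \theta_0)
```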
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model (a minimal sketch of such a penalty follows this entry).
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms, including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
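A minimal sketch of a noise-stability penalty in this spirit, assuming a PyTorch setting; `tail` stands in for the layers above the perturbed representation and is an assumption, not the authors' interface.

```python
import torch
import torch.nn as nn

def noise_stability_penalty(tail: nn.Module, hidden: torch.Tensor, sigma: float = 0.01):
    # Perturb an intermediate representation with standard Gaussian noise scaled
    # by sigma, then penalise how much the output of the remaining layers moves.
    clean = tail(hidden)
    noisy = tail(hidden + sigma * torch.randn_like(hidden))
    return ((clean - noisy) ** 2).mean()

# Toy example: the "tail" is a single linear layer over 8-dimensional states.
tail = nn.Linear(8, 8)
hidden = torch.randn(4, 8)
loss = noise_stability_penalty(tail, hidden)  # add, suitably weighted, to the task loss
print(loss.item())
```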
- Improving the Adversarial Robustness of NLP Models by Information Bottleneck [112.44039792098579]
Non-robust features can be easily manipulated by adversaries to fool NLP models.
In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones by using the information bottleneck theory.
We show that the models trained with our information bottleneck-based method are able to achieve a significant improvement in robust accuracy (the generic IB objective is recalled below).
arXiv Detail & Related papers (2022-06-11T12:12:20Z)
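For reference, the generic information bottleneck objective in its textbook form; the paper's exact loss and variational approximation are not reproduced here.

```latex
% Learn a stochastic representation Z of the input X that stays predictive of
% the label Y while compressing away the rest (beta controls the trade-off):
\max_{p(z \mid x)} \; I(Z; Y) \;-\; \beta \, I(X; Z)
```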
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x inference speed-up while retaining comparable performance (the kNN-LM interpolation being accelerated is recalled below).
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
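For context, the standard kNN-LM next-token interpolation (Khandelwal et al., 2020) that this efficiency work targets; the speed-up techniques themselves are not shown here.

```latex
% lambda interpolates between the datastore-based kNN distribution and the
% parametric language-model distribution over the next token y given context x:
p(y \mid x) = \lambda \, p_{\mathrm{kNN}}(y \mid x) + (1 - \lambda) \, p_{\mathrm{LM}}(y \mid x)
```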
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
- Evaluating the Robustness of Neural Language Models to Input Perturbations [7.064032374579076]
In this study, we design and implement various types of character-level and word-level perturbation methods to simulate noisy input texts.
We investigate the ability of high-performance language models such as BERT, XLNet, RoBERTa, and ELMo to handle different types of input perturbations.
The results suggest that language models are sensitive to input perturbations and that their performance can decrease even when small changes are introduced (a toy character-level perturbation of this kind is sketched below).
arXiv Detail & Related papers (2021-08-27T12:31:17Z)
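A toy illustration of character-level noise of the kind such studies apply; the exact perturbation operators, rates, and sampling scheme in the paper may differ.

```python
import random

def perturb_chars(text: str, p: float = 0.05, seed: int = 0) -> str:
    # Randomly swap adjacent characters or delete a character, each with
    # probability roughly p per position (illustrative only).
    rng = random.Random(seed)
    chars, out, i = list(text), [], 0
    while i < len(chars):
        r = rng.random()
        if r < p and i + 1 < len(chars):      # swap with the next character
            out.extend([chars[i + 1], chars[i]])
            i += 2
        elif r < 2 * p:                       # delete this character
            i += 1
        else:                                 # keep the character unchanged
            out.append(chars[i])
            i += 1
    return "".join(out)

print(perturb_chars("language models are sensitive to noise", p=0.1))
```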
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the fine-tuned model more robust to adversarial inputs (a minimal sketch of the representation-based lookup follows this entry).
arXiv Detail & Related papers (2020-10-18T16:55:25Z)
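A minimal sketch of retrieving the training examples whose hidden representations are closest to a test example's; cosine similarity and the retrieval details below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def nearest_training_examples(test_vec: np.ndarray, train_vecs: np.ndarray, k: int = 5):
    # Rank training examples by cosine similarity between their representations
    # and the test example's representation; the closest ones are candidate
    # explanations for the model's prediction on that example.
    train = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    test = test_vec / np.linalg.norm(test_vec)
    sims = train @ test
    return np.argsort(-sims)[:k]

# Example with random 16-dimensional representations for 100 training examples.
rng = np.random.default_rng(0)
print(nearest_training_examples(rng.normal(size=16), rng.normal(size=(100, 16)), k=3))
```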
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated list (including all information) and is not responsible for any consequences of its use.