Selecting Informative Contexts Improves Language Model Finetuning
- URL: http://arxiv.org/abs/2005.00175v3
- Date: Thu, 19 May 2022 22:49:00 GMT
- Title: Selecting Informative Contexts Improves Language Model Finetuning
- Authors: Richard Antonello, Nicole Beckage, Javier Turek, and Alexander Huth
- Abstract summary: We present a general fine-tuning method that we call information gain filtration.
During fine-tuning, a secondary learner selects informative examples and skips uninformative ones.
We show that our method has consistent improvement across datasets, fine-tuning tasks, and language model architectures.
- Score: 66.26521454263343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language model fine-tuning is essential for modern natural language
processing, but is computationally expensive and time-consuming. Further, the
effectiveness of fine-tuning is limited by the inclusion of training examples
that negatively affect performance. Here we present a general fine-tuning
method that we call information gain filtration for improving the overall
training efficiency and final performance of language model fine-tuning. We
define the information gain of an example as the improvement on a test metric
after training on that example. A secondary learner is then trained to
approximate this quantity. During fine-tuning, this learner selects informative
examples and skips uninformative ones. We show that our method has consistent
improvement across datasets, fine-tuning tasks, and language model
architectures. For example, we achieve a median perplexity of 54.0 on a books
dataset compared to 57.3 for standard fine-tuning. We present statistical
evidence that offers insight into the improvements of our method over standard
fine-tuning. The generality of our method leads us to propose a new paradigm
for language model fine-tuning -- we encourage researchers to release
pretrained secondary learners on common corpora to promote efficient and
effective fine-tuning, thereby improving the performance and reducing the
overall energy footprint of language model fine-tuning.
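The abstract describes information gain filtration only at a high level: measure each example's information gain (improvement on a held-out metric after training on it), train a secondary learner to predict that quantity cheaply, and skip examples predicted to be uninformative. The following is a minimal, illustrative Python sketch of that loop. The names (`SecondaryLearner`, `filter_batch`) and the single-feature linear learner are assumptions for illustration only, not the paper's actual implementation.

```python
def information_gain(loss_before, loss_after):
    """Information gain of an example: improvement on the test metric
    (here, the drop in held-out loss) after training on that example."""
    return loss_before - loss_after

class SecondaryLearner:
    """Toy stand-in for the paper's secondary learner: trained to predict
    an example's information gain from a cheap feature (here, example
    length), then used to filter examples during fine-tuning."""
    def __init__(self):
        self.w = 0.0  # single learned weight (illustrative)

    def fit(self, features, gains, lr=0.01, epochs=200):
        # Plain SGD on squared error between predicted and measured gain.
        for _ in range(epochs):
            for x, y in zip(features, gains):
                pred = self.w * x
                self.w -= lr * (pred - y) * x

    def predict(self, feature):
        return self.w * feature

def filter_batch(examples, learner, threshold=0.0):
    """Keep only examples whose predicted information gain exceeds the
    threshold; skipped examples are never trained on, saving compute."""
    return [ex for ex in examples if learner.predict(len(ex)) > threshold]
```

In the paper's setting the "feature" would be the example's token context and the learner a small model, but the control flow is the same: score each candidate example, then fine-tune only on those above a threshold.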
Related papers
- Transfer Learning for Finetuning Large Language Models [36.047470973893155]
We investigate transfer learning for finetuning large language models.
We learn finetuning by meta-learning performance and cost surrogate models for grey-box meta-optimization from a new meta-dataset.
Our results demonstrate the transferability of finetuning to adapt large language models more effectively.
arXiv Detail & Related papers (2024-11-02T09:43:12Z)
- Ensembling Finetuned Language Models for Text Classification [55.15643209328513]
Finetuning is a common practice across different communities to adapt pretrained models to particular tasks.
Ensembles of neural networks are typically used to boost performance and provide reliable uncertainty estimates.
We present a metadataset with predictions from five large finetuned models on six datasets and report results of different ensembling strategies.
arXiv Detail & Related papers (2024-10-25T09:15:54Z)
- Less for More: Enhancing Preference Learning in Generative Language Models with Automated Self-Curation of Training Corpora [4.008122785948581]
Ambiguity in language presents challenges for developing better language models.
We introduce a self-curation method that preprocesses annotated datasets by leveraging proxy models trained directly on these datasets.
Our method enhances preference learning by automatically detecting and removing ambiguous annotations within the dataset.
arXiv Detail & Related papers (2024-08-23T02:27:14Z)
- Influence Scores at Scale for Efficient Language Data Sampling [3.072340427031969]
"Influence scores" are used to identify important subsets of training data.
In this paper, we explore the applicability of influence scores in language classification tasks.
arXiv Detail & Related papers (2023-11-27T20:19:22Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
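The LNSR summary above sketches the mechanism only in words: perturb hidden representations with Gaussian noise and penalize instability of the downstream output. A minimal Python sketch of such a penalty follows; the function name `noise_stability_penalty` and the `forward_tail` callable (the rest of the network after the perturbed layer) are hypothetical names for illustration, not the paper's API.

```python
import random

def noise_stability_penalty(hidden, forward_tail, sigma=0.01, seed=0):
    """LNSR-style regularizer sketch: perturb one layer's hidden
    representation with Gaussian noise and penalize how much the
    downstream output changes (squared L2 distance)."""
    rng = random.Random(seed)
    noisy = [h + rng.gauss(0.0, sigma) for h in hidden]
    out_clean = forward_tail(hidden)   # output from the clean hidden state
    out_noisy = forward_tail(noisy)    # output from the perturbed state
    return sum((a - b) ** 2 for a, b in zip(out_clean, out_noisy))
```

During fine-tuning this penalty would be added to the task loss, encouraging representations whose downstream behavior is stable under small perturbations.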
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models [51.53936551681613]
We show that fine-tuning only the bias terms (or a subset of the bias terms) of pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model.
These findings support the hypothesis that fine-tuning mainly exposes knowledge induced by language-modeling training, rather than learning new task-specific linguistic knowledge.
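The core of BitFit is a parameter-selection rule: freeze everything except the bias terms. A minimal sketch of that rule, assuming PyTorch-style parameter naming where bias tensors have names ending in `.bias` (the function name `bitfit_trainable` is illustrative, not from the paper):

```python
def bitfit_trainable(named_parameters):
    """BitFit selection sketch: mark only bias terms as trainable and
    freeze everything else. `named_parameters` maps parameter names to
    tensors (here, plain lists stand in for tensors)."""
    trainable, frozen = {}, {}
    for name, param in named_parameters.items():
        if name.endswith(".bias"):
            trainable[name] = param   # updated during fine-tuning
        else:
            frozen[name] = param      # kept at pretrained values
    return trainable, frozen
```

In a real framework the same split would be expressed by toggling each parameter's gradient flag (e.g. `requires_grad` in PyTorch) based on its name.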
arXiv Detail & Related papers (2021-06-18T16:09:21Z)
- Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model reports better accuracy scores than a reference system trained on an average of 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z)
- Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning [30.5853328612593]
In this work, we explore fine-tuning methods of BERT -- a pre-trained Transformer based language model.
Our experimental results show an advantage in model performance by maximizing the approximate knowledge gain of the model.
We analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters.
arXiv Detail & Related papers (2020-12-04T08:34:39Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.