Aligning the Pretraining and Finetuning Objectives of Language Models
- URL: http://arxiv.org/abs/2002.02000v1
- Date: Wed, 5 Feb 2020 21:40:50 GMT
- Title: Aligning the Pretraining and Finetuning Objectives of Language Models
- Authors: Nuo Wang Pierse, Jingwen Lu
- Abstract summary: We show that aligning the pretraining objectives to the finetuning objectives in language model training significantly improves the finetuning task performance.
We name finetuning small language models in the presence of hundreds of training examples or less "Few Example Learning".
In practice, Few Example Learning enabled by objective alignment not only saves human labeling costs, but also makes it possible to leverage language models in more real-time applications.
- Score: 1.0965065178451106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We demonstrate that explicitly aligning the pretraining objectives to the
finetuning objectives in language model training significantly improves the
finetuning task performance and reduces the minimum amount of finetuning
examples required. The performance margin gained from objective alignment
allows us to build language models with smaller sizes for tasks with less
available training data. We provide empirical evidence of these claims by
applying objective alignment to concept-of-interest tagging and acronym
detection tasks. We found that, with objective alignment, our 768 by 3 and 512
by 3 transformer language models can reach accuracy of 83.9%/82.5% for
concept-of-interest tagging and 73.8%/70.2% for acronym detection using only
200 finetuning examples per task, outperforming the 768 by 3 model pretrained
without objective alignment by +4.8%/+3.4% and +9.9%/+6.3%. We name finetuning
small language models in the presence of hundreds of training examples or less
"Few Example learning". In practice, Few Example Learning enabled by objective
alignment not only saves human labeling costs, but also makes it possible to
leverage language models in more real-time applications.
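The abstract does not spell out how the pretraining and finetuning objectives are aligned in the authors' implementation, so the sketch below is only an illustration of the general idea under an assumed cloze-style formulation: the finetuning label is predicted with the pretrained vocabulary projection (as in masked-token pretraining) rather than with a freshly initialized classification head. All names here (TinyEncoder, YES_ID, NO_ID) are hypothetical and not from the paper.

```python
# Hedged sketch, not the authors' code: contrasts an "unaligned" finetuning head
# (new classifier trained from scratch) with an "aligned" head that reuses the
# pretrained vocabulary projection to score label words, so the finetuning task
# has the same form as masked-token pretraining.
import torch
import torch.nn as nn

VOCAB, HIDDEN = 1000, 64   # toy sizes; the paper's models are 768x3 and 512x3 transformers
YES_ID, NO_ID = 7, 8       # hypothetical vocabulary ids for the label words "yes"/"no"

class TinyEncoder(nn.Module):
    """Stand-in for a small pretrained transformer encoder."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.layer = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)

    def forward(self, ids):
        return self.layer(self.embed(ids))            # (batch, seq, hidden)

encoder = TinyEncoder()

# Unaligned finetuning: a freshly initialized 2-way classification head.
cls_head = nn.Linear(HIDDEN, 2)

# Aligned finetuning: reuse the (pretrained, here weight-tied) vocabulary
# projection and read off only the logits of the two label words.
def aligned_logits(hidden_at_mask):
    vocab_logits = hidden_at_mask @ encoder.embed.weight.T   # (batch, vocab)
    return vocab_logits[:, [NO_ID, YES_ID]]                  # (batch, 2)

ids = torch.randint(0, VOCAB, (4, 16))   # toy batch of "is this token an acronym?" examples
hidden = encoder(ids)[:, 0]              # representation at the cloze/[MASK] position
labels = torch.randint(0, 2, (4,))

loss_unaligned = nn.functional.cross_entropy(cls_head(hidden), labels)
loss_aligned = nn.functional.cross_entropy(aligned_logits(hidden), labels)
print(loss_unaligned.item(), loss_aligned.item())
```

Under this kind of reuse no task-specific parameters have to be learned from scratch, which is one plausible reason an aligned objective can get by with only a few hundred labeled examples, as the paper's results suggest.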
Related papers
- Emergent Abilities in Reduced-Scale Generative Language Models [10.51168925267033]
Large language models can solve new tasks without task-specific fine-tuning.
This ability is considered an emergent ability and is primarily seen in large language models with billions of parameters.
This study investigates if such emergent properties are strictly tied to model size or can be demonstrated by smaller models trained on reduced-scale data.
arXiv Detail & Related papers (2024-04-02T18:00:28Z) - LIMA: Less Is More for Alignment [112.93890201395477]
We train LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses.
LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples.
In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases.
arXiv Detail & Related papers (2023-05-18T17:45:22Z) - Contrastive Alignment of Vision to Language Through Parameter-Efficient
Transfer Learning [60.26952378997713]
Contrastive vision-language models (e.g. CLIP) are created by updating all the parameters of a vision model and language model through contrastive training.
We show that a minimal set of parameter updates (less than 7%) can achieve the same performance as full-model training.
In a series of experiments, we show that existing knowledge is conserved more strongly in parameter-efficient training.
arXiv Detail & Related papers (2023-03-21T14:12:08Z) - Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language
Models [23.818751895205132]
Go-tuning is a geometry-guided self-supervised learning method.
Go-tuning enables T5-small (80M) to achieve zero-shot results competitive with much larger language models such as T5-XL (3B).
arXiv Detail & Related papers (2022-12-20T17:36:49Z) - Forging Multiple Training Objectives for Pre-trained Language Models via
Meta-Learning [97.28779163988833]
Multiple pre-training objectives compensate for the limited understanding capability of single-objective language modeling.
We propose MOMETAS, a novel adaptive sampler based on meta-learning that learns the latent sampling pattern over arbitrary pre-training objectives.
arXiv Detail & Related papers (2022-10-19T04:38:26Z) - Look Ma, Only 400 Samples! Revisiting the Effectiveness of Automatic
N-Gram Rule Generation for Spelling Normalization in Filipino [0.0]
With 84.75 million Filipinos online, the ability of models to process online text is crucial for developing Filipino NLP applications.
We propose an N-gram + Damerau-Levenshtein distance model with automatic rule extraction (a minimal distance sketch appears after this list).
arXiv Detail & Related papers (2022-10-06T04:41:26Z) - Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce
Data Annotation Required in Visual Commonsense Tasks [3.42658286826597]
We analyze different prompt-based fine-tuning techniques to improve results on both language and multimodal causal transformer models.
Our results show that with simple model-agnostic prompt-based fine-tuning, comparable results can be reached using only 35%-40% of the fine-tuning training dataset.
arXiv Detail & Related papers (2022-04-25T18:56:55Z) - PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM.
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z) - Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages.
Our largest model sets new state of the art in few-shot learning in more than 20 representative languages.
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z) - Finetuned Language Models Are Zero-Shot Learners [67.70352207685558]
We show that instruction tuning boosts zero-shot performance on unseen tasks.
We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates.
We evaluate this instruction-tuned model, which we call FLAN, on unseen task types.
arXiv Detail & Related papers (2021-09-03T17:55:52Z) - ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language
Understanding and Generation [25.430130072811075]
We propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models.
It fuses an auto-regressive network and an auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks.
We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph.
arXiv Detail & Related papers (2021-07-05T16:54:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.