Meet in the Middle: A New Pre-training Paradigm
- URL: http://arxiv.org/abs/2303.07295v1
- Date: Mon, 13 Mar 2023 17:17:11 GMT
- Title: Meet in the Middle: A New Pre-training Paradigm
- Authors: Anh Nguyen, Nikos Karampatziakis, Weizhu Chen
- Abstract summary: Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion.
We propose a new pre-training paradigm with techniques that jointly improve the training data efficiency and the capabilities of LMs in the infilling task.
We show the effectiveness of our pre-training paradigm with extensive experiments on both programming and natural language models.
- Score: 41.52858444519968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most language models (LMs) are trained and applied in an autoregressive
left-to-right fashion, assuming that the next token only depends on the
preceding ones. However, this assumption ignores the potential benefits of
using the full sequence information during training, and the possibility of
having context from both sides during inference. In this paper, we propose a
new pre-training paradigm with techniques that jointly improve the training
data efficiency and the capabilities of the LMs in the infilling task. The
first is a training objective that aligns the predictions of a left-to-right LM
with those of a right-to-left LM, trained on the same data but in reverse
order. The second is a bidirectional inference procedure that enables both LMs
to meet in the middle. We show the effectiveness of our pre-training paradigm
with extensive experiments on both programming and natural language models,
outperforming strong baselines.
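To make the first ingredient concrete, here is a minimal training-loss sketch. This is not the authors' released code: it assumes decoder-only PyTorch models `forward_lm` and `backward_lm` that map a token prefix to per-position next-token logits, uses a total variation agreement term, and picks an illustrative weight `agreement_weight`; the paper's exact regularizer and weighting may differ, and in practice the two directions can share parameters (kept as separate handles here for clarity).

```python
import torch.nn.functional as F


def mim_training_loss(forward_lm, backward_lm, tokens, agreement_weight=0.1):
    """tokens: LongTensor of shape (batch, seq_len). Hypothetical sketch."""
    # Left-to-right model: position i predicts tokens[:, i + 1].
    fwd_logits = forward_lm(tokens[:, :-1])               # (B, T-1, V)
    fwd_nll = F.cross_entropy(
        fwd_logits.reshape(-1, fwd_logits.size(-1)),
        tokens[:, 1:].reshape(-1),
    )

    # Right-to-left model: same data, read in reverse order.
    rev = tokens.flip(dims=[1])
    bwd_logits_rev = backward_lm(rev[:, :-1])              # (B, T-1, V)
    bwd_nll = F.cross_entropy(
        bwd_logits_rev.reshape(-1, bwd_logits_rev.size(-1)),
        rev[:, 1:].reshape(-1),
    )

    # Re-index the backward logits in left-to-right order so both models'
    # predictions for the same target token line up:
    # fwd_logits[:, k-1] and bwd_logits[:, k] both predict tokens[:, k].
    bwd_logits = bwd_logits_rev.flip(dims=[1])
    p_fwd = F.softmax(fwd_logits[:, :-1], dim=-1)
    p_bwd = F.softmax(bwd_logits[:, 1:], dim=-1)
    agreement = 0.5 * (p_fwd - p_bwd).abs().sum(dim=-1).mean()  # total variation

    return fwd_nll + bwd_nll + agreement_weight * agreement
```

The intent of the extra term is simply to pull the two directions toward agreeing on shared positions; both standard next-token losses are kept so each model remains a usable autoregressive LM on its own.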
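The second ingredient, bidirectional infilling inference, can be sketched as a greedy loop in which the forward LM extends the prefix, the backward LM extends the suffix, and decoding stops once the two generations overlap. The `next_token` helper and the fixed n-gram match used as the "meeting" test are assumptions for illustration; the authors' procedure has its own parallelization and agreement criterion.

```python
def infill_meet_in_the_middle(forward_lm, backward_lm, prefix, suffix,
                              max_new_tokens=128, match_len=4):
    """prefix, suffix: lists of token ids; returns the infilled middle span."""
    fwd_ctx = list(prefix)             # grown left-to-right by the forward LM
    bwd_ctx = list(reversed(suffix))   # grown right-to-left by the backward LM
    fwd_new, bwd_new = [], []

    for _ in range(max_new_tokens):
        # One greedy step per direction (`next_token` is an assumed helper
        # returning the argmax continuation of the given context).
        fwd_new.append(forward_lm.next_token(fwd_ctx))
        fwd_ctx.append(fwd_new[-1])
        bwd_new.append(backward_lm.next_token(bwd_ctx))
        bwd_ctx.append(bwd_new[-1])

        # "Meet in the middle": stop once the tail of the forward generation
        # matches the head of the backward generation read in forward order.
        bwd_fwd_order = list(reversed(bwd_new))
        if (len(fwd_new) >= match_len
                and fwd_new[-match_len:] == bwd_fwd_order[:match_len]):
            # Splice the two halves, dropping the duplicated overlap.
            return fwd_new + bwd_fwd_order[match_len:]

    # The two generations never agreed: fall back to the forward output alone.
    return fwd_new
```

Falling back to the forward-only output when no overlap is found keeps the procedure no worse than ordinary left-to-right infilling under these assumptions.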
Related papers
- Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate [118.37653302885607]
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric that indicates the multi-modal pre-training quality of Large Vision-Language Models (LVLMs).
MIR is informative for training data selection, training strategy scheduling, and model architecture design toward better pre-training results.
arXiv Detail & Related papers (2024-10-09T17:59:04Z)
- Making Pre-trained Language Models both Task-solvers and Self-calibrators [52.98858650625623]
Pre-trained language models (PLMs) serve as backbones for various real-world systems, but their confidence estimates are often poorly calibrated.
Previous work shows that introducing an extra calibration task can mitigate this issue.
We propose a training algorithm LM-TOAST to tackle the challenges.
arXiv Detail & Related papers (2023-07-21T02:51:41Z)
- Preference-grounded Token-level Guidance for Language Model Fine-tuning [105.88789610320426]
Aligning language models with preferences is an important problem in natural language generation.
For LM training, we present two *minimalist* learning objectives that utilize the learned guidance, chosen according to the amount of supervised data available.
In experiments, our method performs competitively on two distinct representative LM tasks.
arXiv Detail & Related papers (2023-06-01T07:00:07Z)
- Pretraining Language Models with Human Preferences [21.724817280998696]
Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM.
Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences.
arXiv Detail & Related papers (2023-02-16T21:03:33Z)
- Masked Autoencoders As The Unified Learners For Pre-Trained Sentence Representation [77.47617360812023]
We extend the recently proposed MAE-style pre-training strategy, RetroMAE, to support a wide variety of sentence representation tasks.
The first stage performs RetroMAE over generic corpora, such as Wikipedia and BookCorpus, from which the base model is learned.
The second stage takes place on domain-specific data, e.g., MS MARCO and NLI, where the base model is continually trained with RetroMAE and contrastive learning.
arXiv Detail & Related papers (2022-07-30T14:34:55Z)
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)
- Bayesian Active Learning with Pretrained Language Models [9.161353418331245]
Active Learning (AL) is a method to iteratively select data for annotation from a pool of unlabeled data.
Previous AL approaches have been limited to task-specific models that are trained from scratch at each iteration.
We introduce BALM: Bayesian Active Learning with pretrained language models.
arXiv Detail & Related papers (2021-04-16T19:07:31Z)