Cross-Lingual Supervision improves Large Language Models Pre-training
- URL: http://arxiv.org/abs/2305.11778v1
- Date: Fri, 19 May 2023 16:14:07 GMT
- Title: Cross-Lingual Supervision improves Large Language Models Pre-training
- Authors: Andrea Schioppa and Xavier Garcia and Orhan Firat
- Abstract summary: We demonstrate that pre-training Large Language Models on a mixture of a self-supervised Language Modeling objective and the supervised Machine Translation objective yields models with better in-context learning abilities.
As pre-training is a very resource-intensive process and a grid search on the best mixing ratio between the two objectives is prohibitively expensive, we propose a simple yet effective strategy to learn it during pre-training.
- Score: 36.932380291416365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent rapid progress in pre-training Large Language Models has relied on
using self-supervised language modeling objectives like next token prediction
or span corruption. On the other hand, Machine Translation Systems are mostly
trained using cross-lingual supervision that requires aligned data between
source and target languages. We demonstrate that pre-training Large Language
Models on a mixture of a self-supervised Language Modeling objective and the
supervised Machine Translation objective, therefore including cross-lingual
parallel data during pre-training, yields models with better in-context
learning abilities. As pre-training is a very resource-intensive process and a
grid search on the best mixing ratio between the two objectives is
prohibitively expensive, we propose a simple yet effective strategy to learn it
during pre-training.
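The abstract does not spell out the mixture formulation or the ratio-learning strategy, so the sketch below is only a minimal illustration of the general setup: a single decoder-only model trained on a weighted combination of a next-token-prediction loss over monolingual text and a translation loss over packed source-target pairs. The learnable per-objective weights use a generic uncertainty-weighting trick (Kendall et al., 2018) as a stand-in for the paper's unspecified strategy; all class and variable names are hypothetical.

```python
# Minimal sketch (not the paper's code) of mixing a self-supervised
# next-token-prediction loss with a supervised translation loss in one
# decoder-only model. The learnable mixing weights below are a generic
# uncertainty-weighting assumption, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCausalLM(nn.Module):
    """Toy decoder-only LM: embedding -> LSTM -> vocabulary logits."""
    def __init__(self, vocab_size: int = 1000, d_model: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.LSTM(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)

def next_token_loss(model, tokens, loss_mask=None):
    """Shifted cross-entropy; optionally restrict the loss to masked positions."""
    logits = model(tokens[:, :-1])
    targets = tokens[:, 1:]
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    ).view_as(targets)
    if loss_mask is not None:
        mask = loss_mask[:, 1:].float()
        return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
    return per_token.mean()

model = TinyCausalLM()
# Learnable log-variances acting as per-objective weights (assumption):
# total = exp(-s_lm) * L_lm + s_lm + exp(-s_mt) * L_mt + s_mt
log_vars = nn.Parameter(torch.zeros(2))
opt = torch.optim.AdamW(list(model.parameters()) + [log_vars], lr=3e-4)

# Dummy batches: monolingual text, and translation pairs packed as
# [source ; target] with the loss applied to target positions only.
lm_tokens = torch.randint(0, 1000, (8, 64))
mt_tokens = torch.randint(0, 1000, (8, 64))
mt_target_mask = torch.zeros(8, 64, dtype=torch.bool)
mt_target_mask[:, 32:] = True  # second half plays the role of the target side

for step in range(3):
    l_lm = next_token_loss(model, lm_tokens)
    l_mt = next_token_loss(model, mt_tokens, loss_mask=mt_target_mask)
    w_lm, w_mt = torch.exp(-log_vars)
    loss = w_lm * l_lm + log_vars[0] + w_mt * l_mt + log_vars[1]
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this sketch the translation objective is cast as next-token prediction on the target half of a packed pair, so both objectives share one model and one loss function; only the data format and loss mask differ.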
Related papers
- A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives [13.581385765600265]
Pretrained language models (PLMs) display impressive performance and have captured the attention of the NLP community.
This paper proposes a comparison of multilingual pretraining objectives in a controlled methodological environment.
arXiv Detail & Related papers (2024-07-22T09:16:30Z)
- Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin [3.2039731457723604]
We aim to improve upon both text classification and translation of Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus.
Our studies show that English pre-trained language models serve as a stronger prior than multilingual language models on English-Pidgin tasks with up to 2.38 BLEU improvements.
arXiv Detail & Related papers (2023-07-01T16:47:36Z)
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning [0.7612676127275795]
Most Transformer language models are pretrained on English text.
As model sizes grow, the performance gap between English and other languages increases even further.
We introduce a cross-lingual and progressive transfer learning approach, called CLP-Transfer.
arXiv Detail & Related papers (2023-01-23T18:56:12Z)
- Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
- From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension [51.953428342923885]
We develop a two-stage approach to enhance the model performance.
The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
arXiv Detail & Related papers (2021-12-09T07:31:15Z)
- Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language modeling.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models.
arXiv Detail & Related papers (2020-07-15T16:58:01Z)
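Several of the entries above (CACR and InfoXLM in particular) rely on contrastive objectives over parallel data. The snippet below is a generic, in-batch InfoNCE loss over aligned sentence embeddings, shown only to illustrate that family of objectives; it is not either paper's exact formulation, and the function name and temperature value are assumptions.

```python
# Generic InfoNCE-style contrastive loss over aligned sentence embeddings,
# sketching the kind of cross-lingual contrastive objective the entries above
# describe; names and the temperature are hypothetical choices.
import torch
import torch.nn.functional as F

def parallel_contrastive_loss(src_emb: torch.Tensor,
                              tgt_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """src_emb[i] and tgt_emb[i] encode the i-th sentence pair; the other rows
    in the batch serve as in-batch negatives."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(src.size(0), device=src.device)
    # Symmetric InfoNCE: align source->target and target->source.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Usage with random stand-ins for encoder outputs of one parallel batch:
loss = parallel_contrastive_loss(torch.randn(16, 256), torch.randn(16, 256))
```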
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.