DICT-MLM: Improved Multilingual Pre-Training using Bilingual
Dictionaries
- URL: http://arxiv.org/abs/2010.12566v1
- Date: Fri, 23 Oct 2020 17:53:11 GMT
- Title: DICT-MLM: Improved Multilingual Pre-Training using Bilingual
Dictionaries
- Authors: Aditi Chaudhary, Karthik Raman, Krishna Srinivasan, Jiecao Chen
- Abstract summary: Masked modeling (MLM) objective as key language learning objective.
DICT-MLM works by incentivizing the model to be able to predict not just the original masked word, but potentially any of its cross-lingual synonyms as well.
Our empirical analysis on multiple downstream tasks spanning 30+ languages, demonstrates the efficacy of the proposed approach.
- Score: 8.83363871195679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained multilingual language models such as mBERT have shown immense
gains for several natural language processing (NLP) tasks, especially in the
zero-shot cross-lingual setting. Most, if not all, of these pre-trained models
rely on the masked-language modeling (MLM) objective as the key language
learning objective. The principle behind these approaches is that predicting
the masked words with the help of the surrounding text helps learn potent
contextualized representations. Despite the strong representation learning
capability enabled by MLM, we demonstrate an inherent limitation of MLM for
multilingual representation learning. In particular, by requiring the model to
predict the language-specific token, the MLM objective disincentivizes learning
a language-agnostic representation -- which is a key goal of multilingual
pre-training. Therefore to encourage better cross-lingual representation
learning we propose the DICT-MLM method. DICT-MLM works by incentivizing the
model to be able to predict not just the original masked word, but potentially
any of its cross-lingual synonyms as well. Our empirical analysis on multiple
downstream tasks spanning 30+ languages, demonstrates the efficacy of the
proposed approach and its ability to learn better multilingual representations.
Related papers
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs)
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z) - Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training [29.47243668154796]
BLOOMZMMS is a novel model that integrates a multilingual LLM with a multilingual speech encoder.
We demonstrate the transferability of linguistic knowledge from the text to the speech modality.
Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks.
arXiv Detail & Related papers (2024-04-16T21:45:59Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Unsupervised Improvement of Factual Knowledge in Language Models [4.5788796239850225]
Masked language modeling plays a key role in pretraining large language models.
We propose an approach for influencing pretraining in a way that can improve language model performance on a variety of knowledge-intensive tasks.
arXiv Detail & Related papers (2023-04-04T07:37:06Z) - LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv Detail & Related papers (2022-11-10T05:09:16Z) - Generalizing Multimodal Pre-training into Multilingual via Language
Acquisition [54.69707237195554]
English-based Vision-Language Pre-training has achieved great success in various downstream tasks.
Some efforts have been taken to generalize this success to non-English languages through Multilingual Vision-Language Pre-training.
We propose a textbfMultitextbfLingual textbfAcquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model into multilingual.
arXiv Detail & Related papers (2022-05-29T08:53:22Z) - Learning Multilingual Representation for Natural Language Understanding
with Enhanced Cross-Lingual Supervision [42.724921817550516]
We propose a network named decomposed attention (DA) as a replacement of MA.
The DA consists of an intra-lingual attention (IA) and a cross-lingual attention (CA), which model intralingual and cross-lingual supervisions respectively.
Experiments on various cross-lingual natural language understanding tasks show that the proposed architecture and learning strategy significantly improve the model's cross-lingual transferability.
arXiv Detail & Related papers (2021-06-09T16:12:13Z) - Improving the Lexical Ability of Pretrained Language Models for
Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages.
Previous research has shown that this is because the representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z) - UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.