DICT-MLM: Improved Multilingual Pre-Training using Bilingual
Dictionaries
- URL: http://arxiv.org/abs/2010.12566v1
- Date: Fri, 23 Oct 2020 17:53:11 GMT
- Title: DICT-MLM: Improved Multilingual Pre-Training using Bilingual
Dictionaries
- Authors: Aditi Chaudhary, Karthik Raman, Krishna Srinivasan, Jiecao Chen
- Abstract summary: Masked modeling (MLM) objective as key language learning objective.
DICT-MLM works by incentivizing the model to be able to predict not just the original masked word, but potentially any of its cross-lingual synonyms as well.
Our empirical analysis on multiple downstream tasks spanning 30+ languages, demonstrates the efficacy of the proposed approach.
- Score: 8.83363871195679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained multilingual language models such as mBERT have shown immense
gains for several natural language processing (NLP) tasks, especially in the
zero-shot cross-lingual setting. Most, if not all, of these pre-trained models
rely on the masked-language modeling (MLM) objective as the key language
learning objective. The principle behind these approaches is that predicting
the masked words with the help of the surrounding text helps learn potent
contextualized representations. Despite the strong representation learning
capability enabled by MLM, we demonstrate an inherent limitation of MLM for
multilingual representation learning. In particular, by requiring the model to
predict the language-specific token, the MLM objective disincentivizes learning
a language-agnostic representation -- which is a key goal of multilingual
pre-training. Therefore to encourage better cross-lingual representation
learning we propose the DICT-MLM method. DICT-MLM works by incentivizing the
model to be able to predict not just the original masked word, but potentially
any of its cross-lingual synonyms as well. Our empirical analysis on multiple
downstream tasks spanning 30+ languages, demonstrates the efficacy of the
proposed approach and its ability to learn better multilingual representations.
Related papers
- Linguistic Entity Masking to Improve Cross-Lingual Representation of Multilingual Language Models for Low-Resource Languages [1.131401554081614]
We introduce a novel masking strategy, Linguistic Entity Masking (LEM) to be used in the continual pre-training step.
LEM limits masking to the linguistic entity types verbs, nouns and named entities, which hold a higher prominence in a sentence.
We evaluate the effectiveness of LEM using three downstream tasks, namely bitext mining, parallel data curation and code-mixed sentiment analysis.
arXiv Detail & Related papers (2025-01-10T04:17:58Z) - How Do Multilingual Language Models Remember Facts? [50.13632788453612]
We show that previously identified recall mechanisms in English largely apply to multilingual contexts.
We localize the role of language during recall, finding that subject enrichment is language-independent.
In decoder-only LLMs, FVs compose these two pieces of information in two separate stages.
arXiv Detail & Related papers (2024-10-18T11:39:34Z) - Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs)
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z) - Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training [29.47243668154796]
BLOOMZMMS is a novel model that integrates a multilingual LLM with a multilingual speech encoder.
We demonstrate the transferability of linguistic knowledge from the text to the speech modality.
Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks.
arXiv Detail & Related papers (2024-04-16T21:45:59Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Unsupervised Improvement of Factual Knowledge in Language Models [4.5788796239850225]
Masked language modeling plays a key role in pretraining large language models.
We propose an approach for influencing pretraining in a way that can improve language model performance on a variety of knowledge-intensive tasks.
arXiv Detail & Related papers (2023-04-04T07:37:06Z) - LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv Detail & Related papers (2022-11-10T05:09:16Z) - Generalizing Multimodal Pre-training into Multilingual via Language
Acquisition [54.69707237195554]
English-based Vision-Language Pre-training has achieved great success in various downstream tasks.
Some efforts have been taken to generalize this success to non-English languages through Multilingual Vision-Language Pre-training.
We propose a textbfMultitextbfLingual textbfAcquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model into multilingual.
arXiv Detail & Related papers (2022-05-29T08:53:22Z) - Learning Multilingual Representation for Natural Language Understanding
with Enhanced Cross-Lingual Supervision [42.724921817550516]
We propose a network named decomposed attention (DA) as a replacement of MA.
The DA consists of an intra-lingual attention (IA) and a cross-lingual attention (CA), which model intralingual and cross-lingual supervisions respectively.
Experiments on various cross-lingual natural language understanding tasks show that the proposed architecture and learning strategy significantly improve the model's cross-lingual transferability.
arXiv Detail & Related papers (2021-06-09T16:12:13Z) - UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.