VECO: Variable and Flexible Cross-lingual Pre-training for Language
Understanding and Generation
- URL: http://arxiv.org/abs/2010.16046v2
- Date: Wed, 2 Jun 2021 13:15:11 GMT
- Title: VECO: Variable and Flexible Cross-lingual Pre-training for Language
Understanding and Generation
- Authors: Fuli Luo, Wei Wang, Jiahao Liu, Yijia Liu, Bin Bi, Songfang Huang, Fei
Huang, Luo Si
- Abstract summary: We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
- Score: 77.82373082024934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing work in multilingual pretraining has demonstrated the potential of
cross-lingual transferability by training a unified Transformer encoder for
multiple languages. However, much of this work only relies on the shared
vocabulary and bilingual contexts to encourage the correlation across
languages, which is loose and implicit for aligning the contextual
representations between languages. In this paper, we plug a cross-attention
module into the Transformer encoder to explicitly build the interdependence
between languages. It can effectively avoid the degeneration of predicting
masked words only conditioned on the context in its own language. More
importantly, when fine-tuning on downstream tasks, the cross-attention module
can be plugged in or out on-demand, thus naturally benefiting a wider range of
cross-lingual tasks, from language understanding to generation.
As a result, the proposed cross-lingual model delivers new state-of-the-art
results on various cross-lingual understanding tasks of the XTREME benchmark,
covering text classification, sequence labeling, question answering, and
sentence retrieval. For cross-lingual generation tasks, it also outperforms all
existing cross-lingual models and state-of-the-art Transformer variants on
WMT14 English-to-German and English-to-French translation datasets, with gains
of up to 1~2 BLEU.
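The architectural idea above (a cross-attention sub-layer that can be plugged into or out of each encoder layer, depending on whether paired text in another language is available) can be pictured with the following minimal PyTorch sketch. It is only an illustration of the mechanism as described in the abstract, not the authors' released code; all class, argument, and flag names (`PluggableEncoderLayer`, `memory`, `use_cross_attention`) are hypothetical.

```python
import torch
import torch.nn as nn

class PluggableEncoderLayer(nn.Module):
    """Transformer encoder layer with an optional cross-attention sub-layer.

    When `memory` (hidden states of the paired sentence in the other language)
    is provided and cross-attention is enabled, tokens can attend to the other
    language instead of relying only on their own context. Names and structure
    are illustrative, not VECO's actual implementation.
    """

    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads,
                                                dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.norm_self = nn.LayerNorm(d_model)
        self.norm_cross = nn.LayerNorm(d_model)
        self.norm_ffn = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, memory=None, use_cross_attention=True):
        # Standard self-attention over the input language.
        h, _ = self.self_attn(x, x, x, need_weights=False)
        x = self.norm_self(x + self.dropout(h))

        # Cross-attention is "plugged in" only when a paired sentence in the
        # other language is available, and "plugged out" otherwise.
        if use_cross_attention and memory is not None:
            h, _ = self.cross_attn(x, memory, memory, need_weights=False)
            x = self.norm_cross(x + self.dropout(h))

        h = self.ffn(x)
        return self.norm_ffn(x + self.dropout(h))
```

Under this reading, monolingual understanding tasks simply skip the cross-attention branch at fine-tuning time, while translation-style tasks keep it and feed the other language's hidden states as `memory`.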
Related papers
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning [56.47303426167584]
We propose VECO 2.0, a cross-lingual pre-trained model based on contrastive learning with multi-granularity alignments.
Specifically, sequence-to-sequence alignment is induced to maximize the similarity of parallel pairs and minimize the similarity of non-parallel pairs.
Token-to-token alignment is integrated to bridge the gap between synonymous tokens, excavated via a thesaurus dictionary, and the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
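As a rough illustration of the sequence-to-sequence alignment described in the VECO 2.0 entry above, the sketch below computes an InfoNCE-style contrastive loss that maximizes the similarity of pooled representations for parallel sentence pairs while treating the other sentences in the batch as non-parallel negatives. This is a generic contrastive objective consistent with that one-sentence description, not the paper's exact loss; the function name and temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def sequence_contrastive_loss(src_repr, tgt_repr, temperature=0.05):
    """InfoNCE-style loss over pooled sentence representations.

    src_repr, tgt_repr: (batch, dim) pooled encodings of parallel sentences,
    where row i of src_repr is parallel to row i of tgt_repr. Other rows in
    the batch act as non-parallel negatives.
    """
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    # Cosine similarity between every source and every target in the batch.
    logits = src @ tgt.t() / temperature          # (batch, batch)
    labels = torch.arange(src.size(0), device=src.device)
    # Maximize similarity of parallel pairs (the diagonal) and minimize the
    # rest, symmetrically in both retrieval directions.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```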
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretraining and finetuning stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
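The code-switching restore task mentioned in the entry above can be read as a denoising objective: some source tokens are swapped for bilingual-dictionary translations, and the model learns to restore (or translate from) the original sentence. The snippet below shows one plausible way to build such noised inputs; the replacement rate, dictionary format, and function name are assumptions, not details taken from the paper.

```python
import random

def code_switch(tokens, bilingual_dict, replace_prob=0.15, seed=None):
    """Randomly replace tokens with bilingual-dictionary translations.

    tokens: list of source-language tokens.
    bilingual_dict: maps a source token to a list of candidate translations.
    Returns the code-switched tokens; the original `tokens` serve as the
    restore target in the auxiliary task.
    """
    rng = random.Random(seed)
    switched = []
    for tok in tokens:
        candidates = bilingual_dict.get(tok)
        if candidates and rng.random() < replace_prob:
            switched.append(rng.choice(candidates))
        else:
            switched.append(tok)
    return switched

# Example: the pair (switched, tokens) is then fed to the sequence-to-sequence
# model as an extra "restore the original sentence" training instance.
tokens = "the cat sat on the mat".split()
bilingual_dict = {"cat": ["Katze"], "mat": ["Matte"]}
print(code_switch(tokens, bilingual_dict, replace_prob=0.5, seed=0))
```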
- Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z)
- Syntax-augmented Multilingual BERT for Cross-lingual Transfer [37.99210035238424]
This work shows that explicitly providing language syntax while training mBERT helps cross-lingual transfer.
Experiment results show that syntax-augmented mBERT improves cross-lingual transfer on popular benchmarks.
arXiv Detail & Related papers (2021-06-03T21:12:50Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
Because gold labels for the translated text in the target language are typically unavailable during training, an additional KL-divergence self-teaching loss is proposed, based on auto-generated soft pseudo-labels for that translated text.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
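The KL-divergence self-teaching loss in the FILTER entry above can be sketched as a soft-label distillation term: auto-generated soft pseudo-labels supervise the model's predictions on the translated target-language text. The code below is a generic rendering of that idea; the temperature, reduction choice, and the way pseudo-labels are produced in the paper are assumptions.

```python
import torch
import torch.nn.functional as F

def self_teaching_kl_loss(student_logits, pseudo_logits, temperature=1.0):
    """KL divergence between auto-generated soft pseudo-labels and the
    model's predictions on translated target-language text.

    student_logits: (batch, num_classes) logits on the translated text.
    pseudo_logits:  (batch, num_classes) logits used to build the soft
                    pseudo-labels; detached so only the student branch
                    receives gradients.
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(pseudo_logits.detach() / temperature, dim=-1)
    # "batchmean" gives the mean KL per example, the usual reduction for
    # distillation-style losses.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")
```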