Cross-lingual Transferring of Pre-trained Contextualized Language Models
- URL: http://arxiv.org/abs/2107.12627v1
- Date: Tue, 27 Jul 2021 06:51:13 GMT
- Title: Cross-lingual Transferring of Pre-trained Contextualized Language Models
- Authors: Zuchao Li, Kevin Parnow, Hai Zhao, Zhuosheng Zhang, Rui Wang, Masao
Utiyama, Eiichiro Sumita
- Abstract summary: We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
- Score: 73.97131976850424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though the pre-trained contextualized language model (PrLM) has made a
significant impact on NLP, training PrLMs in languages other than English can
be impractical for two reasons: other languages often lack corpora sufficient
for training powerful PrLMs, and because of the commonalities among human
languages, computationally expensive PrLM training for different languages is
somewhat redundant. In this work, building upon the recent works connecting
cross-lingual model transferring and neural machine translation, we thus
propose a novel cross-lingual model transferring framework for PrLMs: TreLM. To
handle the symbol order and sequence length differences between languages, we
propose an intermediate "TRILayer" structure that learns from these
differences and creates a better transfer in our primary translation direction,
as well as a new cross-lingual language modeling objective for transfer
training. Additionally, we present an embedding alignment method that adversarially
adapts a PrLM's non-contextualized embedding space and the TRILayer structure
to learn a text transformation network across languages, which addresses the
vocabulary difference between languages. Experiments on both language
understanding and structure parsing tasks show the proposed framework
significantly outperforms language models trained from scratch with limited
data in both performance and efficiency. Moreover, despite an insignificant
performance loss compared to pre-training from scratch in resource-rich
scenarios, our cross-lingual model transferring framework is significantly more
economical.
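The adversarial embedding alignment mentioned in the abstract follows a familiar pattern: a mapping projects target-language (non-contextualized) embeddings into the source PrLM's embedding space, while a discriminator tries to tell mapped target embeddings apart from genuine source embeddings. The sketch below illustrates only that general pattern; the module sizes, optimizers, and training step are illustrative assumptions, not TreLM's actual implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of adversarial embedding-space alignment (assumed setup,
# not TreLM's code): a linear mapping moves target-language embeddings into
# the source PrLM's embedding space; a discriminator tries to separate
# source embeddings from mapped target embeddings.

EMB_DIM = 768  # assumed embedding width of the source PrLM

mapping = nn.Linear(EMB_DIM, EMB_DIM, bias=False)   # target -> source space
discriminator = nn.Sequential(                       # source vs. mapped-target
    nn.Linear(EMB_DIM, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1),
)

bce = nn.BCEWithLogitsLoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
opt_m = torch.optim.Adam(mapping.parameters(), lr=1e-4)

def adversarial_step(src_emb: torch.Tensor, tgt_emb: torch.Tensor) -> None:
    """One alternating update; src_emb and tgt_emb are (batch, EMB_DIM) word embeddings."""
    # 1) Train the discriminator to separate source from mapped-target embeddings.
    with torch.no_grad():
        mapped = mapping(tgt_emb)
    d_logits = torch.cat([discriminator(src_emb), discriminator(mapped)])
    d_labels = torch.cat([torch.ones(len(src_emb), 1), torch.zeros(len(tgt_emb), 1)])
    loss_d = bce(d_logits, d_labels)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Train the mapping to fool the discriminator (mapped target looks "source").
    g_logits = discriminator(mapping(tgt_emb))
    loss_m = bce(g_logits, torch.ones(len(tgt_emb), 1))
    opt_m.zero_grad(); loss_m.backward(); opt_m.step()
```

In practice such a mapping is alternated with the cross-lingual language modeling objective so the aligned embeddings feed a shared contextual encoder; the loop above only shows the alignment half.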
Related papers
- Self-Translate-Train: Enhancing Cross-Lingual Transfer of Large Language Models via Inherent Capability [31.025371443719404]
Self-Translate-Train is a method that lets a large language model translate training data into the target language and then fine-tunes the model on its own generated data.
By demonstrating that Self-Translate-Train outperforms zero-shot transfer, we encourage further exploration of better methods to elicit cross-lingual capabilities of LLMs.
arXiv Detail & Related papers (2024-06-29T14:40:23Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model VECO2.0 based on contrastive learning with multi-granularity alignments.
Specifically, sequence-to-sequence alignment is induced to maximize the similarity of parallel pairs and minimize that of non-parallel pairs (a contrastive-loss sketch of this idea appears after this list).
Token-to-token alignment is also integrated to pull thesaurus-mined synonymous tokens closer while separating them from the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
- Lightweight Cross-Lingual Sentence Representation Learning [57.9365829513914]
We introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations.
We propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task.
arXiv Detail & Related papers (2021-05-28T14:10:48Z)
- Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training [45.48003947488825]
We study two widely used robust training methods: adversarial training and randomized smoothing.
The experimental results demonstrate that robust training can improve zero-shot cross-lingual transfer for text classification.
arXiv Detail & Related papers (2021-04-17T21:21:53Z)
- Multilingual Transfer Learning for QA Using Translation as Data Augmentation [13.434957024596898]
We explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer in the semantic space.
We propose two novel strategies, language adversarial training and language arbitration framework, which significantly improve the (zero-resource) cross-lingual transfer performance.
Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.
arXiv Detail & Related papers (2020-12-10T20:29:34Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext to improve the cross-lingual transferability of pre-trained models.
arXiv Detail & Related papers (2020-07-15T16:58:01Z)
- Testing pre-trained Transformer models for Lithuanian news clustering [0.0]
Non-English languages often cannot leverage these new opportunities offered by models pre-trained on English text.
We compare pre-trained multilingual BERT, XLM-R, and older learned text representation methods as encodings for the task of Lithuanian news clustering.
Our results indicate that publicly available pre-trained multilingual Transformer models can be fine-tuned to surpass word vectors but still score much lower than specially trained doc2vec embeddings.
arXiv Detail & Related papers (2020-04-03T14:41:54Z)
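Several of the papers above, notably VECO 2.0 and InfoXLM, describe a sequence-level contrastive objective over parallel sentence pairs: translation pairs are pulled together while non-parallel sentences are pushed apart. The sketch below is a generic InfoNCE-style version of that objective with in-batch negatives; the pooling, temperature, and tensor shapes are illustrative assumptions rather than either paper's exact formulation.

```python
import torch
import torch.nn.functional as F

# Generic parallel-pair contrastive loss (illustrative, not VECO 2.0's or
# InfoXLM's exact objective): row i of src_repr and tgt_repr is assumed to
# be a translation pair; all other rows serve as in-batch negatives.

def parallel_pair_contrastive_loss(src_repr: torch.Tensor,
                                   tgt_repr: torch.Tensor,
                                   temperature: float = 0.05) -> torch.Tensor:
    """src_repr, tgt_repr: (batch, dim) pooled sentence vectors from an encoder."""
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.t() / temperature      # (batch, batch) cosine-similarity matrix
    targets = torch.arange(src.size(0))       # the true translation sits on the diagonal
    # Symmetric InfoNCE: each side must retrieve its translation within the batch.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example with random features standing in for encoder outputs:
loss = parallel_pair_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
```

The symmetric form (averaging both retrieval directions) is a common design choice for such objectives, since either language may serve as query or key during pre-training.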