Cross-Lingual Text Classification with Multilingual Distillation and
Zero-Shot-Aware Training
- URL: http://arxiv.org/abs/2202.13654v1
- Date: Mon, 28 Feb 2022 09:51:32 GMT
- Title: Cross-Lingual Text Classification with Multilingual Distillation and
Zero-Shot-Aware Training
- Authors: Ziqing Yang, Yiming Cui, Zhigang Chen, Shijin Wang
- Abstract summary: A multi-branch multilingual language model (MBLM) is built on multilingual pre-trained language models (MPLMs).
The method transfers knowledge from high-performance monolingual models through a teacher-student framework.
Results on two cross-lingual classification tasks show that, with only the task's supervised data used, our method improves both the supervised and zero-shot performance of MPLMs.
- Score: 21.934439663979663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual pre-trained language models (MPLMs) can not only handle tasks in
different languages but also exhibit surprising zero-shot cross-lingual
transferability. However, MPLMs usually cannot match the supervised
performance of state-of-the-art monolingual pre-trained models on
rich-resource languages. In this paper, we aim to improve the multilingual
model's supervised and zero-shot performance simultaneously, using only the
resources of the supervised languages. Our approach
is based on transferring knowledge from high-performance monolingual models
with a teacher-student framework. We let the multilingual model learn from
multiple monolingual models simultaneously. To exploit the model's
cross-lingual transferability, we propose MBLM (multi-branch multilingual
language model), a model built on the MPLMs with multiple language branches.
Each branch is a stack of transformers. MBLM is trained with the
zero-shot-aware training strategy that encourages the model to learn from the
mixture of zero-shot representations from all the branches. The results on two
cross-lingual classification tasks show that, with only the task's supervised
data used, our method improves both the supervised and zero-shot performance of
MPLMs.
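As a rough illustration of this idea (not the authors' released code), the sketch below assumes a PyTorch-style setup: a small stand-in encoder for the MPLM body, one lightweight transformer branch per supervised language, and a training step that combines a hard-label loss, distillation from a hypothetical monolingual teacher, and a zero-shot-aware term computed from the averaged outputs of the other languages' branches. All module sizes, the [CLS]-style pooling, and the loss weights are illustrative assumptions.

```python
# Hypothetical sketch of the MBLM idea (not the paper's implementation):
# a shared multilingual encoder feeds per-language transformer branches;
# the student distills from a monolingual teacher and also learns from a
# mixture of the "zero-shot" branch outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MBLM(nn.Module):
    def __init__(self, hidden=768, heads=12, branch_layers=2,
                 languages=("en", "de", "zh"), num_labels=3):
        super().__init__()
        self.languages = list(languages)
        # Stand-in for the MPLM body (e.g. mBERT / XLM-R in the paper).
        self.shared_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, heads, batch_first=True),
            num_layers=2)
        # One small transformer branch per supervised language.
        self.branches = nn.ModuleDict({
            lang: nn.TransformerEncoder(
                nn.TransformerEncoderLayer(hidden, heads, batch_first=True),
                num_layers=branch_layers)
            for lang in self.languages})
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, embeddings, lang):
        """embeddings: (batch, seq, hidden) token embeddings for one language."""
        shared = self.shared_encoder(embeddings)
        # Supervised path: logits from the language's own branch.
        own = self.branches[lang](shared)[:, 0]   # first-token pooling
        own_logits = self.classifier(own)
        # Zero-shot-aware path: average the representations produced by the
        # *other* languages' branches, which never saw labels for `lang`.
        zs = torch.stack([self.branches[l](shared)[:, 0]
                          for l in self.languages if l != lang]).mean(0)
        zs_logits = self.classifier(zs)
        return own_logits, zs_logits

def training_step(model, teacher_logits, embeddings, labels, lang, T=2.0):
    """Hard-label loss + distillation from a monolingual teacher + a
    zero-shot-aware term on the mixed branch output. Equal loss weights
    are an assumption for illustration."""
    own_logits, zs_logits = model(embeddings, lang)
    ce = F.cross_entropy(own_logits, labels)
    kd = F.kl_div(F.log_softmax(own_logits / T, -1),
                  F.softmax(teacher_logits / T, -1),
                  reduction="batchmean") * T * T
    zs = F.cross_entropy(zs_logits, labels)
    return ce + kd + zs
```

In this reading, the zero-shot-aware term is what ties the branches together: each supervised example also trains the mixture of branches that never saw that language's labels, which is the behavior exercised at zero-shot inference time.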
Related papers
- PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLM) trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training.
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
arXiv Detail & Related papers (2023-07-12T09:00:37Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Distilling Efficient Language-Specific Models for Cross-Lingual Transfer [75.32131584449786]
Massively multilingual Transformers (MMTs) are widely used for cross-lingual transfer learning.
MMTs' language coverage makes them unnecessarily expensive to deploy in terms of model size, inference time, energy, and hardware cost.
We propose to extract compressed, language-specific models from MMTs which retain the capacity of the original MMTs for cross-lingual transfer.
arXiv Detail & Related papers (2023-06-02T17:31:52Z) - WeLM: A Well-Read Pre-trained Language Model for Chinese [37.68378062625651]
We present WeLM: a well-read pre-trained language model for Chinese.
We show that WeLM is equipped with broad knowledge on various domains and languages.
arXiv Detail & Related papers (2022-09-21T14:05:30Z) - Breaking Down Multilingual Machine Translation [74.24795388967907]
We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs).
Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al.
arXiv Detail & Related papers (2021-10-15T14:57:12Z) - Adapting Monolingual Models: Data can be Scarce when Language Similarity
is High [3.249853429482705]
We investigate the performance of zero-shot transfer learning with as little data as possible.
We retrain the lexical layers of four BERT-based models using data from two low-resource target language varieties.
With high language similarity, 10MB of data appears sufficient to achieve substantial monolingual transfer performance.
arXiv Detail & Related papers (2021-05-06T17:43:40Z) - UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z) - Cross-lingual Machine Reading Comprehension with Language Branch
Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from the multiple language branch models into a single model for all target languages (a hedged sketch of this multi-teacher setup follows this list).
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
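As a loose illustration of the amalgamation step described in the LBMRC entry above (not that paper's implementation), the sketch below assumes one teacher per language branch and a single student classifier head; the uniform teacher average, temperature, and alpha weighting are assumptions for illustration.

```python
# Illustrative multi-teacher distillation loss: the student matches the
# soft predictions of several language-branch teachers plus the hard labels.
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(student_logits, teacher_logits_list,
                               labels, temperature=2.0, alpha=0.5):
    """student_logits: (batch, classes); teacher_logits_list: one logits
    tensor per language-branch teacher. The uniform average over teachers
    and the alpha mix are illustrative choices, not the LBMRC recipe."""
    hard = F.cross_entropy(student_logits, labels)
    log_p = F.log_softmax(student_logits / temperature, dim=-1)
    soft = torch.stack([
        F.kl_div(log_p, F.softmax(t / temperature, dim=-1),
                 reduction="batchmean")
        for t in teacher_logits_list]).mean() * temperature ** 2
    return alpha * hard + (1 - alpha) * soft
```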